Re: Nodemanager crashing repeatedly

2018-09-06 Thread lewis john mcgibbney
Hi Gajanan,
Which OS are you running this on?
I would also suggest that if you want to use the 2.x codebase, you should
use the most recent code from SCM, e.g. check out master and switch to the
2.x branch.
Finally, you didn't mention the phase at which the crawl is failing. Can you
provide this?

On Thu, Sep 6, 2018 at 8:58 AM  wrote:

> From: Gajanan Watkar 
> To: user@nutch.apache.org
> Date: Wed, 05 Sep 2018 11:27:21 +0530
> Subject: Nodemanager crashing repeatedly
> I am running Nutch-2.3.1 over Hadoop-2.5.2 and HBase-1.2.3 with
> integration to Solr-6.5.1. I have crawled over 10 million pages, but
> while doing all this I keep running into two problems:
>
> 1. My NodeManager crashes repeatedly during different phases of the
> crawl. The crash takes down my Linux session and forces a logout, with
> the NodeManager killed. I log in again, restart the NodeManager, and the
> same failed crawl phase then runs to success. [The NodeManager log has
> nothing to report.]
>
> 2. I am running all my crawl phases one by one without the crawl script,
> because with the crawl script my jobs were usually exiting with a
> "WaitForjobCompletion" error at different stages of the crawl. Running
> the phases one by one prevented the "WaitForjobCompletion" error from
> occurring.
>
> Any help will be highly appreciated. I am new to the mailing list and
> new to Nutch.
>
> -Gajanan
>
>


RE: IndexWriter interface in 1.15

2018-09-06 Thread Yossi Tamari
Hi Lewis,

First of all, I must say that I can't reproduce my claim regarding
getConf/setConf. I was getting a compilation error for their @Override, but not
anymore, and it's being called, so I'm not sure what happened.
open() changing its signature is still a breaking change. I can't roll a new
release, because I'm not a maintainer. I'm also not sure it's justified,
because I don't know how many people implement an IndexWriter.
I would still suggest that it be listed as a breaking change in CHANGES.txt
on master.

Yossi.

> -----Original Message-----
> From: lewis john mcgibbney 
> Sent: 06 September 2018 19:00
> To: user@nutch.apache.org
> Subject: Re: IndexWriter interface in 1.15
> 
> Hi Yossi,
> 
> REASON: Upgrade of MapReduce API from legacy to 'new'. This was a breaking
> change for sure and a HUGE patch. We did not however factor in the
> non-breaking aspects of the upgrade... so it has not all been plain sailing.
> PROPOSED SOLUTION: I tend to agree with you that this should be added as a
> breaking change to the current master CHANGES.txt and should be consulted
> when people pull a new release. We cannot add this to the release artifacts
> however. We would need to roll a new release (1.15.1). If you feel that
> this is enough of a reason to roll a new release (which I do not) then
> please go ahead and do so.
>
> This is a lesson learned and I can honestly say that it was the result of
> us trying to make the upgrade as clean as possible without leaving too much
> of the deprecated MR API still around. Maybe this could have however been
> phased out across several releases...
> 
> Lewis
> 
> On Tue, Sep 4, 2018 at 8:53 AM  wrote:
> 
> >
> > user Digest 4 Sep 2018 15:53:01 - Issue 2929
> >
> > Topics (messages 34147 through 34147)
> >
> > IndexWriter interface in 1.15
> > 34147 by: Yossi Tamari
> >
> > Administrivia:
> >
> > -
> > To post to the list, e-mail: user@nutch.apache.org
> > To unsubscribe, e-mail: user-digest-unsubscr...@nutch.apache.org
> > For additional commands, e-mail: user-digest-h...@nutch.apache.org
> >
> > --
> >
> >
> >
> >
> > -- Forwarded message --
> > From: Yossi Tamari 
> > To: 
> > Date: Tue, 4 Sep 2018 18:52:54 +0300
> > Subject: IndexWriter interface in 1.15
> > Hi,
> >
> >
> >
> > I missed it at the time, but I just realized (the hard way) that the
> > IndexWriter interface was changed in 1.15 in ways that are not backward
> > compatible.
> >
> > That means that any custom IndexWriter implementation will no longer
> > compile, and probably will not run either.
> >
> > I think this was a mistake (maybe a new interface should have been created,
> > and the old one deprecated and supported for now, or just the old methods
> > deprecated without change, and the new methods provided with a default
> > implementation), but it's too late now.
> >
> > I still think this is something that should be highlighted in the release
> > notes for 1.15 (meaning at the top, as "breaking changes").
> >
> > The main changes I encountered:
> >
> > 1.  setConf and getConf were removed from the interface (without
> > deprecation).
> > 2.  open was deprecated (that's fine), and its signature was changed
> > (from JobConf to Configuration), which means it is technically a completely
> > different function, and there is no point in the deprecation.
> >
> >
> >
> > Yossi.
> >
> >
> 
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
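
A note for readers of this thread: the alternative Yossi describes in his
original message (keep the old method deprecated but unchanged, and add the
new method with a default implementation) can be sketched in plain Java as
below. This is only a minimal illustration under stated assumptions, not the
Nutch code: Writer, LegacyConf, NewConf and OldPlugin are hypothetical
stand-ins invented for the example, and neither the real IndexWriter
interface nor Hadoop's JobConf/Configuration classes are used.

```java
import java.util.ArrayList;
import java.util.List;

public class DefaultMethodSketch {

    // Hypothetical stand-in for the old configuration type.
    static class LegacyConf {
        final String name;
        LegacyConf(String name) { this.name = name; }
    }

    // Hypothetical stand-in for the new configuration type.
    static class NewConf {
        final String name;
        NewConf(String name) { this.name = name; }
    }

    // Hypothetical writer interface; NOT the real Nutch IndexWriter.
    interface Writer {
        // Old entry point: signature left untouched, only marked deprecated.
        @Deprecated
        void open(LegacyConf conf);

        // New entry point: the default body forwards to the old method,
        // so existing implementations do not have to change.
        default void open(NewConf conf) {
            open(new LegacyConf(conf.name));
        }
    }

    // A "custom plugin" written against the old interface only.
    static class OldPlugin implements Writer {
        final List<String> opened = new ArrayList<>();

        @Override
        public void open(LegacyConf conf) {
            opened.add(conf.name);
        }
    }

    public static void main(String[] args) {
        OldPlugin plugin = new OldPlugin();
        Writer writer = plugin;
        // The framework calls the new overload; the old plugin is still reached.
        writer.open(new NewConf("solr"));
        System.out.println(plugin.opened); // prints [solr]
    }
}
```

Because the new overload has a default body that forwards to the old one, a
plugin compiled against the old interface keeps compiling and is still
reached when the caller uses the new method. The cost is that the deprecated
method has to stay in the interface until a later release removes it.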



Re: redirect bin/crawl log output to some other file

2018-09-06 Thread lewis john mcgibbney
Hi Amarnatha,
There are a couple of options which I can think of.
1. Why don't you just set up a simple daemon to watch hadoop.log and
stream its contents to /tmp/myurls.log, e.g.
tail -f hadoop.log > /tmp/myurls.log
2. Check out conf/log4j.properties; you will see the configuration
for hadoop.log in there. 'Maybe' you can change this location, rebuild your
deployment and it will solve your issue.
I'm sure there are several other ways as well.
hth
Lewis

On Thu, Sep 6, 2018 at 8:58 AM  wrote:

> From: Amarnatha Reddy 
> To: user@nutch.apache.org
> Date: Tue, 4 Sep 2018 22:13:58 +0530
> Subject: redirect bin/crawl log output to some other file
> Hi All,
>
> We are using the bin/crawl command to crawl and index data into Solr.
> Currently the output is written to the default logs/hadoop.log file. How
> can I have the log data written to a different file?
>
>
> bin/crawl -i -D solr.server.url=http://localhost:8983/solr/jeepkr -s urls/
> crawl/ 1  -->this will write log details under default path logs/hadoop.log
>
> How can I set the log path as part of the bin/crawl invocation?
>
> ex: bin/crawl -i -D solr.server.url=http://localhost:8983/solr/jeepkr -s
> urls/ crawl/ 1  >/tmp/myurls.log
> --
>
>


Re: IndexWriter interface in 1.15

2018-09-06 Thread lewis john mcgibbney
Hi Yossi,

REASON: Upgrade of MapReduce API from legacy to 'new'. This was a breaking
change for sure and a HUGE patch. We did not however factor in the
non-breaking aspects of the upgrade... so it has not all been plain sailing.
PROPOSED SOLUTION: I tend to agree with you that this should be added as a
breaking change to the current master CHANGES.txt and should be consulted
when people pull a new release. We cannot add this to the release artifacts
however. We would need to roll a new release (1.15.1). If you feel that
this is enough of a reason to roll a new release (which I do not) then
please go ahead and do so.

This is a lesson learned and I can honestly say that it was the result of
us trying to make the upgrade as clean as possible without leaving too much
of the deprecated MR API still around. Maybe this could have however been
phased out across several releases...

Lewis

-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc