Nodemanager crashing repeatedly

2018-09-04 Thread Gajanan Watkar
I am running Nutch-2.3.1 over Hadoop-2.5.2 and HBase-1.2.3, integrated
with Solr-6.5.1. I have crawled over 10 million pages, but along the way
I keep running into two problems:

1. My NodeManager crashes repeatedly during different phases of the
crawl. The crash also kills my Linux session and forces a logout. When I
log in again and restart the NodeManager, the same crawl phase that
failed runs to success. [The NodeManager log has nothing to report.]

2. I am running the crawl phases one at a time, without the crawl script,
because with the script my jobs usually exited with a
"WaitForjobCompletion" error at different stages of the crawl. Running
the phases one by one prevented the "WaitForjobCompletion" error from
occurring.

Any help will be highly appreciated. I am new to the mailing list and
new to Nutch.

-Gajanan


redirect bin/crawl log output to some other file

2018-09-04 Thread Amarnatha Reddy
Hi All,

We are using the bin/crawl command to crawl and index data into Solr.
Currently the output is written to the default logs/hadoop.log file. How
can I have the log written to a different file?


bin/crawl -i -D solr.server.url=http://localhost:8983/solr/jeepkr -s urls/ crawl/ 1
--> this writes the log details to the default path, logs/hadoop.log

How can I specify the log path as part of the bin/crawl command?

For example:
bin/crawl -i -D solr.server.url=http://localhost:8983/solr/jeepkr -s urls/ crawl/ 1 >/tmp/myurls.log
-- 


Thanks and Regards,

*Amarnath Polu*


IndexWriter interface in 1.15

2018-09-04 Thread Yossi Tamari
Hi,

I missed it at the time, but I just realized (the hard way) that the
IndexWriter interface was changed in 1.15 in ways that are not backward
compatible.

That means that any custom IndexWriter implementation will no longer
compile, and probably will not run either.

I think this was a mistake. Either a new interface could have been
created, with the old one deprecated but still supported for now, or the
old methods could have been deprecated without change and the new methods
given a default implementation. But it's too late now.
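
A minimal sketch of the second option, for illustration only. The names
below (Writer, OldPluginWriter, the Map-based open) are hypothetical and
are not the real Nutch IndexWriter API. The old method stays as it was
and is only marked deprecated, while the new method gets a default body
that delegates to it, so an implementation written against the old
interface still compiles and still works:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class DefaultMethodSketch {

    /** Hypothetical plugin interface, NOT the real Nutch IndexWriter. */
    interface Writer {
        /** Old entry point: signature unchanged, only marked deprecated. */
        @Deprecated
        void open(String name);

        /**
         * New entry point: the default body delegates to the old method,
         * so implementations that predate it keep compiling.
         */
        default void open(Map<String, String> params) {
            open(params.getOrDefault("name", "unnamed"));
        }
    }

    /** A "custom" implementation written against the old interface only. */
    static class OldPluginWriter implements Writer {
        final List<String> opened = new ArrayList<>();

        @Override
        public void open(String name) {
            opened.add(name);
        }
    }

    public static void main(String[] args) {
        OldPluginWriter writer = new OldPluginWriter();
        // A new-style call still reaches the old implementation.
        writer.open(Collections.singletonMap("name", "solr"));
        System.out.println(writer.opened); // prints [solr]
    }
}
```

With this shape, plugin authors can move to the new method at their own
pace, and the deprecated one can be removed in a later release.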

I still think this is something that should be highlighted in the release
notes for 1.15 (meaning at the top, as "breaking changes").

The main changes I encountered:

1.  setConf and getConf were removed from the interface (without
deprecation).
2.  open was deprecated (that's fine), and its signature was changed
(from JobConf to Configuration), which means it is technically a
completely different function, and there is no point in the deprecation.
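
To illustrate the second point: in Java a method is identified by its
name together with its parameter types, so changing a parameter type
produces a different method. The sketch below uses hypothetical stand-in
classes, not the real Hadoop or Nutch types. It only assumes that the
old parameter type is a subclass of the new one, as JobConf is of
Configuration in Hadoop. The old open(JobConf) does not override the new
open(Configuration). It is only an overload, and callers going through
the interface never reach it:

```java
public class SignatureChangeSketch {

    /** Hypothetical stand-ins for the two configuration types. */
    static class Configuration { }
    static class JobConf extends Configuration { }

    /** Hypothetical interface after the change: open takes the base type. */
    interface Writer {
        String open(Configuration conf);
    }

    static class CustomWriter implements Writer {
        /** Written for the old interface; now just an unrelated overload. */
        public String open(JobConf conf) {
            return "old open(JobConf)";
        }

        /** Has to be added before the class compiles against the new interface. */
        @Override
        public String open(Configuration conf) {
            return "new open(Configuration)";
        }
    }

    public static void main(String[] args) {
        CustomWriter custom = new CustomWriter();
        Writer viaInterface = custom;

        // Through the interface only open(Configuration) is visible,
        // even when the argument is a JobConf.
        System.out.println(viaInterface.open(new JobConf())); // prints new open(Configuration)

        // Through the concrete type the more specific overload is chosen.
        System.out.println(custom.open(new JobConf()));       // prints old open(JobConf)
    }
}
```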

 

Yossi.