On 12/17/10 2:08 AM, brad wrote:

> To generate, I use the following:
> nutch generate -all -topN 100000
>
> crawl.GeneratorJob - GeneratorJob: Selecting best-scoring urls due for fetch.
> crawl.GeneratorJob - GeneratorJob: starting
> crawl.GeneratorJob - GeneratorJob: filtering: true
> crawl.GeneratorJob - GeneratorJob: topN: 100000
> crawl.GeneratorJob - GeneratorJob: done
> crawl.GeneratorJob - GeneratorJob: generated batch id: 1292541893-1499060629
>
> No other log information is provided, unlike the old version, which
> included log items like:
> Generator: starting at 2010-10-08 23:16:02

If in doubt, check logs/hadoop.log. Any exceptions should be reported there.
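For example, a quick scan of that log could look like the minimal shell sketch below. The function name check_nutch_log is hypothetical, and the default path assumes the logs/hadoop.log location mentioned above, relative to the directory Nutch is run from.

```shell
# Hypothetical helper: print exception/error lines from a Nutch log.
# Default path assumes logs/hadoop.log relative to the current directory.
check_nutch_log() {
  log="${1:-logs/hadoop.log}"
  if [ ! -f "$log" ]; then
    echo "log file not found: $log" >&2
    return 1
  fi
  # -n prints line numbers, -i ignores case, -E enables the alternation.
  if grep -n -i -E 'exception|error' "$log"; then
    return 2   # at least one suspicious line was printed
  fi
  echo "no exceptions found in $log"
  return 0
}
```

Run it right after a job such as `nutch generate` finishes. The check is deliberately crude: it also flags harmless lines that merely contain the word "error".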

> The same type of issue occurs with fetch:
> nutch fetch -all -threads 100 -parse
>
> The log files show:
> fetcher.FetcherJob - FetcherJob: starting
> fetcher.FetcherJob - FetcherJob : timelimit set for : -1
> fetcher.FetcherJob - FetcherJob: threads: 10
> fetcher.FetcherJob - FetcherJob: parsing: false
> fetcher.FetcherJob - FetcherJob: resuming: false
> fetcher.FetcherJob - FetcherJob: fetching all
> fetcher.FetcherJob - FetcherJob: done

Again, there should be some data in the log. Also, at this point you can re-run readdb and check whether the statistics have changed.


> So, the question is: is Nutch 2.0 ready for beta testing, or am I doing
> something very wrong?

I guess it could be a config error; basic usage should just work...


> So what am I missing?

I don't know; we need more information. BTW, the dev@ list may be more appropriate for this discussion.


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
