>
> Hey!
>
> I've been trying out Nutch 1.0 and am facing an intermittent issue exactly
> like the one in https://issues.apache.org/jira/browse/NUTCH-503
>
> I can crawl certain websites without any problems, but for others I end up
> with this error (on both v1.0 and v1.1):
>
> hardy8:~/devel/sware/dump/apache-nutch-1.1-bin$ bin/nutch crawl urls -dir crawl -depth 100 -topN 999999999 -threads 50
> crawl started in: crawl
> rootUrlDir = urls
> threads = 50
> depth = 100
> indexer=lucene
> topN = 999999999
> Injector: starting
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: done
> Generator: Selecting best-scoring urls due for fetch.
> Generator: starting
> Generator: filtering: true
> Generator: normalizing: true
> Generator: topN: 999999999
> Generator: jobtracker is 'local', generating exactly one partition.
> *Generator: 0 records selected for fetching, exiting ...*
> Stopping at depth=0 - no more URLs to fetch.
> No URLs to fetch - check your seed list and URL filters.
> crawl finished: crawl
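>
> Since the run ends with "check your seed list and URL filters" and the
> Generator log shows filtering enabled, one thing worth ruling out first is
> the filter chain. A minimal sketch (the DebugFilters class name is mine,
> not part of Nutch) that pushes a seed URL through the same URLFilters
> chain the Generator applies:
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.nutch.net.URLFilters;
> import org.apache.nutch.util.NutchConfiguration;
>
> public class DebugFilters {
>   public static void main(String[] args) throws Exception {
>     // Load nutch-default.xml / nutch-site.xml, like the crawl tool does
>     Configuration conf = NutchConfiguration.create();
>     URLFilters filters = new URLFilters(conf);
>     for (String url : args) {
>       // filter() returns the URL if it passes every active filter
>       // (e.g. regex-urlfilter.txt), or null if some filter rejects it
>       String result = filters.filter(url);
>       System.out.println(url + " -> "
>           + (result == null ? "REJECTED" : "accepted"));
>     }
>   }
> }
>
> If a seed URL prints REJECTED, the empty fetchlist is a filter
> configuration problem rather than a Generator bug.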
>
> Debugging java/org/apache/nutch/crawl/Generator.java revealed that
> *readers.length* was 1 (which is correct, since I was crawling only one
> URL), but the *readers[num].next(new FloatWritable())* condition in the
> snippet below never evaluated to true, even though it should have.
>
> // check that we selected at least some entries ...
> SequenceFile.Reader[] readers =
>     SequenceFileOutputFormat.getReaders(job, tempDir);
> boolean empty = true;
> if (readers != null && readers.length > 0) {
>   for (int num = 0; num < readers.length; num++) {
>     if (readers[num].next(new FloatWritable())) {  // <-- never true here
>       empty = false;
>       break;
>     }
>   }
> }
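>
> To confirm whether the generate step actually wrote anything into tempDir,
> the part files it produces can be dumped directly. A rough sketch, assuming
> the Hadoop 0.20-era API bundled with Nutch 1.1 (DumpGenTemp is my own name,
> and the part file path has to be taken from the actual run, since Generator
> uses a random temp dir):
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.SequenceFile;
> import org.apache.hadoop.io.Writable;
> import org.apache.hadoop.util.ReflectionUtils;
>
> public class DumpGenTemp {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     FileSystem fs = FileSystem.getLocal(conf); // plain local fs, no HDFS
>     Path part = new Path(args[0]);             // e.g. <tempDir>/part-00000
>     SequenceFile.Reader reader = new SequenceFile.Reader(fs, part, conf);
>     try {
>       // Instantiate key/value of whatever classes the file declares
>       Writable key = (Writable)
>           ReflectionUtils.newInstance(reader.getKeyClass(), conf);
>       Writable value = (Writable)
>           ReflectionUtils.newInstance(reader.getValueClass(), conf);
>       int count = 0;
>       while (reader.next(key, value)) {
>         count++;
>       }
>       System.out.println(part + ": " + count + " entries, key class = "
>           + reader.getKeyClassName());
>     } finally {
>       reader.close();
>     }
>   }
> }
>
> If the part files do contain entries while next(new FloatWritable()) still
> returns false, the reader check itself is at fault, which is what NUTCH-503
> suggests.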
>
> I'm kind of stuck and was wondering whether others have faced this too.
>
>
> Gaurav
>
> PS: I'm running Nutch out of the box on a single machine and am not using
> HDFS.
>