Hey!
I've been trying out Nutch 1.0 and I'm hitting an intermittent issue exactly like
this one: https://issues.apache.org/jira/browse/NUTCH-503
I can crawl certain websites without any problems, but for others I end
up with this error (with both v1.0 and v1.1):
hardy8:~/devel/sware/dump/apache-nutch-1.1-bin$ bin/nutch crawl urls -dir
crawl -depth 100 -topN 999999999 -threads 50
crawl started in: crawl
rootUrlDir = urls
threads = 50
depth = 100
indexer=lucene
topN = 999999999
Injector: starting
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 999999999
Generator: jobtracker is 'local', generating exactly one partition.
*Generator: 0 records selected for fetching, exiting ...*
Stopping at depth=0 - no more URLs to fetch.
No URLs to fetch - check your seed list and URL filters.
crawl finished: crawl
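Since the output says to check the seed list and URL filters, one quick way to see
whether a seed URL even survives the configured filters and normalizers would be
something like the little class below. This is just my own rough sketch around the
Nutch URLFilters / URLNormalizers / NutchConfiguration classes (it would need
Nutch's conf/ directory and jars on the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.net.URLFilters;
import org.apache.nutch.net.URLNormalizers;
import org.apache.nutch.util.NutchConfiguration;

public class SeedCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = NutchConfiguration.create();   // picks up nutch-default.xml / nutch-site.xml
    URLNormalizers normalizers = new URLNormalizers(conf, URLNormalizers.SCOPE_INJECT);
    URLFilters filters = new URLFilters(conf);

    String url = args[0];                                // seed URL to test
    String normalized = normalizers.normalize(url, URLNormalizers.SCOPE_INJECT);
    String filtered = filters.filter(normalized);        // null means some filter rejected it

    System.out.println("normalized: " + normalized);
    System.out.println("filtered:   " + (filtered == null ? "REJECTED" : filtered));
  }
}

Given that the same setup works for some sites and not others, I doubt the filters
are the real problem, but a check like this would at least rule them out.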
Debugging java/org/apache/nutch/crawl/Generator.java revealed that
readers.length was 1 (which is correct, since I was crawling only one URL),
but the readers[num].next(new FloatWritable()) condition in the snippet below
never evaluated to true, even though it should have:
// check that we selected at least some entries ...
SequenceFile.Reader[] readers = SequenceFileOutputFormat.getReaders(job, tempDir);
boolean empty = true;
if (readers != null && readers.length > 0) {
  for (int num = 0; num < readers.length; num++) {
    if (readers[num].next(new FloatWritable())) {
      empty = false;
      break;
    }
  }
}
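To poke at this outside the job, I could imagine a small standalone reader over the
Generator's temp output that counts the entries and prints the actual key class
(which seems relevant, since the loop above passes a FloatWritable to next()). The
sketch below is my own, using only the standard Hadoop SequenceFile API; the path
argument would be whatever tempDir the Generator wrote to:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class DumpGenerateTemp {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);            // local FS in my non-HDFS setup
    Path dir = new Path(args[0]);                    // the Generator's tempDir

    FileStatus[] parts = fs.listStatus(dir);
    if (parts == null) {
      System.out.println("no such directory: " + dir);
      return;
    }

    long total = 0;
    for (FileStatus status : parts) {
      Path file = status.getPath();
      if (!file.getName().startsWith("part-")) continue;   // skip _logs etc.
      SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf);
      Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
      System.out.println(file + ": key class = " + reader.getKeyClass().getName());
      while (reader.next(key, value)) {                     // read with the file's own key/value types
        total++;
      }
      reader.close();
    }
    System.out.println("total entries: " + total);
  }
}

If something like that showed entries in the temp output even when the job logs
"0 records selected", it would point at the emptiness check rather than at my
seeds or filters.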
I'm kind of stuck; has anyone else run into this?
Gaurav
PS: I'm running Nutch out of the box on a single machine and am not using
HDFS.