>
> Hey!
>
> I've been trying out Nutch 1.0 and am facing an intermittent issue exactly
> like the one in https://issues.apache.org/jira/browse/NUTCH-503
>
> I can crawl certain websites without any problems, but for others I end up
> with this error (on both v1.0 and v1.1):
>
> hardy8:~/devel/sware/dump/apache-nutch-1.1-bin$ bin/nutch crawl urls -dir crawl -depth 100 -topN 999999999 -threads 50
> crawl started in: crawl
> rootUrlDir = urls
> threads = 50
> depth = 100
> indexer=lucene
> topN = 999999999
> Injector: starting
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: done
> Generator: Selecting best-scoring urls due for fetch.
> Generator: starting
> Generator: filtering: true
> Generator: normalizing: true
> Generator: topN: 999999999
> Generator: jobtracker is 'local', generating exactly one partition.
> *Generator: 0 records selected for fetching, exiting ...*
> Stopping at depth=0 - no more URLs to fetch.
> No URLs to fetch - check your seed list and URL filters.
> crawl finished: crawl
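>
> Since the run ends with "check your seed list and URL filters" and the
> Generator log shows filtering enabled, one thing worth ruling out first is
> the filter chain. A minimal sketch (the DebugFilters class name is mine,
> not part of Nutch) that pushes a seed URL through the same URLFilters
> chain the Generator applies:
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.nutch.net.URLFilters;
> import org.apache.nutch.util.NutchConfiguration;
>
> public class DebugFilters {
>   public static void main(String[] args) throws Exception {
>     // Load nutch-default.xml / nutch-site.xml, like the crawl tool does
>     Configuration conf = NutchConfiguration.create();
>     URLFilters filters = new URLFilters(conf);
>     for (String url : args) {
>       // filter() returns the URL if it passes every active filter
>       // (e.g. regex-urlfilter.txt), or null if some filter rejects it
>       String result = filters.filter(url);
>       System.out.println(url + " -> "
>           + (result == null ? "REJECTED" : "accepted"));
>     }
>   }
> }
>
> If a seed URL prints REJECTED, the empty fetchlist is a filter
> configuration problem rather than a Generator bug.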
>
> Debugging java/org/apache/nutch/crawl/Generator.java revealed that
> *readers.length* was 1 (which is correct, since I was crawling only one
> URL), but the *readers[num].next(new FloatWritable())* condition in the
> snippet below never evaluated to true, even though it should have.
>
> // check that we selected at least some entries ...
> SequenceFile.Reader[] readers =
>     SequenceFileOutputFormat.getReaders(job, tempDir);
> boolean empty = true;
> if (readers != null && readers.length > 0) {
>   for (int num = 0; num < readers.length; num++) {
>     if (readers[num].next(new FloatWritable())) {  // <-- never true here
>       empty = false;
>       break;
>     }
>   }
> }
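>
> To confirm whether the generate step actually wrote anything into tempDir,
> the part files it produces can be dumped directly. A rough sketch, assuming
> the Hadoop 0.20-era API bundled with Nutch 1.1 (DumpGenTemp is my own name,
> and the part file path has to be taken from the actual run, since Generator
> uses a random temp dir):
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.SequenceFile;
> import org.apache.hadoop.io.Writable;
> import org.apache.hadoop.util.ReflectionUtils;
>
> public class DumpGenTemp {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     FileSystem fs = FileSystem.getLocal(conf); // plain local fs, no HDFS
>     Path part = new Path(args[0]);             // e.g. <tempDir>/part-00000
>     SequenceFile.Reader reader = new SequenceFile.Reader(fs, part, conf);
>     try {
>       // Instantiate key/value of whatever classes the file declares
>       Writable key = (Writable)
>           ReflectionUtils.newInstance(reader.getKeyClass(), conf);
>       Writable value = (Writable)
>           ReflectionUtils.newInstance(reader.getValueClass(), conf);
>       int count = 0;
>       while (reader.next(key, value)) {
>         count++;
>       }
>       System.out.println(part + ": " + count + " entries, key class = "
>           + reader.getKeyClassName());
>     } finally {
>       reader.close();
>     }
>   }
> }
>
> If the part files do contain entries while next(new FloatWritable()) still
> returns false, the reader check itself is at fault, which is what NUTCH-503
> suggests.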
>
> I'm kind of stuck and was wondering whether others have faced this too.
>
>
> Gaurav
>
> PS: I'm running Nutch out of the box on a single machine and am not using
> HDFS.
>