Hello, I have a strange problem generating fetchlists. Maybe someone can point me in the right direction?
I do a couple of inject/generate/fetch/update cycles to crawl a defined subgraph. In the last cycle approx. 600000 docs should be fetched, but only 150000 are actually fetched. The last thing I see in the log file is:

2006-12-04 14:24:14,221 INFO fetcher.Fetcher - fetch of http://www.microbes.info/forums/index.php?s=7179874ada709ad4d9874517f2790ef0& failed with: java.lang.NullPointerEx$
2006-12-04 14:24:14,238 FATAL fetcher.Fetcher - java.lang.NullPointerException
2006-12-04 14:24:14,241 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:198)
2006-12-04 14:24:14,241 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:189)
2006-12-04 14:24:14,241 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$2.collect(MapTask.java:91)
2006-12-04 14:24:14,241 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:314)
2006-12-04 14:24:14,241 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:232)
2006-12-04 14:24:14,241 FATAL fetcher.Fetcher - fetcher caught:java.lang.NullPointerException

Then nothing happens; approx. 10 minutes later MapReduce comes up with:

2006-12-04 14:33:06,169 INFO mapred.LocalJobRunner - reduce > sort
2006-12-04 14:33:07,586 INFO mapred.JobClient - map 100% reduce 33%
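In case it helps anyone check the same thing on their own logs, here is a minimal Python sketch for tallying such a log: it counts lines per log level, counts the "fetch of ... failed with" messages, and reports the longest silent gap between two consecutive log lines. It assumes the "date time,millis LEVEL component - message" layout shown in the excerpt; the names parse_line and summarize are my own and are not part of Nutch or Hadoop.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout: "2006-12-04 14:24:14,221 INFO fetcher.Fetcher - message"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) (\w+)\s+(\S+) - (.*)$"
)


def parse_line(line):
    """Return (timestamp, level, component, message), or None if no match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    stamp = stamp.replace(microsecond=int(m.group(2)) * 1000)
    return stamp, m.group(3), m.group(4), m.group(5)


def summarize(lines):
    """Count levels and failed fetches; find the longest gap between lines."""
    levels = Counter()
    failed = 0
    longest_gap = None  # (timedelta, time before gap, time after gap)
    previous = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that do not follow the assumed layout
        stamp, level, _component, message = parsed
        levels[level] += 1
        if message.startswith("fetch of ") and " failed with: " in message:
            failed += 1
        if previous is not None:
            gap = stamp - previous
            if longest_gap is None or gap > longest_gap[0]:
                longest_gap = (gap, previous, stamp)
        previous = stamp
    return levels, failed, longest_gap
```

To use it, pass any iterable of lines, for example an open log file: `levels, failed, gap = summarize(open(path))`, where `path` is wherever your installation writes its log.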
