Are hung threads natural?

I ran a crawl:

nohup time nutch crawl /usr/tmp/urls.txt -dir /usr/tmp/86sites -threads 200 -depth 10 -topN 103103

It ran for a few hours, after which I noticed that it seemed hung:

fetching http://www.mediarights.org/film/the_rules_of_the_game.php
fetch of http://www.hollywood.com/MyHollywood/AddRating/2/3612623/1.5 failed with: Http code=500, url=http://www.hollywood.com/MyHollywood/AddRating/2/3612623/1.5
Aborting with 46 hung threads.
java.lang.NullPointerException
        at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
        at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
        at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
        at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
        at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
fetcher caught:java.lang.NullPointerException
java.lang.NullPointerException
        at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
        at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
        at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
        at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
        at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
fetcher caught:java.lang.NullPointerException
java.lang.NullPointerException
        at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)

lather, rinse, repeat . . . one final:

java.lang.NullPointerException

Then it didn't progress (though I didn't wait long).

However, hadoop.log seemed to keep going:

2007-07-30 15:21:05,106 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
2007-07-30 15:21:05,107 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
2007-07-30 15:21:05,107 FATAL fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - java.lang.NullPointerException
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2007-07-30 16:16:02,932 INFO crawl.CrawlDb - Fetcher: done
2007-07-30 16:16:02,947 INFO crawl.CrawlDb - CrawlDb update: starting
2007-07-30 16:16:02,947 INFO crawl.CrawlDb - CrawlDb update: db: /usr/tmp/86sites/crawldb
2007-07-30 16:16:02,947 INFO crawl.CrawlDb - CrawlDb update: segments: [/usr/tmp/86sites/segments/20070730124436]
2007-07-30 16:16:02,947 INFO crawl.CrawlDb - CrawlDb update: additions allowed: true
2007-07-30 16:16:02,947 INFO crawl.CrawlDb - CrawlDb update: URL normalizing: true
2007-07-30 16:16:02,947 INFO crawl.CrawlDb - CrawlDb update: URL filtering: true
2007-07-30 16:16:02,993 INFO crawl.CrawlDb - CrawlDb update: Merging segment data into db.
