Your number of threads is larger than your internet bandwidth can handle => content == null or contentType == null
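
As a rough illustration of why 200 threads can starve each other, here is a back-of-envelope sketch in Python of the average share of the link each fetcher thread gets. The 2 Mbit/s link speed is an assumed figure, not something taken from your report, and the helper name is hypothetical.

```python
def per_thread_kbit_s(link_mbit_s: float, threads: int) -> float:
    """Average share of the link each fetcher thread gets, in kbit/s."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    return link_mbit_s * 1000 / threads


# Assumption: a 2 Mbit/s link, chosen only for illustration.
# 200 is the -threads value from the crawl command; 20 is an arbitrary lower value.
for threads in (200, 20):
    share = per_thread_kbit_s(2.0, threads)
    print(f"{threads} threads -> {share} kbit/s per thread")
```

If the per-thread share looks too small for pages to arrive before the fetcher gives up on them, rerunning the crawl with a lower -threads value is the first thing to try.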

2007/7/31, Kai_testing Middleton <[EMAIL PROTECTED]>:
>
> Are hung threads natural?
>
> I ran a crawl:
> nohup time nutch crawl /usr/tmp/urls.txt -dir /usr/tmp/86sites -threads 200 -depth 10 -topN 103103
>
> it ran a few hours after which I noticed that it seemed hung:
>
> fetching http://www.mediarights.org/film/the_rules_of_the_game.php
> fetch of http://www.hollywood.com/MyHollywood/AddRating/2/3612623/1.5 failed with: Http code=500, url=http://www.hollywood.com/MyHollywood/AddRating/2/3612623/1.5
> Aborting with 46 hung threads.
> java.lang.NullPointerException
>     at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
>     at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
>     at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
>     at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
>     at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
>     at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
> fetcher caught:java.lang.NullPointerException
> java.lang.NullPointerException
>     at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
>     at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
>     at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
>     at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
>     at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
>     at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
> fetcher caught:java.lang.NullPointerException
> java.lang.NullPointerException
>     at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
>
> lather, rinse, repeat
> .
> .
> .
> one final:
> java.lang.NullPointerException
>
> then it didn't progress (though I didn't wait long).
>
> though hadoop.log seemed to keep going:
>
> 2007-07-30 15:21:05,106 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
> 2007-07-30 15:21:05,107 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
> 2007-07-30 15:21:05,107 FATAL fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
> 2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - java.lang.NullPointerException
> 2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
> 2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
> 2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
> 2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
> 2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
> 2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
> 2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
> 2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
> 2007-07-30 16:16:02,932 INFO fetcher.Fetcher - Fetcher: done
> 2007-07-30 16:16:02,947 INFO crawl.CrawlDb - CrawlDb update: starting
> 2007-07-30 16:16:02,947 INFO crawl.CrawlDb - CrawlDb update: db: /usr/tmp/86sites/crawldb
> 2007-07-30 16:16:02,947 INFO crawl.CrawlDb - CrawlDb update: segments: [/usr/tmp/86sites/segments/20070730124436]
> 2007-07-30 16:16:02,947 INFO crawl.CrawlDb - CrawlDb update: additions allowed: true
> 2007-07-30 16:16:02,947 INFO crawl.CrawlDb - CrawlDb update: URL normalizing: true
> 2007-07-30 16:16:02,947 INFO crawl.CrawlDb - CrawlDb update: URL filtering: true
> 2007-07-30 16:16:02,993 INFO crawl.CrawlDb - CrawlDb update: Merging segment data into db.

--
********************************************************
Le Quoc Anh
Tel: 0912643289
http://quocanh263.googlepages.com/wedding
4/268 Le Trong Tan, Hanoi, Vietnam
********************************************************
