Dear all,

I'm crawling the Internet with Nutch 1.2 (injecting 100 URLs and crawling to a depth of 10).

After some time I receive the following (at depth 3):
-activeThreads=1500, spinWaiting=131, fetchQueues.totalSize=3431
Aborting with 1500 hung threads.

Fetching aborts every time I try to crawl many URLs. For example, a run with 250,000 injected URLs does not hang, but a run with 400,000 URLs hangs with any number of threads (from 10 to 1500).

After some time I start receiving messages like this:

fetch of http://twitter.com/ToddJG failed with: java.lang.NullPointerException
java.lang.NullPointerException
at java.lang.System.arraycopy(Native Method)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1108)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1025)
at java.io.DataOutputStream.writeByte(DataOutputStream.java:136)
at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:263)
at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:243)
at org.apache.hadoop.io.Text.write(Text.java:281)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:892)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:899)
at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:768)
fetcher caught:java.lang.NullPointerException

...
-finishing thread FetcherThread, activeThreads=0


Can somebody help?

Thank you in advance.

Kind regards,

--

Andrey Sapegin,
Software Developer,

Unister GmbH
Dittrichring 18-20 | 04109 Leipzig

+49 (0)341 492885069,
+4915778339304,
[email protected]

www.unister.de
