Hello all

I've seen this mentioned on the mailing list before, but nobody has provided a solution yet (or I didn't find it).

The problem is that the fetcher seems to hang in "deflateBytes" for long periods (from minutes to more than an hour), making the whole crawling process really slow. I'm crawling a single domain from inside, so I want the process to be as quick as possible, and right now it is taking 10+ hours. During all this "hung" time there is no apparent CPU usage by the java process.

Any ideas on how to proceed with this? It is quite annoying, especially since HtDig takes less than two hours to index the same content.

Otherwise we are quite happy with Nutch and impressed with all the features.

Regards
Daniel

------------------------------------------------------------------------

Full thread dump Java HotSpot(TM) Client VM (1.5.0_07-b03 mixed mode, sharing):

"fetcher6" prio=1 tid=0x084c1348 nid=0x2ea5 runnable [0x469f6000..0x469f6580]
        at java.util.zip.Deflater.deflateBytes(Native Method)
        at java.util.zip.Deflater.deflate(Deflater.java:284)
        - locked <0x4a08c228> (a java.util.zip.Deflater)
        at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:154)
        at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:114)
        at java.util.zip.GZIPOutputStream.write(GZIPOutputStream.java:72)
        - locked <0x4a08c208> (a java.util.zip.GZIPOutputStream)
        at org.apache.nutch.io.WritableUtils.writeCompressedByteArray(WritableUtils.java:53)
        at org.apache.nutch.protocol.Content.write(Content.java:81)
        at org.apache.nutch.io.SequenceFile$Writer.append(SequenceFile.java:137)
        at org.apache.nutch.io.MapFile$Writer.append(MapFile.java:127)
        - locked <0x4c85d838> (a org.apache.nutch.io.ArrayFile$Writer)
        at org.apache.nutch.io.ArrayFile$Writer.append(ArrayFile.java:39)
        - locked <0x4c85d838> (a org.apache.nutch.io.ArrayFile$Writer)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:278)
        - locked <0x4c85d800> (a org.apache.nutch.io.ArrayFile$Writer)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)

------------------------------------------------------------------------

[ All other fetchers are blocked on the "outputPage" method, waiting for the previous thread to free the lock ]
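
As far as I can tell from the dump, the pattern looks roughly like the sketch below. This is a minimal standalone illustration, not Nutch code: the class name, the method bodies and the in-memory buffer are all hypothetical, and it only shows gzip compression running while a shared writer lock is held, so that every other thread has to wait until the compression finishes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; none of this is taken from the Nutch sources.
public class LockedCompressionSketch {
    // Stands in for the shared writer object that every fetcher thread locks.
    private final ByteArrayOutputStream sharedOutput = new ByteArrayOutputStream();
    private int pagesWritten = 0;

    // Gzip a byte array in memory through GZIPOutputStream.
    public static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(raw);
        }
        return buffer.toByteArray();
    }

    // Compression runs inside the lock, so a slow deflate stalls all other callers.
    public void outputPage(byte[] content) throws IOException {
        synchronized (sharedOutput) {
            byte[] compressed = gzip(content);
            sharedOutput.write(compressed);
            pagesWritten++;
        }
    }

    public int pagesWritten() {
        synchronized (sharedOutput) {
            return pagesWritten;
        }
    }

    public static void main(String[] args) throws Exception {
        LockedCompressionSketch sketch = new LockedCompressionSketch();
        byte[] page = new byte[64 * 1024]; // dummy page content
        Thread[] fetchers = new Thread[4];
        for (int i = 0; i < fetchers.length; i++) {
            fetchers[i] = new Thread(() -> {
                try {
                    sketch.outputPage(page);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }, "fetcher" + i);
            fetchers[i].start();
        }
        // Only one thread at a time is inside outputPage; the rest wait on the monitor.
        for (Thread t : fetchers) {
            t.join();
        }
        System.out.println("pages written: " + sketch.pagesWritten());
    }
}
```

With this shape, a thread dump taken while one thread is compressing would show the others "waiting for monitor entry", which is what I see. It does not explain why the deflate call itself takes so long.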

------------------------------------------------------------------------

"fetcher7" prio=1 tid=0x084d85d0 nid=0x2ea6 waiting for monitor entry [0x46b79000..0x46b79600]
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)
        - waiting to lock <0x4c85d800> (a org.apache.nutch.io.ArrayFile$Writer)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)



--

Daniel Varela Santoalla
European Centre for Medium-Range Weather Forecasts (ECMWF) (http://www.ecmwf.int)
