We have seen this before too.  If it is the same problem, the cause is 
the regex URL filter.  Comment out the  -.*(/.+?)/.*?\1/.*?\1/ expression 
in the regex-urlfilter.txt file and the problem should resolve itself.  
Also search the forum for "Fetcher stops pushes cpu to 100%".
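
To see what that expression does, here is a minimal sketch that runs it 
against two made-up URLs. The leading "-" in regex-urlfilter.txt only marks 
the rule as an exclusion, so it is dropped here. The class name, method name 
and example.com URLs are hypothetical, and the use of find() is an assumption 
about how the filter applies its rules, not taken from the Nutch source.

```java
import java.util.regex.Pattern;

public class RepeatedSegmentFilter {
    // The expression from regex-urlfilter.txt without its leading '-'.
    // The back-reference \1 requires the same "/segment" to appear three
    // times, each time followed by a slash.
    static final Pattern LOOP = Pattern.compile(".*(/.+?)/.*?\\1/.*?\\1/");

    // Hypothetical helper: true if the rule would exclude this URL.
    static boolean looksLikeLoop(String url) {
        return LOOP.matcher(url).find();
    }

    public static void main(String[] args) {
        // "/a" occurs three times, each followed by '/': prints true
        System.out.println(looksLikeLoop("http://example.com/a/b/a/c/a/d"));
        // no path segment repeats: prints false
        System.out.println(looksLikeLoop("http://example.com/a/b/c/index.html"));
    }
}
```

The rule is meant to drop URLs that loop through the same path segment. 
Because it combines lazy quantifiers with a back-reference, the matcher may 
have to try a great many ways of splitting a long URL before it gives up, 
which would fit with the fetcher appearing to stall.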

Dennis

Daniel Varela Santoalla wrote:
>
> Hello all
>
> I've seen this mentioned in the mailing list before but nobody 
> provided a solution yet (or I didn't find it).
>
> The problem is that this "deflateBytes" seems to hang for long periods 
> (from minutes to more than an hour) making the whole crawling process 
> really slow. I'm crawling a single domain from inside, so I want the 
> process to be as quick as possible, and now it is taking 10+ hours. 
> During all this "hung" time there is no apparent CPU usage by the java 
> process.
>
> Any ideas on how to proceed with this? It is quite annoying, especially 
> since HtDig takes less than two hours to index the same content.
>
> Otherwise we are quite happy with Nutch and impressed with all the 
> features.
>
> Regards
> Daniel
>
> ------------------------------------------------------------------------
>
> Full thread dump Java HotSpot(TM) Client VM (1.5.0_07-b03 mixed mode, sharing):
>
> "fetcher6" prio=1 tid=0x084c1348 nid=0x2ea5 runnable [0x469f6000..0x469f6580]
>         at java.util.zip.Deflater.deflateBytes(Native Method)
>         at java.util.zip.Deflater.deflate(Deflater.java:284)
>         - locked <0x4a08c228> (a java.util.zip.Deflater)
>         at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:154)
>         at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:114)
>         at java.util.zip.GZIPOutputStream.write(GZIPOutputStream.java:72)
>         - locked <0x4a08c208> (a java.util.zip.GZIPOutputStream)
>         at org.apache.nutch.io.WritableUtils.writeCompressedByteArray(WritableUtils.java:53)
>         at org.apache.nutch.protocol.Content.write(Content.java:81)
>         at org.apache.nutch.io.SequenceFile$Writer.append(SequenceFile.java:137)
>         at org.apache.nutch.io.MapFile$Writer.append(MapFile.java:127)
>         - locked <0x4c85d838> (a org.apache.nutch.io.ArrayFile$Writer)
>         at org.apache.nutch.io.ArrayFile$Writer.append(ArrayFile.java:39)
>         - locked <0x4c85d838> (a org.apache.nutch.io.ArrayFile$Writer)
>         at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:278)
>         - locked <0x4c85d800> (a org.apache.nutch.io.ArrayFile$Writer)
>         at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
>         at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)
>
> ------------------------------------------------------------------------
>
> [ All other fetchers are blocked on the "outputPage" method, waiting 
> for the previous thread to free the lock ]
>
> ------------------------------------------------------------------------
>
> "fetcher7" prio=1 tid=0x084d85d0 nid=0x2ea6 waiting for monitor entry [0x46b79000..0x46b79600]
>         at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:277)
>         - waiting to lock <0x4c85d800> (a org.apache.nutch.io.ArrayFile$Writer)
>         at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
>         at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)
>
>
>

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general