Ken Krugler wrote:
We're only using the html & text parsers, so I don't think that's the problem. Plus we're dumping the thread stack when it hangs, and it's always in the ChunkedInputStream.exhaustInputStream() process (see trace below).

The trace did not make it.

Oops - see at the end of this email.

Have you tried protocol-http instead of protocol-httpclient?

No, not yet. Andrzej also suggested this.

Is it any better?

I'll give it a try & report back.

What JVM are you running?

Java version "1.4.2_09"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_09-b05)
Java HotSpot(TM) Client VM (build 1.4.2_09-b05, mixed mode)

I get fewer socket hangs in 1.5 than 1.4.

I'll see if we can update our server to 1.5, thanks!

Also, the mapred fetcher has been changed to succeed even when threads hang. Perhaps we should change the 0.7 fetcher similarly? I think we should probably go even farther, and kill threads which take longer than a timeout to process a url. Thread.stop() is theoretically unsafe, but I've used it in the past for this sort of thing and never traced subsequent problems back to it...
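For illustration, the "give up on a thread after a timeout" idea can be sketched with nothing newer than Thread.join(long), which is available on a 1.4 JVM. This is only a sketch under assumptions, not the Nutch fetcher code: the class and method names (FetchWatchdog, runWithTimeout) are hypothetical, and it uses interrupt() rather than the deprecated Thread.stop(), so a worker that is stuck in a blocking java.io socket read is abandoned rather than actually killed.

```java
public class FetchWatchdog {
    /**
     * Runs the task on its own daemon thread and waits at most timeoutMs
     * (must be > 0, since join(0) waits forever).
     * Returns true if the task finished in time, false if it was abandoned.
     */
    public static boolean runWithTimeout(Runnable task, long timeoutMs)
            throws InterruptedException {
        Thread worker = new Thread(task, "fetch-worker");
        worker.setDaemon(true);      // a hung worker must not keep the JVM alive
        worker.start();
        worker.join(timeoutMs);      // returns when the worker ends or the timeout passes
        if (worker.isAlive()) {
            // Best effort only: interrupt() wakes sleep/wait and interruptible
            // NIO channels, but not a thread blocked in a java.io socket read.
            worker.interrupt();
            return false;
        }
        return true;
    }
}
```

The caller can treat a false return as a failed fetch for that url and move on, which is roughly the "succeed even when threads hang" behaviour described above.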

I thought the issue with Thread.stop() is that it won't interrupt a hung java.io read, and that's why java.nio (which is interruptible) is preferred.

But from what I'm seeing, Thread.stop() should work, since there is a trickle of data coming in from the remote host, and thus the read calls should be returning.

I'll give this a try as well.
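Since the reads do return while data trickles in, another option is to check an overall deadline between reads instead of stopping the thread from outside. The following is a minimal sketch of that idea, not what httpclient's exhaustInputStream() does; DeadlineDrain and drain are hypothetical names. It only helps when read() keeps returning; a read that blocks with no data at all would still need a socket read timeout (e.g. Socket.setSoTimeout) to return.

```java
import java.io.IOException;
import java.io.InputStream;

public class DeadlineDrain {
    /**
     * Reads the stream to the end, but gives up once deadlineMs has elapsed.
     * The deadline is checked after each read() returns, so a slow trickle
     * of bytes cannot keep the caller busy indefinitely.
     * Returns the number of bytes consumed.
     */
    public static long drain(InputStream in, long deadlineMs) throws IOException {
        long start = System.currentTimeMillis();
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
            if (System.currentTimeMillis() - start > deadlineMs) {
                throw new IOException("deadline exceeded after " + total + " bytes");
            }
        }
        return total;
    }
}
```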

Thanks,

-- Ken

=====================================================================
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:183)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:222)
at java.io.BufferedInputStream.read(BufferedInputStream.java:277)
- locked <0x27252050> (a java.io.BufferedInputStream)
at org.apache.commons.httpclient.ContentLengthInputStream.read(ContentLengthInputStream.java:169)
at org.apache.commons.httpclient.ContentLengthInputStream.read(ContentLengthInputStream.java:183)
at org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(ChunkedInputStream.java:368)
at org.apache.commons.httpclient.ContentLengthInputStream.close(ContentLengthInputStream.java:117)
at java.io.FilterInputStream.close(FilterInputStream.java:159)
at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:176)
at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:140)
at org.apache.nutch.protocol.httpclient.HttpResponse.runResponse(HttpResponse.java:159)
at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:97)
at org.apache.nutch.protocol.httpclient.Http.getProtocolOutput(Http.java:222)
at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:150)

--
Ken Krugler
Krugle, Inc.
+1 530-470-9200
