I did some research and traced the problem to somewhere inside
HttpResponse of protocol-httpclient.
I added some System.err.println calls for debugging to the
HttpResponse constructor:
public HttpResponse(String orig, URL url) throws IOException {
  System.err.println("started HttpResponse");
  origURL = url;
  origUrl = url.toString();
  url = new URL(url.getProtocol(), "127.0.0.1", url.getFile());
  orig = url.toString();
  this.orig = origUrl;
  this.base = origURL.toString();
  GetMethod get = new GetMethod(url.toString());
  get.setFollowRedirects(false);
  get.setStrictMode(false);
  get.setRequestHeader("User-Agent", Http.AGENT_STRING);
  get.setHttp11(false);
  get.setMethodRetryHandler(null);
  try {
    code = Http.getClient().executeMethod(get);
    System.err.println("6");
    Header[] heads = get.getResponseHeaders();
    for (int i = 0; i < heads.length; i++) {
      headers.put(heads[i].getName(), heads[i].getValue());
    }
    System.err.println("7, " + code);
    if (code == 200) {
      System.err.println("8");
      InputStream in = get.getResponseBodyAsStream();
      byte[] buffer = new byte[Http.BUFFER_SIZE];
      System.err.println("9");
      int bufferFilled = 0;
      int totalRead = 0;
      System.err.println("10");
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      int tryAndRead = calculateTryToRead(totalRead);
      System.err.println("11");
      while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1
             && tryAndRead > 0) {
        System.err.println("12, " + bufferFilled);
        totalRead += bufferFilled;
        out.write(buffer, 0, bufferFilled);
        tryAndRead = calculateTryToRead(totalRead);
        System.err.println("12.2");
      }
      System.err.println("13");
      content = out.toByteArray();
      in.close();
      System.err.println("14");
    }
  } catch (org.apache.commons.httpclient.ProtocolException pe) {
    pe.printStackTrace();
    throw new IOException(pe.toString());
  } finally {
    get.releaseConnection();
  }
}
And here is a snapshot of the output:
050627 141912 fetching http://xxx/yyy/zzz/errors_ids100.html
started HttpResponse
6
7, 200
8
9
10
11
12, 8192
12.2
12, 7880
12.2
050627 141912 Thread[fetcher0,5,fetcher]
050627 141912 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
12, 8192
12.2
12, 8191
12.2
12, 8192
12.2
12, 8191
12.2
12, 8192
12.2
12, 8191
12.2
12, 8192
12.2
13
050627 141913 Thread[fetcher0,5,fetcher]
050627 141913 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
050627 141914 Thread[fetcher0,5,fetcher]
050627 141914 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
050627 141915 Thread[fetcher0,5,fetcher]
050627 141915 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
050627 141916 Thread[fetcher0,5,fetcher]
050627 141916 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
050627 141917 Thread[fetcher0,5,fetcher]
** and looping **
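
Note that the last marker printed is "13" and "14" never shows up, so the
thread appears to stall between those two prints, that is in
out.toByteArray() or in.close(). The read loop also always asks for
buffer.length bytes and only checks tryAndRead after the read has already
happened. Below is a minimal standalone sketch of a read loop that never asks
the stream for more than it still wants. It is not the Nutch code: the class
BoundedRead, the method readBounded and the 65536 limit are made up for
illustration, and I am assuming calculateTryToRead (not shown above) returns
the number of bytes still allowed under a content limit. It only illustrates
the bounded loop; it does not tell us which of the two calls hangs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Nutch or commons-httpclient.
final class BoundedRead {

    /**
     * Reads at most maxBytes from the stream. Each read asks only for
     * what is still wanted, so nothing is read and then thrown away.
     */
    static byte[] readBounded(InputStream in, int bufferSize, int maxBytes)
            throws IOException {
        byte[] buffer = new byte[bufferSize];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int totalRead = 0;
        while (totalRead < maxBytes) {
            // Never request more than the remaining allowance.
            int want = Math.min(buffer.length, maxBytes - totalRead);
            int n = in.read(buffer, 0, want);
            if (n == -1) {
                break; // end of stream before the limit was reached
            }
            out.write(buffer, 0, n);
            totalRead += n;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // 73413 is the sum of the chunk sizes in the log above;
        // 65536 is an assumed limit, chosen only for this example.
        byte[] body = new byte[73413];
        byte[] kept = readBounded(new ByteArrayInputStream(body), 8192, 65536);
        System.err.println("kept " + kept.length + " of " + body.length + " bytes");
    }
}
```

Run on its own, main reports that 65536 of the 73413 bytes were kept.
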
On 6/27/05, Juho Mäkinen <[EMAIL PROTECTED]> wrote:
> I ran bin/nutch fetch with -logLevel finest and got these few debug
> lines looping forever when the fetcher freezes. Hope this helps:
>
> 050627 133307 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
> 050627 133308 Thread[fetcher0,5,fetcher]
> 050627 133308 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
> 050627 133309 Thread[fetcher0,5,fetcher]
> 050627 133309 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
> 050627 133310 Thread[fetcher0,5,fetcher]
>
>
> I'm using nutch-nightly (nutch-2005-06-19.tar.gz)
>
> - Juho Mäkinen, http://www.juhonkoti.net
>
> On 6/23/05, Andy Liu <[EMAIL PROTECTED]> wrote:
> > If you have an older version of Nutch, you may have the older version
> > of NekoHTML, which was causing fetcher threads to lock up.
> >
> > http://issues.apache.org/jira/browse/NUTCH-17
> >
> > On 6/23/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > > Hi Andrzej
> > >
> > > Looks like using a newer version eliminates this issue -
> > > I'll get back to you after it's completed a few fetches.
> > >
> > >
> > >
> > > On Thu, 23 Jun 2005 11:53:35 +0200
> > > Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> > > > [EMAIL PROTECTED] wrote:
> > > >
> > > > > (LOCKED UP - pressed control-c and got cygwin prompt)
> > > > > [EMAIL PROTECTED] /nutch-0.6
> > > >
> > > > LOCKED UP is a very subjective term ;-) Don't touch
> > > > Ctrl-C, but instead please press Ctrl-Break for a full
> > > > thread dump, copy it and send it here.
> > > >
> > > > Also, the official 0.6 release is quite old, you should
> > > > probably try the newer version (one of the nightly
> > > > builds), and see if the problem persists.
> > > >
> > > > --
> > > > Best regards,
> > > > Andrzej Bialecki <><
> >
>
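
For cases where Ctrl-Break cannot be sent to the JVM (for example when it
runs detached from a console), a rough programmatic equivalent of the thread
dump requested above is sketched here. It relies only on
java.lang.Thread.getAllStackTraces(), which needs Java 5 or later. The class
name ThreadDump is made up for this example and is not part of Nutch, and the
output format is simpler than a real JVM thread dump (no lock information).

```java
import java.util.Map;

// Hypothetical helper, not part of Nutch.
final class ThreadDump {

    /** Returns the name, state and stack of every live thread as text. */
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append(t).append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.err.print(dump());
    }
}
```

Calling ThreadDump.dump() from a watchdog thread once the fetcher stops
making progress would show which frame fetcher0 is sitting in.
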