I did some research and traced the problem to somewhere inside
HttpResponse of protocol-httpclient.

I added some System.err.println calls for debugging to the
HttpResponse constructor:
  public HttpResponse(String orig, URL url) throws IOException {
      System.err.println("started HttpResponse");

      origURL = url;
      origUrl = url.toString();
      url = new URL(url.getProtocol(), "127.0.0.1", url.getFile());
      orig = url.toString();

    this.orig = origUrl;
    this.base = origURL.toString();

    GetMethod get = new GetMethod(url.toString());

    get.setFollowRedirects(false);
    get.setStrictMode(false);
    get.setRequestHeader("User-Agent", Http.AGENT_STRING);
    get.setHttp11(false);
    get.setMethodRetryHandler(null);
    try {
      code = Http.getClient().executeMethod(get);

      System.err.println("6");
      Header[] heads = get.getResponseHeaders();

      for (int i = 0; i < heads.length; i++) {
        headers.put(heads[i].getName(), heads[i].getValue());
      }
      System.err.println("7, " + code);
      if (code == 200) {

      System.err.println("8");
        InputStream in = get.getResponseBodyAsStream();
        byte[] buffer = new byte[Http.BUFFER_SIZE];
      System.err.println("9");
        int bufferFilled = 0;
        int totalRead = 0;
      System.err.println("10");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int tryAndRead = calculateTryToRead(totalRead);
      System.err.println("11");
        while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1
               && tryAndRead > 0) {
      System.err.println("12, " + bufferFilled);
          totalRead += bufferFilled;
          out.write(buffer, 0, bufferFilled);
          tryAndRead = calculateTryToRead(totalRead);
          System.err.println("12.2");
        }
      System.err.println("13");
        content = out.toByteArray();
        in.close();
      System.err.println("14");
      }
    } catch (org.apache.commons.httpclient.ProtocolException pe) {
      pe.printStackTrace();
      throw new IOException(pe.toString());
    } finally {
      get.releaseConnection();
    }
  }


And here is a snapshot of the output:
050627 141912 fetching http://xxx/yyy/zzz/errors_ids100.html
started HttpResponse
6
7, 200
8
9
10
11
12, 8192
12.2
12, 7880
12.2
050627 141912 Thread[fetcher0,5,fetcher]
050627 141912 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
12, 8192
12.2
12, 8191
12.2
12, 8192
12.2
12, 8191
12.2
12, 8192
12.2
12, 8191
12.2
12, 8192
12.2
13
050627 141913 Thread[fetcher0,5,fetcher]
050627 141913 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
050627 141914 Thread[fetcher0,5,fetcher]
050627 141914 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
050627 141915 Thread[fetcher0,5,fetcher]
050627 141915 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
050627 141916 Thread[fetcher0,5,fetcher]
050627 141916 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
050627 141917 Thread[fetcher0,5,fetcher]
** and looping **
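
Reading the trace: "13" is printed but "14" never is, so the thread
seems to get no further than out.toByteArray() / in.close(). The nine
reads add up to 73413 bytes. A minimal sketch of the loop condition
follows, assuming a content limit of 65536 bytes and assuming that
calculateTryToRead() returns the bytes still allowed under that limit.
Both are assumptions, since neither the configured limit nor that
method's body appears in this message. Under those assumptions the loop
would stop because of the limit rather than at end of stream, leaving
unread data on the connection when in.close() is called. The trace alone
does not prove that this is where it blocks.

```java
public class ReadLimitTrace {
    // Read sizes copied from the "12, N" lines of the trace above.
    static final int[] TRACED_READS =
        {8192, 7880, 8192, 8191, 8192, 8191, 8192, 8191, 8192};

    // ASSUMPTION: the real limit comes from configuration not shown here.
    static final int ASSUMED_LIMIT = 65536;

    // Hypothetical stand-in for calculateTryToRead(); the real body is
    // not shown in the message.
    static int calculateTryToRead(int totalRead, int limit) {
        return limit - totalRead;
    }

    // Sum of all bytes reported by the trace.
    static int totalBytes(int[] reads) {
        int total = 0;
        for (int n : reads) {
            total += n;
        }
        return total;
    }

    // Number of loop bodies executed before "tryAndRead > 0" fails,
    // or -1 if the limit is never reached with the given reads.
    static int loopBodiesBeforeLimitStop(int[] reads, int limit) {
        int totalRead = 0;
        int tryAndRead = calculateTryToRead(totalRead, limit);
        int bodies = 0;
        for (int n : reads) {
            if (tryAndRead <= 0) {
                return bodies;
            }
            totalRead += n;
            bodies++;
            tryAndRead = calculateTryToRead(totalRead, limit);
        }
        return tryAndRead <= 0 ? bodies : -1;
    }

    public static void main(String[] args) {
        System.out.println("total bytes read: " + totalBytes(TRACED_READS));
        System.out.println("loop bodies before stop: "
            + loopBodiesBeforeLimitStop(TRACED_READS, ASSUMED_LIMIT));
    }
}
```

With the traced sizes, the running total is 65221 bytes after the eighth
read (still under the assumed limit) and 73413 after the ninth, which
matches the nine "12" lines seen before "13".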


On 6/27/05, Juho Mäkinen <[EMAIL PROTECTED]> wrote:
> I turned -logLevel finest on with bin/nutch fetch and I got these few debug
> lines looping for ever when the fetcher freezes, hope this helps:
> 
> 050627 133307 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
> 050627 133308 Thread[fetcher0,5,fetcher]
> 050627 133308 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
> 050627 133309 Thread[fetcher0,5,fetcher]
> 050627 133309 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
> 050627 133310 Thread[fetcher0,5,fetcher]
> 
> 
> I'm using nutch-nightly (nutch-2005-06-19.tar.gz)
> 
>  - Juho Mäkinen, http://www.juhonkoti.net
> 
> On 6/23/05, Andy Liu <[EMAIL PROTECTED]> wrote:
> > If you have an older version of Nutch you may have the older version
> > of NekoHTML which was causing fetcher threads to lockup.
> >
> > http://issues.apache.org/jira/browse/NUTCH-17
> >
> > On 6/23/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > > Hi Andrzej
> > >
> > > Looks like using a newer version eliminates this issue -
> > > I'll get back to you after it's completed a few fetches.
> > >
> > >
> > >
> > > On Thu, 23 Jun 2005 11:53:35 +0200
> > >  Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> > > > [EMAIL PROTECTED] wrote:
> > > >
> > > > > (LOCKED UP - pressed control-c and got cygwin prompt)
> > > > > [EMAIL PROTECTED] /nutch-0.6
> > > >
> > > > > LOCKED UP is a very subjective term ;-) Don't touch
> > > > Ctrl-C, but instead please press Ctrl-Break for a full
> > > > thread dump, copy it and send it here.
> > > >
> > > > Also, the official 0.6 release is quite old, you should
> > > > probably try the newer version (one of the nightly
> > > > builds), and see if the problem persists.
> > > >
> > > > --
> > > > Best regards,
> > > > Andrzej Bialecki     <><
> >
>
