I'm following your advice and refetching the segments without parsing. 

I'm fetching these segments with one concurrent thread (three segments: 13,
14, and 27 thousand pages), so it will be a while before they finish. I'll
email you if anything goes wrong.

Regards,
EM

-----Original Message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] 
Sent: Friday, August 19, 2005 9:51 AM
To: [email protected]
Subject: Re: Nutch 0.7 released

EM wrote:
> The only difference from the previous configuration was that I enabled the
> "js" parser. However, one crash happened on a PDF file, and I don't know
> about the other one. Unfortunately, the URLs were not saved at the time.

The thread dumps that you sent don't immediately point to anything 
obvious. There are always two threads blocked somewhere in httpclient 
code waiting for a connection release, but that's normal; it's just the 
way it works.

Please try to rerun your fetches on the same segments that hung before 
(just delete everything in each segment except fetchlist/), but this time 
with the -noParsing option. That way we should be able to rule out 
problems related to parsing.
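For what it's worth, the cleanup step above can be sketched roughly like this. The segment path and the extra subdirectories (content/, parse_data/, parse_text/) are illustrative assumptions; adjust them to your actual crawl layout, and run the fetch command from your Nutch install directory:

```shell
# Hypothetical segment path, for illustration only.
SEGMENT=crawl/segments/20050819

# Mock up a segment layout so the snippet is self-contained (skip this
# step on a real segment -- the directories already exist there).
mkdir -p "$SEGMENT/fetchlist" "$SEGMENT/content" \
         "$SEGMENT/parse_data" "$SEGMENT/parse_text"

# Delete everything in the segment except fetchlist/ so it can be refetched.
find "$SEGMENT" -mindepth 1 -maxdepth 1 ! -name fetchlist -exec rm -rf {} +

# Then refetch without invoking the content parsers:
#   bin/nutch fetch -noParsing "$SEGMENT"
```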

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


