Hi Michael,

> What other post-fetch actions are there?

Well, the fetched content is spilled to disk, which may also become slow in
pathological cases.

But I think it's more important to analyze what happened with these URLs
before they hung. The logs should contain a message "fetching ..." for every
hanging URL. When does that message appear?
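
To line up the "fetching ..." messages with the hung URLs, something like the
following could help. It is only a rough sketch: it assumes the raw log file
(not the wrapped copy in this mail), that every log line starts with a
timestamp like "2016-12-09 00:47:29,559", and that the URL is the first token
after "fetching ". The function names are made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the warning seen in the log: "... hung while processing <url>"
HUNG = re.compile(r"hung while processing\s+(\S+)")
# Assumed timestamp prefix, e.g. "2016-12-09 00:47:29,559"
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def hung_urls(log_text):
    """Return all URLs reported as hung, in log order."""
    return HUNG.findall(log_text)


def fetch_times(log_text, urls):
    """Map each URL to the timestamps of lines containing 'fetching <url>'."""
    wanted = set(urls)
    times = {u: [] for u in wanted}
    for line in log_text.splitlines():
        m = TIMESTAMP.match(line)
        if not m or "fetching " not in line:
            continue
        # Assumption: the URL is the first token after "fetching ".
        tokens = line.split("fetching ", 1)[1].split()
        if tokens and tokens[0] in wanted:
            times[tokens[0]].append(m.group(1))
    return times


def hosts_by_count(urls):
    """Count hung URLs per host to spot hosts that appear more than once."""
    return Counter(urlsplit(u).hostname for u in urls)
```

Comparing the timestamps from fetch_times() with the time of the "Aborting
with ... hung threads" message shows how long each URL had been in process,
and hosts_by_count() shows which hosts have more than one hung URL.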

If possible, let us know about
- Nutch version
- environment (local, distributed)
- configuration, esp. if not the default:
    mapreduce.task.timeout
    fetcher.threads.timeout.divisor
    http.timeout
  and, if in doubt, all other modified
    fetcher.*
  properties
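
The first two properties matter because, as far as I remember the fetcher
code, threads are reported as hung once no request has been started for
longer than mapreduce.task.timeout divided by fetcher.threads.timeout.divisor.
A small sketch of that arithmetic follows. The defaults used here (600000 ms
and 2) are assumptions from memory, so please check them against your Hadoop
configuration and nutch-default.xml. The function name is made up.

```python
def hung_threshold_ms(task_timeout_ms=600_000, divisor=2):
    """Time without a new request after which threads count as hung.

    Both default values are assumptions; use the values from your own
    configuration.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return task_timeout_ms // divisor


# With the assumed defaults the threshold is 5 minutes.
print(hung_threshold_ms() / 60_000, "minutes")
```

If http.timeout is small compared to this threshold, a single slow HTTP
request alone should not be enough to trigger the warning.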

Is the problem reproducible, or does it happen only sometimes?

Thanks,
Sebastian

On 12/09/2016 04:58 PM, Michael Coffey wrote:
> The property fetcher.parse is false and I pass -noParsing to the fetch 
> command. What other post-fetch actions are there?
> 
> 
>       From: Sebastian Nagel <[email protected]>
>  To: [email protected] 
>  Sent: Friday, December 9, 2016 12:58 AM
>  Subject: Re: Fetcher "hung while processing"
>    
> Hi Michael,
> 
> what about the property fetcher.parse ?
> 
> The queue is unblocked after a page has been fetched but before parsing.
> If the parser is hanging or one of the post-fetch actions takes too long
> it may happen that there are multiple URLs from the same host still in
> process.
> 
> Sebastian
> 
> On 12/09/2016 02:15 AM, Michael Coffey wrote:
>> I sometimes get a bunch of warning messages that say Thread #x hung while 
>> processing <url>
>> Is this just a normal thing to see occasionally, or should I look to find 
>> some resolution? I do have an example where the same host shows up on a 
>> multitude of these messages, which puzzles me. I think there should be only 
>> one thread per host, due to me specifying fetcher.threads.per.queue=1
>> Here is example log showing the first 20 of 50 hung threads. Note that 
>> http://shinystat.com and http://fabulous.com show up more than once.
>>
>> 2016-12-09 00:47:29,559 WARN [main] org.apache.nutch.fetcher.Fetcher: 
>> Aborting with 50 hung threads.
>> 2016-12-09 00:47:29,560 WARN [main] org.apache.nutch.fetcher.Fetcher: Thread 
>> #0 hung while processing 
>> https://www.hugedomains.com/domain_search.cfm?catSearch=434
>> 2016-12-09 00:47:29,561 WARN [main] org.apache.nutch.fetcher.Fetcher: Thread 
>> #1 hung while processing 
>> http://fabulous.com/informationcenter/index.htm?formcode%5Bobjective%5D=&formcode%5Bevent%5D=&formcode%5Bregistrytime%5D=1481233769&formcode%5Bcertificate%5D=dfd737bc4490a09d4786cb0e87a15ba6&formdata%5Bqid%5D=820
>> 2016-12-09 00:47:29,561 WARN [main] org.apache.nutch.fetcher.Fetcher: Thread 
>> #2 hung while processing http://shinystat.com/it/pro/info_pro.html
>> 2016-12-09 00:47:29,561 WARN [main] org.apache.nutch.fetcher.Fetcher: Thread 
>> #3 hung while processing http://events.stanford.edu/byCategory/13/
>> 2016-12-09 00:47:29,562 WARN [main] org.apache.nutch.fetcher.Fetcher: Thread 
>> #4 hung while processing 
>> https://www.ladesk.com/pricing/hosted/terms-and-conditions/
>> 2016-12-09 00:47:29,562 WARN [main] org.apache.nutch.fetcher.Fetcher: Thread 
>> #5 hung while processing http://shinystat.com/en/opt-out_free.html
>> 2016-12-09 00:47:29,562 WARN [main] org.apache.nutch.fetcher.Fetcher: Thread 
>> #6 hung while processing http://shinystat.com/fr/biz/info_biz.html
>> 2016-12-09 00:47:29,562 WARN [main] org.apache.nutch.fetcher.Fetcher: Thread 
>> #7 hung while processing 
>> http://fabulous.com/informationcenter/index.htm?formcode%5Bobjective%5D=&formcode%5Bevent%5D=&formcode%5Bregistrytime%5D=1481233769&formcode%5Bcertificate%5D=dfd737bc4490a09d4786cb0e87a15ba6&formdata%5Bqid%5D=88
>> 2016-12-09 00:47:29,562 WARN [main] org.apache.nutch.fetcher.Fetcher: Thread 
>> #8 hung while processing http://www.youronlinechoices.com/sk/slovnik-pojmov
>> 2016-12-09 00:47:29,562 WARN [main] org.apache.nutch.fetcher.Fetcher: Thread 
>> #9 hung while processing https://twitter.com/sakura_ope
>> 2016-12-09 00:47:29,562 WARN [main] org.apache.nutch.fetcher.Fetcher: Thread 
>> #10 hung while processing http://europa.eu/european-union/topics/culture_en
>> 2016-12-09 00:47:29,562 WARN [main] org.apache.nutch.fetcher.Fetcher: Thread 
>> #11 hung while processing http://www.youronlinechoices.com/ee/opt-out-help
>> 2016-12-09 00:47:29,562 WARN [main] org.apache.nutch.fetcher.Fetcher: Thread 
>> #12 hung while processing 
>> https://www.hugedomains.com/domain_search.cfm?catSearch=437
>> 2016-12-09 00:47:29,562 WARN [main] org.apache.nutch.fetcher.Fetcher: Thread 
>> #13 hung while processing 
>> http://hosted.ap.org/dynamic/stories/U/US_OBIT_JOHN_GLENN?SITE=AP&SECTION=HOME&TEMPLATE=DEFAULT
>> 2016-12-09 00:47:29,563 WARN [main] org.apache.nutch.fetcher.Fetcher: Thread 
>> #14 hung while processing 
>> http://static.fc2.com/sh_css/common/base.css?1200605
>> 2016-12-09 00:47:29,563 WARN [main] org.apache.nutch.fetcher.Fetcher: Thread 
>> #15 hung while processing https://www.hugedomains.com/terms.cfm
>> 2016-12-09 00:47:29,563 WARN [main] org.apache.nutch.fetcher.Fetcher: Thread 
>> #16 hung while processing https://www.ladesk.com/comparisons/
>> 2016-12-09 00:47:29,563 WARN [main] org.apache.nutch.fetcher.Fetcher: Thread 
>> #17 hung while processing http://hu.statcounter.com/features/
>> 2016-12-09 00:47:29,563 WARN [main] org.apache.nutch.fetcher.Fetcher: Thread 
>> #18 hung while processing http://europa.eu/european-union/about-eu/working_el
>> 2016-12-09 00:47:29,563 WARN [main] org.apache.nutch.fetcher.Fetcher: Thread 
>> #19 hung while processing http://www.atinternet.com/es/recursos/
>> 2016-12-09 00:47:29,563 WARN [main] org.apache.nutch.fetcher.Fetcher: Thread 
>> #20 hung while processing http://ietf.org/rfc/rfc2026.txt
>>
>>
> 
> 
> 
>    
> 
