I believe this is normal behaviour. The fetch time limit you have defined 
(fetcher.timelimit.mins) has passed, so the fetcher is exiting. In this case 
one of the fetcher threads is still waiting for a response from a specific URL. 
This is not a problem: any URLs that were not fetched because of the 
timeout will be "generated" again in a future segment.
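For reference, that limit is set in conf/nutch-site.xml; a minimal sketch (the 
value 180 below is only illustrative, not a recommendation):

    <property>
      <name>fetcher.timelimit.mins</name>
      <value>180</value>
      <description>Number of minutes allocated to fetching a segment; once
      reached, the remaining queue entries are skipped. -1 (the default)
      disables the limit.</description>
    </property>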
You do want to match the fetcher time limit to the size of the generated 
segments, but you will never get it exactly right, and that's not a problem.
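One way to keep each segment small enough to finish within the time limit is to 
cap its size at generate time; a minimal sketch (the topN value is just a 
placeholder to tune against your actual fetch rate):

    bin/nutch generate crawl/crawldb crawl/segments -topN 50000

If a few slow hosts dominate a segment, setting generate.max.count together 
with generate.count.mode (byHost or byDomain) in nutch-site.xml can also help 
spread the load more evenly.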

        Yossi.

> -----Original Message-----
> From: ShivaKarthik S <shivakarthik...@gmail.com>
> Sent: 04 April 2018 12:32
> To: user@nutch.apache.org
> Cc: Sebastian Nagel <wastl.na...@googlemail.com>
> Subject: Reg: Issues related to Hung threads when crawling more than 15K
> articles
> 
> Hi,
> 
>    I am crawling 25K+ articles at a time (at a single depth), but after 
> crawling a certain number of articles (using nutch-1.11) I am getting an 
> error related to hung threads, and the process gets killed. Can someone 
> suggest a solution to resolve this?
> 
> *The error I am getting is as follows*
> 
> Fetcher: throughput threshold: -1
> Fetcher: throughput threshold retries: 5
> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=20000, fetchQueues.getQueueCount=1
> Aborting with 10 hung threads.
> Thread #0 hung while processing https://24.kg/sport/29754_kyirgyizstantsyi_vyiigrali_dva_boya_na_litsenzionnom_turnire_po_boksu_v_kitae/
> Thread #1 hung while processing null
> Thread #2 hung while processing null
> Thread #3 hung while processing null
> Thread #4 hung while processing null
> Thread #5 hung while processing null
> Thread #6 hung while processing null
> Thread #7 hung while processing null
> Thread #8 hung while processing null
> Thread #9 hung while processing null
> Fetcher: finished at 2018-04-04 14:23:45, elapsed: 00:00:02
> 
> --
> Thanks and Regards
> Shiva
