RE: Issues related to Hung threads when crawling more than 15K articles

2018-04-04 Thread Markus Jelsma
That doesn't appear to be the case. The fetcher's time bomb clearly logs when it
has reached its limit, and it usually runs for longer than the two seconds we
see here.

What do you find in the logs? There must be some error beyond the hung
threads, usually a hanging parser or GC issues.

Markus
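To act on the "what can you find in the logs?" question, a small scan of the fetcher log helps separate threads that were actually holding a URL from threads that were merely idle, and pulls out lines that hint at memory or parsing trouble. This is a minimal sketch: the hung-thread and status patterns are copied from the log quoted below, while the suspect markers (`OutOfMemoryError`, `GC overhead limit exceeded`, `Exception`) are assumptions to adapt to your own logs. `scan_fetcher_log` and `SAMPLE_LOG` are hypothetical names, not part of Nutch.

```python
import re

# Log patterns copied from the fetcher output quoted in this thread.
HUNG_SUMMARY = re.compile(r"Aborting with (\d+) hung threads")
HUNG_THREAD = re.compile(r"Thread #(\d+) hung while processing (\S+)")
STATUS = re.compile(r"activeThreads=(\d+), spinWaiting=(\d+)")
# Assumed markers for the usual culprits (memory/GC trouble, exceptions
# thrown while parsing); extend them to match what your own logs contain.
SUSPECT_MARKERS = ("OutOfMemoryError", "GC overhead limit exceeded", "Exception")

# The relevant lines of the log from this thread, reassembled.
SAMPLE_LOG = "\n".join(
    ["-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=2, "
     "fetchQueues.getQueueCount=1",
     "Aborting with 10 hung threads.",
     "Thread #0 hung while processing https://24.kg/sport/29754_kyirgyizstantsyi_vyiigrali_dva_boya_na_litsenzionnom_turnire_po_boksu_v_kitae/"]
    + [f"Thread #{i} hung while processing null" for i in range(1, 10)]
)


def scan_fetcher_log(text):
    """Summarize hung-thread reports and collect lines worth a closer look."""
    report = {"hung": 0, "busy_urls": [], "idle_threads": 0,
              "active": None, "spin_waiting": None, "suspects": []}
    for line in text.splitlines():
        if m := HUNG_SUMMARY.search(line):
            report["hung"] = int(m.group(1))
        elif m := HUNG_THREAD.search(line):
            if m.group(2) == "null":
                # The thread was not holding a URL (most likely spin-waiting).
                report["idle_threads"] += 1
            else:
                report["busy_urls"].append(m.group(2))
        elif m := STATUS.search(line):
            report["active"] = int(m.group(1))
            report["spin_waiting"] = int(m.group(2))
        # Independently of the checks above, keep any line with a suspect marker.
        if any(marker in line for marker in SUSPECT_MARKERS):
            report["suspects"].append(line.strip())
    return report


if __name__ == "__main__":
    for key, value in scan_fetcher_log(SAMPLE_LOG).items():
        print(f"{key}: {value}")
```

On the log from this thread it reports one thread holding a URL and nine threads with nothing to process, which lines up with spinWaiting=9 in the status line; no suspect lines appear in that excerpt, so the full log is where the underlying error would have to be found.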

 
 
-----Original message-----
> From:Yossi Tamari <yossi.tam...@pipl.com>
> Sent: Wednesday 4th April 2018 11:37
> To: user@nutch.apache.org; shivakarthik...@gmail.com
> Cc: 'Sebastian Nagel' <wastl.na...@googlemail.com>
> Subject: RE: Issues related to Hung threads when crawling more than 15K 
> articles
> 
> I believe this is normal behaviour. The fetch timeout which you have defined 
> (fetcher.timelimit.mins) has passed, and the fetcher is exiting. In this case 
> one of the fetcher threads is still waiting for a response from a specific 
> URL. This is not a problem, and any URLs which were not fetched because of 
> the timeout will be "generated" again in a future segment.
> You do want to try to match the fetcher timeout and the generated segment 
> size, but you can never be 100% successful, and that's not a problem.
> 
>   Yossi.
> 
> > -----Original Message-----
> > From: ShivaKarthik S <shivakarthik...@gmail.com>
> > Sent: 04 April 2018 12:32
> > To: user@nutch.apache.org
> > Cc: Sebastian Nagel <wastl.na...@googlemail.com>
> > Subject: Reg: Issues related to Hung threads when crawling more than 15K
> > articles
> > 
> > Hi,
> > 
> > I am crawling 25K+ articles at a time (at a single depth) using Nutch 1.11,
> > but after a certain number of articles I get an error about hung threads
> > and the process gets killed. Can someone suggest a solution?
> > 
> > *The error I am getting is as follows:*
> > 
> > Fetcher: throughput threshold: -1
> > Fetcher: throughput threshold retries: 5
> > -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=2, fetchQueues.getQueueCount=1
> > Aborting with 10 hung threads.
> > Thread #0 hung while processing https://24.kg/sport/29754_kyirgyizstantsyi_vyiigrali_dva_boya_na_litsenzionnom_turnire_po_boksu_v_kitae/
> > Thread #1 hung while processing null
> > Thread #2 hung while processing null
> > Thread #3 hung while processing null
> > Thread #4 hung while processing null
> > Thread #5 hung while processing null
> > Thread #6 hung while processing null
> > Thread #7 hung while processing null
> > Thread #8 hung while processing null
> > Thread #9 hung while processing null
> > Fetcher: finished at 2018-04-04 14:23:45, elapsed: 00:00:02
> > 
> > --
> > Thanks and Regards
> > Shiva
> 
> 
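The distinction Markus draws is between two separate stop conditions: the fetcher.timelimit.mins time bomb, and a watchdog that gives up when no fetcher thread has started a new request for too long. The sketch below is a simplified model of that distinction, not Nutch's code. It assumes the watchdog threshold is the Hadoop task timeout (assumed default 10 minutes) divided by fetcher.threads.timeout.divisor (assumed default 2); confirm the property names and defaults in nutch-default.xml and your Hadoop configuration for your version. `hung_thread_timeout_ms` and `round_status` are hypothetical helpers.

```python
# Assumed defaults, modeled on the Nutch 1.x fetcher; confirm the property
# names and values in nutch-default.xml and your Hadoop configuration.
DEFAULT_TASK_TIMEOUT_MS = 10 * 60 * 1000  # Hadoop task timeout (assumed)
DEFAULT_TIMEOUT_DIVISOR = 2               # fetcher.threads.timeout.divisor (assumed)


def hung_thread_timeout_ms(task_timeout_ms=DEFAULT_TASK_TIMEOUT_MS,
                           divisor=DEFAULT_TIMEOUT_DIVISOR):
    """How long no thread may start a new request before the watchdog fires."""
    return task_timeout_ms // divisor


def round_status(now_ms, last_request_start_ms, timelimit_end_ms=None,
                 task_timeout_ms=DEFAULT_TASK_TIMEOUT_MS,
                 divisor=DEFAULT_TIMEOUT_DIVISOR):
    """Simplified model of the two separate reasons a fetch round stops early."""
    # fetcher.timelimit.mins: the "time bomb". Remaining queued URLs are
    # skipped and get generated again into a later segment.
    if timelimit_end_ms is not None and now_ms >= timelimit_end_ms:
        return "time limit reached"
    # Hung-thread watchdog: no request has been started for too long.
    idle_ms = now_ms - last_request_start_ms
    if idle_ms > hung_thread_timeout_ms(task_timeout_ms, divisor):
        return "aborting with hung threads"
    return "still fetching"


if __name__ == "__main__":
    print(hung_thread_timeout_ms())
    print(round_status(now_ms=400_000, last_request_start_ms=0))
```

Under these assumed defaults the watchdog threshold works out to five minutes (600,000 ms / 2), and the model only reports hung threads when that much time has passed without a new request, independently of the time limit.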


RE: Issues related to Hung threads when crawling more than 15K articles

2018-04-04 Thread Yossi Tamari
I believe this is normal behaviour. The fetch timeout which you have defined 
(fetcher.timelimit.mins) has passed, and the fetcher is exiting. In this case 
one of the fetcher threads is still waiting for a response from a specific URL. 
This is not a problem, and any URLs which were not fetched because of the 
timeout will be "generated" again in a future segment.
You do want to try to match the fetcher timeout and the generated segment size, 
but you can never be 100% successful, and that's not a problem.

Yossi.
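Matching the fetcher time limit to the generated segment size is mostly arithmetic on the busiest host. A minimal sketch, assuming one fetcher thread per host queue (fetcher.threads.per.queue = 1), so requests to the same host are serialized and separated by fetcher.server.delay (5.0 seconds assumed here as the default). `avg_fetch_s` is a hypothetical average response time you would measure yourself, and `max_urls_per_queue` is an illustrative helper, not a Nutch API.

```python
import math


def max_urls_per_queue(timelimit_mins, server_delay_s=5.0, avg_fetch_s=1.0):
    """Rough ceiling on URLs a single host queue can fetch within the time limit.

    Assumes one thread per queue, so requests to the same host are serialized:
    each URL costs its own fetch time plus the politeness delay.
    """
    if timelimit_mins <= 0:
        raise ValueError("timelimit_mins must be positive")
    seconds_per_url = server_delay_s + avg_fetch_s
    return math.floor(timelimit_mins * 60 / seconds_per_url)


if __name__ == "__main__":
    # Hypothetical numbers: a 60-minute limit, 5 s delay, ~1 s per response.
    print(max_urls_per_queue(60))
```

Under those assumptions a 60-minute limit fits about 600 URLs per host, so capping URLs per host in each generated segment (for example with generate.max.count) near that figure keeps most of the segment inside the limit. As noted above, it will never match exactly, and whatever is left over is generated again later.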
