I will start taking a look at some thread dumps.  It is not the sorting 
phase.  It gets past the sort and gets through part of the reduce phase 
(and always the same percentage; when the job restarts on the same 
machine it gets to the same point before stalling again).  And this 
is happening on multiple machines, so I don't think it is a machine 
problem.  Again I need to spend some time looking through thread dumps.
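
For reference, a minimal sketch of what a thread dump contains, taken from 
inside the JVM.  The class name ThreadDumpSketch is made up for illustration 
and is not part of Nutch or Hadoop.  For a stalled task, sending SIGQUIT to 
the task JVM (kill -QUIT <pid>) is probably the more practical way to get 
the same information from outside.

```java
import java.util.Map;

// Hypothetical helper, not part of Nutch or Hadoop.
public class ThreadDumpSketch {

    // Builds a textual dump of every live thread in this JVM:
    // thread name, thread state, then one line per stack frame.
    public static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append('"')
              .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

Taking two or three dumps a few seconds apart and comparing them should show 
whether the reduce task is sitting in the same frames each time.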

Dennis

Andrzej Bialecki wrote:
> Dennis Kubes wrote:
>> Do you think it is the parsing that is causing it?
>
> Just checking ... probably not. You could figure out from a thread 
> dump where it's spending time.
>
>
>> I was looking at a smaller fetching run and the cpu gets pushed to 
>> 100% as well but the reports keep happening.  This only seems to 
>> happen when I run very large fetches (> 500K pages).  I just ran a 
>> 100K fetch and it worked just fine.  Should I have some special 
>> settings for larger fetches?
>
> You could try tweaking the io.sort values, if it times out during the 
> sorting phase.
>


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
