Hi,

On 9/6/07, Ned Rockson <[EMAIL PROTECTED]> wrote:
> (sorry if this is a repost, I'm not sure if it sent last time).
>
> I have a very strange, reproducible bug that shows up when running
> fetch across any number of documents >10000.  I'm running 47 map tasks
> and 47 reduce tasks on 24 nodes.  The map phase finishes fine and so
> does the majority of the reduce phase, however there are always two
> segments that perpetually hang in the reduce > reduce phase.  What
> happens is the reducer gets to 85.xx% and then stops responding.  Once
> 10 minutes go by, a new worker starts the task, gets to the same
> 85.xx(+/- .1%) and hangs.  The other consistent part is that it's
> always segment 2 and segment 5 (out of 47 segments).
>
> I figured I could fix it by simply copying data from a different
> segment in and continuing on the next iteration, but lo and behold
> the same exact problem happens in segment 2 and segment 5.
>
> I assume it's not IO problems because all of the nodes involved in
> these segments finish other reduce tasks in the same iteration with no
> problems.  Furthermore, I have seen this happen persistently over the
> last many iterations.  My last iteration had 400,000 (+/-) documents
> pulled down and I saw the same behavior.
>
> Does anyone have any suggestions?

Fetcher doesn't do anything interesting in reduce (after all, it is
just IdentityReducer) so this is very strange.

You could try adding some debug statements to the write method in
FetcherOutputFormat (if you are using trunk, the write method is at line
~84) to figure out whether it consistently gets stuck at a particular
URL or group of URLs. If it always hangs on the same URL, try
fetching that URL alone and see whether it still fails.
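
To illustrate the kind of debug statements I mean, here is a rough,
standalone sketch. It is not the actual FetcherOutputFormat code:
WriteTracer and its methods are made-up names, and the real write
method works on Hadoop key/value types rather than plain strings. The
idea is only to log the key before and after each write, so that the
last "about to write" line without a matching "done" line shows where
the task stopped.

```java
import java.util.function.Consumer;

/**
 * Hypothetical helper, not part of Nutch or Hadoop. It wraps a
 * per-record write and logs the key before and after the write.
 */
public class WriteTracer {
    private final Consumer<String> log; // where log lines go (assumption: any sink)
    private long count = 0;             // number of records traced so far

    public WriteTracer(Consumer<String> log) {
        this.log = log;
    }

    /** Logs the key, runs the write, then logs the elapsed time. */
    public void trace(String key, Runnable write) {
        count++;
        log.accept("about to write #" + count + " key=" + key);
        long start = System.nanoTime();
        write.run();
        long ms = (System.nanoTime() - start) / 1_000_000;
        log.accept("done #" + count + " key=" + key + " in " + ms + " ms");
    }

    public long count() {
        return count;
    }
}
```

In the real write method it is probably simpler to put the two log
lines directly around the existing body and send them to whatever
logger ends up in the task logs. If segments 2 and 5 always stop after
the same key, that key is the URL to fetch on its own.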


>
> --
> Ned Rockson
> Discovery Engine
> 795 Folsom Street
> San Francisco, CA 94107
>


-- 
Doğacan Güney
