I had a similar problem once. I reduced my number of reduce tasks to
1.5 * the number of nodes, and that solved it.
With 24 nodes, I suggest changing your conf and running a fetch with at
most 36 reduce tasks.
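
For what it's worth, here is the arithmetic as a small sketch. The
function name and the rounding down are my own choices for illustration,
not anything from Nutch or Hadoop. I am also assuming the count is set
through the usual Hadoop reduce-task property (mapred.reduce.tasks on
the versions I have used), so check your own conf before relying on it.

```python
import math


def suggested_reduce_tasks(num_nodes: int, factor: float = 1.5) -> int:
    """Rule of thumb from this reply: roughly 1.5 reduce tasks per node.

    Hypothetical helper for illustration only; rounds down and never
    returns fewer than one task.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return max(1, math.floor(num_nodes * factor))


if __name__ == "__main__":
    # 24 nodes, as in the cluster described below
    print(suggested_reduce_tasks(24))
```

For the 24-node cluster this gives 36, versus the 47 reduce tasks
currently configured.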

> I have a very strange, reproducible bug that shows up when running
> fetch across any number of documents >10000.  I'm running 47 map tasks
> and 47 reduce tasks on 24 nodes.  The map phase finishes fine and so
> does the majority of the reduce phase, however there are always two
> segments that perpetually hang in the reduce > reduce phase.  What
> happens is the reducer gets to 85.xx% and then stops responding.  Once
> 10 minutes go by, a new worker starts the task, gets to the same
> 85.xx(+/- .1%) and hangs.  The other consistent part is that it's
> always segment 2 and segment 5 (out of 47 segments).
>
> I figured I could fix it by simply copying data from a different
> segment in and continuing on the next iteration, but lo and behold
> the same exact problem happens in segment 2 and segment 5.
>
> I assume it's not IO problems because all of the nodes involved in
> these segments finish other reduce tasks in the same iteration with no
> problems.  Furthermore, I have seen this happen persistently over the
> last many iterations.  My last iteration had 400,000 (+/-) documents
> pulled down and I saw the same behavior.
>
> Does anyone have any suggestions?
>
> --
> Ned Rockson
> Discovery Engine
> 795 Folsom Street
> San Francisco, CA 94107
>
