I had a similar problem once. I reduced the number of reduce tasks to 1.5 * the number of nodes, and that solved it. With 24 nodes that works out to 36, so I suggest changing your configuration and running a fetch with at most 36 reduce tasks.
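
As a rough sketch of the arithmetic only (the function name and the 1.5 factor as a default are just for illustration, and the factor is a rule of thumb that worked for me, not a guarantee):

```python
def suggested_reduce_tasks(num_nodes: int, factor: float = 1.5) -> int:
    """Return a reduce-task cap as factor * number of nodes, rounded down.

    Hypothetical helper for illustration; the 1.5 factor is a rule of thumb.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    # Always keep at least one reduce task.
    return max(1, int(num_nodes * factor))


if __name__ == "__main__":
    # 24 nodes, as in the quoted message below.
    print(suggested_reduce_tasks(24))
```

I am assuming the value is then set through the `mapred.reduce.tasks` property in your Hadoop configuration; check the property name against the Hadoop version you are running.
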

> I have a very strange, reproducible bug that shows up when running
> fetch across any number of documents >10000. I'm running 47 map tasks
> and 47 reduce tasks on 24 nodes. The map phase finishes fine and so
> does the majority of the reduce phase, however there are always two
> segments that perpetually hang in the reduce > reduce phase. What
> happens is the reducer gets to 85.xx% and then stops responding. Once
> 10 minutes go by, a new worker starts the task, gets to the same
> 85.xx(+/- .1%) and hangs. The other consistent part is that it's
> always segment 2 and segment 5 (out of 47 segments).
>
> I figured I could fix it by simply copying data from a different
> segment in and continuing on the next iteration, but low and behold
> the same exact problem happens in segment 2 and segment 5.
>
> I assume it's not IO problems because all of the nodes involved in
> these segments finish other reduce tasks in the same iteration with no
> problems. Furthermore, I have seen this happen persistently over the
> last many iterations. My last iteration had 400,000 (+/-) documents
> pulled down and I saw the same behavior.
>
> Does anyone have any suggestions?
>
> --
> Ned Rockson
> Discovery Engine
> 795 Folsom Street
> San Francisco, CA 94107
