Hi,

I have programmatically called setNumReduceTasks(16) in MeanShiftCanopyDriver.java. At run time the number of reducers is set correctly (16 reduce tasks are visible on the JobTracker page). On closer inspection, however, one node has by far the largest number of bytes to process, while every other node receives only a nominal amount. As a result, the reduce phase becomes very slow after reaching 98% completion.
I am running this on an 18-node cluster. The load is distributed evenly during the map phase but not during the reduce phase. This happens with both Mahout 0.4 and 0.5. Has anyone run into this problem, and is there a way to work around it?

Thanks a lot in advance,
Sohini
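One common cause of this symptom in Hadoop is key skew. The default HashPartitioner assigns a record to reducer `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, so every record sharing a key goes to the same reducer no matter how many reducers setNumReduceTasks requests. Whether the MeanShiftCanopy mappers actually emit only a few distinct keys is an assumption, not something the post confirms. The minimal sketch below reimplements that partitioning formula in plain Java (no Hadoop dependency) to illustrate the effect; the class name `PartitionSkewDemo` and the key strings are hypothetical.

```java
import java.util.Arrays;

public class PartitionSkewDemo {
    // Same formula as Hadoop's default HashPartitioner.getPartition():
    // mask off the sign bit, then take the remainder by the reducer count.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Count how many records each reducer would receive for the given keys.
    static int[] recordsPerReducer(String[] keys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (String key : keys) {
            counts[partition(key, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int reducers = 16;

        // Hypothetical case 1: every map output record carries the same key.
        String[] sameKey = new String[1000];
        Arrays.fill(sameKey, "canopy");

        // Hypothetical case 2: every record has its own distinct key.
        String[] distinctKeys = new String[1000];
        for (int i = 0; i < distinctKeys.length; i++) {
            distinctKeys[i] = "canopy-" + i;
        }

        System.out.println("same key:      "
                + Arrays.toString(recordsPerReducer(sameKey, reducers)));
        System.out.println("distinct keys: "
                + Arrays.toString(recordsPerReducer(distinctKeys, reducers)));
    }
}
```

With a single shared key, all 1000 records fall into one of the 16 buckets, so 15 reducers sit nearly idle; with distinct keys the records spread across the reducers. If the job's output keys turn out to be skewed like this, raising the reducer count alone would not change the imbalance.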
