Hey,

     Are there any easy tricks to force a new map stage to kick off?  I know I
can force a reduce with GBK operations, but one of our jobs is running into
data skew.  From what I can tell, a couple of hot keys join properly, but the
follow-up processing that comes before the next join then runs in the same
reducer, and that reducer hits the GC overhead limit.  Based on the dot file,
the job is doing all of the preprocessing for the next join in the reducer of
the first join.  That work could easily run in the map phase before the next
join instead, and I think that would also get us past the memory problem.  The
only solution I can think of at the moment is to do everything up to the first
join, call pipeline.done(), and then add the remaining operations before a
second pipeline.done() call.

Thanks,
    Dave