On Mon, Jul 27, 2015, 6:45 PM David Ortiz <[email protected]> wrote:
> I'll give that a try in the morning. Thanks.
>
> On Mon, Jul 27, 2015, 6:02 PM Josh Wills <[email protected]> wrote:
>
>> Hey David,
>>
>> The easiest way is to insert a PCollection.cache() call at the stage
>> between the two joins where you think the reduce phase should end and the
>> next map phase should begin. When the Crunch planner decides where to
>> split the work between a reducer and a mapper, it tries to respect any
>> explicit cache() calls that it encounters.
>>
>> Josh
>>
>> On Mon, Jul 27, 2015 at 2:58 PM, David Ortiz <[email protected]> wrote:
>>
>>> Hey,
>>>
>>> Are there any easy tricks to force a new map stage to kick off? I know
>>> I can force a reduce with GBK operations, but one of our jobs is running
>>> into trouble with data skew. From what I can tell, a couple of hot keys
>>> join properly, but the follow-up processing that comes before the next
>>> join then pushes the reducer into the GC Overhead Limit. Based on the dot
>>> file, Crunch is trying to do all the preprocessing for the next join in
>>> the reducer from the first join, but it could easily do it in the map
>>> phase before the next join in the pipeline without any issues, and I
>>> think that would also get us past the memory problem. The only solution
>>> I could think of so far is to do everything up to the first join, call
>>> pipeline.done(), then add some more operations before another
>>> pipeline.done() call.
>>>
>>> Thanks,
>>>
>>> Dave
>>>
>>> *This email is intended only for the use of the individual(s) to whom
>>> it is addressed. If you have received this communication in error, please
>>> immediately notify the sender and delete the original email.*
>>
>> --
>> Director of Data Science
>> Cloudera <http://www.cloudera.com>
>> Twitter: @josh_wills <http://twitter.com/josh_wills>
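
For reference, here is a minimal sketch of the cache() hint Josh describes, assuming a Crunch pipeline that joins two PTables, does some per-record preprocessing, and then joins the result against a third table. The input tables, the preprocessing MapFn, and the output path are hypothetical placeholders, not David's actual pipeline.

    import org.apache.crunch.MapFn;
    import org.apache.crunch.PTable;
    import org.apache.crunch.Pair;
    import org.apache.crunch.Pipeline;
    import org.apache.crunch.lib.Join;
    import org.apache.crunch.types.writable.Writables;

    public class CacheHintSketch {
      public static void run(Pipeline pipeline,
                             PTable<String, Long> left,
                             PTable<String, Long> right,
                             PTable<String, String> other) {
        // First join: runs in the reduce phase of the first MapReduce job.
        PTable<String, Pair<Long, Long>> firstJoin = Join.join(left, right);

        // Mark the first join's output as a cache point. Per Josh's note, the
        // Crunch planner tries to respect explicit cache() calls when deciding
        // where to split work between one job's reducer and the next job's
        // mapper, so the preprocessing below should move into the map phase of
        // the second job instead of piling onto the first join's reducer.
        firstJoin.cache();

        // Hypothetical preprocessing that prepares values for the second join.
        PTable<String, Long> prepped = firstJoin.parallelDo(
            new MapFn<Pair<String, Pair<Long, Long>>, Pair<String, Long>>() {
              @Override
              public Pair<String, Long> map(Pair<String, Pair<Long, Long>> in) {
                return Pair.of(in.first(), in.second().first() + in.second().second());
              }
            },
            Writables.tableOf(Writables.strings(), Writables.longs()));

        // Second join picks up the preprocessed table.
        PTable<String, Pair<Long, String>> secondJoin = Join.join(prepped, other);
        pipeline.writeTextFile(secondJoin, "/tmp/second_join_output"); // hypothetical sink
        pipeline.done();
      }
    }

The important detail is that the cache() call sits on the output of the first join, i.e. exactly where the first job's reduce phase should end, which avoids resorting to the pipeline.done() workaround David mentions.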
