It is the latter approach, yes. The former would be better.

J

On Mon, Oct 12, 2015 at 3:56 PM, Everett Anderson <[email protected]> wrote:

> Hey Josh,
>
> Somewhat related question -- when computing the number of reducers, is the
> planner doing that at the start of each MR job, estimating the size of the
> map output and then calculating number of reducers based on the input data
> size going into the job?
>
> Or does it make the calculation at the very beginning of the pipeline
> after reading the sources?
>
> The former might be more accurate, with the latter suffering a compounding
> effect from poor estimation at any step.
>
> On Mon, Oct 12, 2015 at 3:46 PM, Josh Wills <[email protected]> wrote:
>
>> No, just the number of tasks involved in each job. The structure should
>> remain the same.
>>
>> J
>>
>> On Mon, Oct 12, 2015 at 3:44 PM, Ravi Kolluri <[email protected]> wrote:
>>
>>> Thanks Josh!
>>>
>>> My question was more about how the planner organizes the map-reduce
>>> computation. Would the crunch job composition change based on input size?
>>>
>>> thanks,
>>> Ravi
>>>
>>> On Mon, Oct 12, 2015 at 3:38 PM, Josh Wills <[email protected]>
>>> wrote:
>>>
>>>> Hey Ravi,
>>>>
>>>> The number of reducers used in the various stages of the MR job can
>>>> change if you don't hard-code them using groupByKey(int numReducers) or
>>>> groupByKey(GroupingOptions) (or the equivalent settings via the
>>>> JoinStrategy classes for joins). The planner will try to estimate the
>>>> number of bytes to be processed and aims to process 1GB of data per
>>>> reducer. If you do hard-code the number of reduce tasks, the planner will
>>>> respect your wishes no matter what the input size is.
>>>>
>>>> Josh
>>>>
>>>> On Mon, Oct 12, 2015 at 2:31 PM, Ravi Kolluri <[email protected]> wrote:
>>>>
>>>>> Hello Crunch users,
>>>>>
>>>>> I have a question about what parameters go into the Crunch planner.
>>>>>
>>>>> Let's say I have a crunch job with a set of input tables, and a fixed
>>>>> set of calls to parallelDo and groupBy operations. Does the crunch
>>>>> execution plan stay fixed independent of the size distribution of the
>>>>> inputs?
>>>>>
>>>>> thanks,
>>>>> Ravi
>>>>>
>>>>> *DISCLAIMER:* The contents of this email, including any attachments,
>>>>> may contain information that is confidential, proprietary in nature,
>>>>> protected health information (PHI), or otherwise protected by law from
>>>>> disclosure, and is solely for the use of the intended recipient(s). If you
>>>>> are not the intended recipient, you are hereby notified that any use,
>>>>> disclosure or copying of this email, including any attachments, is
>>>>> unauthorized and strictly prohibited. If you have received this email in
>>>>> error, please notify the sender of this email. Please delete this and all
>>>>> copies of this email from your system. Any opinions either expressed or
>>>>> implied in this email and all attachments, are those of its author only,
>>>>> and do not necessarily reflect those of Nuna Health, Inc.
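
For anyone reading the thread later, here is a rough sketch of the arithmetic being discussed: roughly 1GB of estimated data per reducer, and the difference between sizing a later job from an estimate made up front versus from the bytes actually flowing into it. This is not Crunch's planner code. The class name, the helper reducersFor, and the example sizes are all made up for illustration, and the assumption that the up-front estimate carries the source size through unchanged is mine.

```java
public class ReducerEstimate {
    // Target from the thread: about 1GB of data per reducer.
    static final long BYTES_PER_REDUCER = 1L << 30;

    // Hypothetical helper (not a Crunch API): round the estimated
    // byte count up to whole reducers, with a floor of one.
    static int reducersFor(long estimatedBytes) {
        if (estimatedBytes <= 0) {
            return 1;
        }
        long n = (estimatedBytes + BYTES_PER_REDUCER - 1) / BYTES_PER_REDUCER;
        return (int) Math.min(n, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        long sourceBytes = 8 * BYTES_PER_REDUCER; // 8GB read from the sources

        // Estimate made once at the start of the pipeline. Assumption for
        // this sketch: the planner guesses that the first job does not
        // shrink the data, so the second job is sized from 8GB.
        long upfrontEstimate = sourceBytes;

        // Estimate made at the start of the second job. Assumption for this
        // sketch: the first job actually reduced the data to a quarter.
        long actualInput = sourceBytes / 4;

        System.out.println("up front: " + reducersFor(upfrontEstimate)); // 8
        System.out.println("per job:  " + reducersFor(actualInput));     // 2
    }
}
```

With these made-up numbers the up-front estimate asks for four times as many reducers as the per-job one, which is the compounding effect Everett describes. Hard-coding the count with groupByKey(int numReducers), as Josh mentions, sidesteps the estimate entirely.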
