Awesome, thanks!

On Sat, Nov 7, 2020 at 6:43 AM Till Rohrmann <trohrm...@apache.org> wrote:
> Hi Rex,
>
> You should configure the number of slots per TaskManager to be the number
> of cores of a machine/node. In total you will then have a cluster with
> #slots = #cores per machine x #machines.
>
> If you have a cluster with 4 nodes and 8 slots each, then you have a total
> of 32 slots. Now if you have a job A which you start with a parallelism of
> 20, then you have 12 slots left. Hence, you could make use of these 12
> slots by starting a job B with a parallelism of 12.
>
> Cheers,
> Till
>
> On Fri, Nov 6, 2020 at 7:20 PM Rex Fenley <r...@remind101.com> wrote:
>
>> Great, thanks!
>>
>> So just to confirm, configure # of task slots to # of core nodes x # of
>> vCPUs?
>>
>> I'm not sure what you mean by "distribute them across both jobs (so that
>> the total adds up to 32)". Is it configurable how many task slots a job can
>> receive, so in this case I'd provide ~30/36 * 32 task slots for one job and
>> ~6/36 * 32 for another job, but even them out to sum to 32 task slots?
>>
>> Thanks
>>
>> On Fri, Nov 6, 2020 at 10:01 AM Till Rohrmann <trohrm...@apache.org>
>> wrote:
>>
>>> Hi Rex,
>>>
>>> As a rule of thumb I recommend configuring your TMs with as many slots
>>> as they have cores. So in your case your cluster would have 32 slots. Then
>>> depending on the workload of your jobs you should distribute them across
>>> both jobs (so that the total adds up to 32). A high number of operators
>>> does not necessarily mean that a job needs more slots, since operators can
>>> share the same slot. It mostly depends on the workload of your job. If the
>>> job turns out to be too slow, then you would have to increase the cluster
>>> resources.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Fri, Nov 6, 2020 at 12:21 AM Rex Fenley <r...@remind101.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm running a job on AWS EMR with the Table API that does a long series
>>>> of Joins, GroupBys, and Aggregates and I'd like to know how to best tune
>>>> parallelism.
>>>>
>>>> In my case, I have 8 EMR core nodes set up, each with 4 vCores and 8 GiB
>>>> of memory. There's a job we have to run that has ~30 table operators.
>>>> Given this, how should I calculate what to set the system's parallelism
>>>> to?
>>>>
>>>> I also plan on running a second job on the same system, but just with 6
>>>> operators. Will this change the calculation for parallelism at all?
>>>>
>>>> Thanks!
>>>>
>>>> --
>>>>
>>>> Rex Fenley | Software Engineer - Mobile and Backend
>>>>
>>>>
>>>> Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/>
>>>> | FOLLOW US <https://twitter.com/remindhq> | LIKE US
>>>> <https://www.facebook.com/remindhq>
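
For reference, the slot arithmetic from this thread can be sketched in a few lines of Python. This is only an illustration of the rule of thumb above, not anything Flink provides: the function names `total_slots` and `remaining_slots` are made up for this note, and the sketch assumes one TaskManager per node with one slot per core, and that each job occupies as many slots as its parallelism (as the job A / job B example implies).

```python
from typing import Sequence


def total_slots(machines: int, cores_per_machine: int) -> int:
    """Cluster-wide slots: #slots = #cores per machine x #machines."""
    return machines * cores_per_machine


def remaining_slots(total: int, job_parallelisms: Sequence[int]) -> int:
    """Slots left after each running job takes `parallelism` slots.

    Assumption: a job needs exactly as many slots as its parallelism,
    because its operators share slots.
    """
    used = sum(job_parallelisms)
    if used > total:
        raise ValueError(f"jobs need {used} slots but only {total} exist")
    return total - used


# The example from the thread: 4 nodes with 8 slots each.
example_total = total_slots(machines=4, cores_per_machine=8)
print(example_total)                         # 32
print(remaining_slots(example_total, [20]))  # 12 left for job B

# The cluster from the original question: 8 core nodes with 4 vCores each.
print(total_slots(machines=8, cores_per_machine=4))  # 32
```

In a real deployment the per-TaskManager slot count is typically set with `taskmanager.numberOfTaskSlots` in `flink-conf.yaml`, and a job's parallelism can be passed with the `-p` option of `flink run`; the numbers above would feed into those settings.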