Awesome, thanks!

On Sat, Nov 7, 2020 at 6:43 AM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Rex,
>
> You should configure the number of slots per TaskManager to be the number
> of cores of a machine/node. In total you will then have a cluster with
> #slots = #cores per machine x #machines.
>
> If you have a cluster with 4 nodes and 8 slots each, then you have a total
> of 32 slots. Now if you have a job A which you start with a parallelism of
> 20, then you have 12 slots left. Hence, you could make use of these 12
> slots by starting a job B with a parallelism of 12.
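A minimal sketch of the slot arithmetic above. It assumes one slot per core (the rule of thumb in this thread) and that each job occupies as many slots as its parallelism. `total_slots` and `free_slots` are hypothetical helpers for illustration, not Flink APIs. In Flink itself, the per-TaskManager value is set with the `taskmanager.numberOfTaskSlots` option.

```python
def total_slots(cores_per_machine: int, machines: int) -> int:
    """Cluster-wide slots when each TaskManager gets one slot per core."""
    return cores_per_machine * machines


def free_slots(cluster_slots: int, *job_parallelisms: int) -> int:
    """Slots left over after starting jobs with the given parallelisms.

    Assumes each job occupies as many slots as its parallelism
    (default slot sharing, a single slot sharing group per job).
    """
    used = sum(job_parallelisms)
    if used > cluster_slots:
        raise ValueError(f"jobs need {used} slots, cluster has {cluster_slots}")
    return cluster_slots - used


if __name__ == "__main__":
    # Example from this message: 4 nodes x 8 slots, job A at parallelism 20.
    slots = total_slots(cores_per_machine=8, machines=4)
    print(slots)                  # 32
    print(free_slots(slots, 20))  # 12, enough for job B at parallelism 12
    # The cluster from the original question: 8 EMR core nodes x 4 vCores.
    print(total_slots(cores_per_machine=4, machines=8))  # 32
```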
>
> Cheers,
> Till
>
> On Fri, Nov 6, 2020 at 7:20 PM Rex Fenley <r...@remind101.com> wrote:
>
>> Great, thanks!
>>
>> So just to confirm: I should configure the total number of task slots to be
>> the number of core nodes x the number of vCPUs per node?
>>
>> I'm not sure what you mean by "distribute them across both jobs (so that
>> the total adds up to 32)". Is it configurable how many task slots a job can
>> receive? In that case, would I give ~30/36 * 32 task slots to one job and
>> ~6/36 * 32 to the other, rounded so that the two sum to 32 task slots?
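The proportional split proposed here can be sketched as follows, assuming the weights are the operator counts from the original question (30 and 6). As Till points out elsewhere in the thread, operator count is not necessarily the right weight; a measure of each job's actual workload would be a better input. `split_slots` is a hypothetical helper, not part of Flink; each resulting share would then be used as that job's parallelism.

```python
from __future__ import annotations

import math


def split_slots(total: int, weights: dict[str, float]) -> dict[str, int]:
    """Divide `total` slots across jobs in proportion to `weights`.

    Uses largest-remainder rounding so the shares always add up to `total`.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    exact = {job: total * w / weight_sum for job, w in weights.items()}
    shares = {job: math.floor(x) for job, x in exact.items()}
    leftover = total - sum(shares.values())
    # Give the remaining slots to the jobs with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda j: exact[j] - shares[j], reverse=True)
    for job in by_remainder[:leftover]:
        shares[job] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical weights: the operator counts from the question (30 and 6).
    print(split_slots(32, {"job_a": 30, "job_b": 6}))  # {'job_a': 27, 'job_b': 5}
```

Largest-remainder rounding is one simple way to "even them out"; plain rounding of 26.67 and 5.33 happens to work here, but it can over- or under-shoot the total for other weights.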
>>
>> Thanks
>>
>> On Fri, Nov 6, 2020 at 10:01 AM Till Rohrmann <trohrm...@apache.org>
>> wrote:
>>
>>> Hi Rex,
>>>
>>> as a rule of thumb, I recommend configuring your TMs with as many slots
>>> as they have cores. So in your case (8 nodes x 4 vCores) your cluster
>>> would have 32 slots. Then, depending on the workload of your jobs, you
>>> should distribute them across both jobs (so that the total adds up to 32).
>>> A high number of operators does not necessarily mean that a job needs more
>>> slots, since operators can share the same slot. It mostly depends on the
>>> workload of your job. If the job turns out to be too slow, then you would
>>> have to increase the cluster resources.
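To illustrate why the number of operators does not by itself determine the number of slots: under Flink's slot sharing, subtasks of different operators in the same slot sharing group may run in one slot, and all operators belong to a single "default" group unless configured otherwise. A job therefore needs as many slots as the highest operator parallelism in each group, summed over groups. The helper `required_slots` and the operator lists below are hypothetical and only model that rule.

```python
from __future__ import annotations

from collections import defaultdict


def required_slots(operators: list[tuple[str, int, str]]) -> int:
    """Slots a job needs under Flink-style slot sharing.

    `operators` holds (name, parallelism, slot_sharing_group) tuples.
    Operators in the same group can share slots, so each group needs only
    as many slots as its most parallel operator; distinct groups never
    share slots, so their requirements add up.
    """
    group_max: dict[str, int] = defaultdict(int)
    for _name, parallelism, group in operators:
        group_max[group] = max(group_max[group], parallelism)
    return sum(group_max.values())


if __name__ == "__main__":
    # Hypothetical job: ~30 table operators, all at parallelism 27, default group.
    job = [(f"op{i}", 27, "default") for i in range(30)]
    print(required_slots(job))  # 27, not 30 x 27
    # Moving a hypothetical sink into its own group adds its parallelism.
    print(required_slots(job + [("sink", 4, "isolated")]))  # 31
```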
>>>
>>> Cheers,
>>> Till
>>>
>>> On Fri, Nov 6, 2020 at 12:21 AM Rex Fenley <r...@remind101.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm running a job on AWS EMR with the Table API that does a long series
>>>> of joins, group-bys, and aggregates, and I'd like to know how best to tune
>>>> parallelism.
>>>>
>>>> In my case, I have 8 EMR core nodes, each set up with 4 vCores and 8 GiB of
>>>> memory. There's a job we have to run that has ~30 table operators. Given
>>>> this, how should I calculate what to set the system's parallelism to?
>>>>
>>>> I also plan on running a second job on the same system, but just with 6
>>>> operators. Will this change the calculation for parallelism at all?
>>>>
>>>> Thanks!
>>>>
>>>
>>
>

-- 

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
 |  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
<https://www.facebook.com/remindhq>
