[ 
https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257158#comment-15257158
 ] 

Wangda Tan commented on YARN-4844:
----------------------------------

bq. ...Given the debate about the extent of the changes we want to make, can we 
put a patch that changes the int32 to int64, adds getMemoryLong with a Private 
annotation(so that we can make changes later if we wish) and only fixes the 
pending memory check that was added in 2.8?...
I agree the size of the patch looks scary :-p. However, if you look into the 
patch, the changes are all very simple fixes, so I don't think it will cause a 
lot of issues. You may feel better once I have fixed all the Jenkins issues.
I did consider fixing only the pending resource calculation, but that looks 
hard to me, because the calculation of pending resources uses 
ResourceCalculator/ResourceUsage, and ResourceCalculator and the related static 
methods of Resources are used everywhere in the RM.
Marking get___Long as @Private sounds like a good idea to me. Currently, pending 
resources are not yet exposed to applications via the Java API; they are only 
exposed in the REST API, which the patch already fixes.

Thoughts?

> Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
> ------------------------------------------------------------------
>
>                 Key: YARN-4844
>                 URL: https://issues.apache.org/jira/browse/YARN-4844
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Blocker
>         Attachments: YARN-4844.1.patch, YARN-4844.2.patch
>
>
> We currently use int32 for memory. If a cluster has 10k nodes and each node 
> has 210G of memory, the total cluster memory overflows and becomes negative.
> Another case that overflows int32 even more easily: we add the pending 
> resources of all running apps to the cluster's total pending resources. If a 
> problematic app requests too many resources (say 1M+ containers, each 
> with 3G of memory), int32 will not be enough.
> Even if we cap each app's pending requests, we cannot handle the case where 
> there are many running apps, each with a capped but still significant 
> amount of pending resources.
> So we may need to upgrade the int32 memory field (possibly v-cores 
> as well) to int64 to avoid integer overflow.
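
The arithmetic behind the two examples in the issue description can be checked 
with a small sketch. It assumes memory is tracked in MB and that 1G = 1024 MB; 
the class and method names are hypothetical and are not part of the patch or of 
the YARN API.

```java
// Hypothetical sketch: reproduces the overflow arithmetic from the issue text.
class MemoryOverflowSketch {

    // int32 total: the multiplication wraps around silently on overflow.
    static int totalMbInt(int count, int mbEach) {
        return count * mbEach;
    }

    // int64 total: widen one operand before multiplying so nothing wraps.
    static long totalMbLong(int count, int mbEach) {
        return (long) count * mbEach;
    }

    public static void main(String[] args) {
        // Case 1: 10k nodes, each with 210G of memory (assumed unit: MB).
        int nodes = 10_000;
        int nodeMb = 210 * 1024;
        System.out.println("cluster int32: " + totalMbInt(nodes, nodeMb));
        System.out.println("cluster int64: " + totalMbLong(nodes, nodeMb));

        // Case 2: one app with 1M pending containers of 3G each.
        int containers = 1_000_000;
        int containerMb = 3 * 1024;
        System.out.println("pending int32: " + totalMbInt(containers, containerMb));
        System.out.println("pending int64: " + totalMbLong(containers, containerMb));
    }
}
```

Under these assumptions both int32 totals come out negative, while the int64 
totals are 2,150,400,000 MB and 3,072,000,000 MB respectively, each just above 
the int32 maximum of 2,147,483,647.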



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
