[ 
https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4844:
-----------------------------
    Description: 
We use int32 for memory now. If a cluster has 10k nodes and each node has 210G 
of memory, we will get a negative total cluster memory.
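
The arithmetic behind this can be checked with a small standalone sketch. It 
is not YARN code: the class and method names are hypothetical, and it assumes 
memory is tracked in megabytes and that 210G means 210 * 1024 MB.

```java
public class ClusterMemoryOverflow {

    // Sums per-node memory (MB) in an int accumulator.
    // Java int arithmetic wraps silently when the sum exceeds Integer.MAX_VALUE.
    static int totalMemoryInt(int nodes, int memoryPerNodeMb) {
        int total = 0;
        for (int i = 0; i < nodes; i++) {
            total += memoryPerNodeMb;
        }
        return total;
    }

    // Same sum with a long accumulator, which has ample headroom here.
    static long totalMemoryLong(int nodes, int memoryPerNodeMb) {
        long total = 0L;
        for (int i = 0; i < nodes; i++) {
            total += memoryPerNodeMb;
        }
        return total;
    }

    public static void main(String[] args) {
        int nodes = 10_000;          // 10k nodes, as in the description
        int perNodeMb = 210 * 1024;  // assumption: 210G expressed in MB = 215040

        System.out.println(totalMemoryInt(nodes, perNodeMb));   // -2144567296
        System.out.println(totalMemoryLong(nodes, perNodeMb));  // 2150400000
    }
}
```

The true total, 2,150,400,000 MB, is just above Integer.MAX_VALUE 
(2,147,483,647), so the int sum wraps around to a negative number.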

Another case that overflows int32 even more easily: we add all pending 
resources of running apps to the cluster's total pending resources. If a 
problematic app requests too many resources (let's say 1M+ containers, each of 
them 3G), int32 will not be enough.

Even if we can cap each app's pending request, we cannot handle the case where 
there are many running apps, each with a capped but still significant amount 
of pending resources.
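
Both pending-resource cases can be illustrated with a minimal sketch. Again 
this is not YARN code: the class, the helper names, the per-app cap of 
1,000,000 MB and the count of 3,000 apps are all hypothetical, and memory is 
assumed to be in megabytes.

```java
import java.util.Arrays;

public class PendingResourceSum {

    // Accumulates per-app pending memory (MB) in a long so the total cannot wrap
    // for any realistic number of apps.
    static long sumPendingMb(int[] perAppPendingMb) {
        long total = 0L;
        for (int pending : perAppPendingMb) {
            total += pending;
        }
        return total;
    }

    // True if the value could be stored in an int32 field without overflow.
    static boolean fitsInInt(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Case 1: one problematic app asking for 1M containers of 3G each.
        long singleAppMb = 1_000_000L * 3 * 1024;
        System.out.println(singleAppMb);             // 3072000000
        System.out.println(fitsInInt(singleAppMb));  // false

        // Case 2: many apps, each capped (hypothetical cap: 1,000,000 MB).
        int[] cappedApps = new int[3_000];
        Arrays.fill(cappedApps, 1_000_000);
        long clusterPendingMb = sumPendingMb(cappedApps);
        System.out.println(clusterPendingMb);             // 3000000000
        System.out.println(fitsInInt(clusterPendingMb));  // false
    }
}
```

Each capped value fits in an int on its own, but the cluster-wide sum does 
not, which is why a per-app cap alone does not remove the overflow.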

So we may possibly need to add getMemoryLong/getVirtualCoreLong to 

  was:
We use int32 for memory now. If a cluster has 10k nodes and each node has 210G 
of memory, we will get a negative total cluster memory.

Another case that overflows int32 even more easily: we add all pending 
resources of running apps to the cluster's total pending resources. If a 
problematic app requests too many resources (let's say 1M+ containers, each of 
them 3G), int32 will not be enough.

Even if we can cap each app's pending request, we cannot handle the case where 
there are many running apps, each with a capped but still significant amount 
of pending resources.

So we may need to upgrade the int32 memory field (possibly v-cores as well) to 
int64 to avoid integer overflow. 


> Add getMemoryLong/getVirtualCoreLong to o.a.h.y.api.records.Resource
> --------------------------------------------------------------------
>
>                 Key: YARN-4844
>                 URL: https://issues.apache.org/jira/browse/YARN-4844
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Blocker
>         Attachments: YARN-4844.1.patch, YARN-4844.2.patch, YARN-4844.3.patch
>
>
> We use int32 for memory now. If a cluster has 10k nodes and each node has 
> 210G of memory, we will get a negative total cluster memory.
> Another case that overflows int32 even more easily: we add all pending 
> resources of running apps to the cluster's total pending resources. If a 
> problematic app requests too many resources (let's say 1M+ containers, each 
> of them 3G), int32 will not be enough.
> Even if we can cap each app's pending request, we cannot handle the case 
> where there are many running apps, each with a capped but still significant 
> amount of pending resources.
> So we may possibly need to add getMemoryLong/getVirtualCoreLong to 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
