[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257158#comment-15257158 ]
Wangda Tan commented on YARN-4844:
----------------------------------
bq. ...Given the debate about the extent of the changes we want to make, can we
put a patch that changes the int32 to int64, adds getMemoryLong with a Private
annotation (so that we can make changes later if we wish) and only fixes the
pending memory check that was added in 2.8?...
I agree that the size of the patch looks scary :-p. However, if you look into
the patch, the changes are all very simple fixes, and I don't think they will
cause a lot of issues. You may feel better once I have fixed all the Jenkins
issues.
I have considered fixing only the pending resource calculation, but it looks
hard to me, because the calculation of pending resources uses
ResourceCalculator/ResourceUsage, and ResourceCalculator and the related static
methods of Resources are used everywhere in the RM.
Marking get___Long as @Private sounds like a good idea to me. Currently,
pending resources haven't been exposed to applications via the Java API yet;
they are only exposed in the REST API, which is already fixed by the patch.
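To make the get___Long idea concrete, here is a minimal sketch, not the actual patch. ResourceSketch is a hypothetical, simplified class, the local Private annotation is only a stand-in for Hadoop's audience annotation so the snippet compiles on its own, and saturating the old int accessor is an assumption made for illustration.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Stand-in for Hadoop's Private audience annotation (assumption: defined
// locally here so the sketch compiles without Hadoop on the classpath).
@Documented
@Retention(RetentionPolicy.RUNTIME)
@interface Private {}

// Hypothetical, simplified record; NOT the real o.a.h.y.api.records.Resource.
class ResourceSketch {
    // Memory is stored as int64 internally (assumed unit: MB, non-negative).
    private final long memoryMb;

    ResourceSketch(long memoryMb) {
        this.memoryMb = memoryMb;
    }

    // Existing int32 accessor kept for current callers. Saturating at
    // Integer.MAX_VALUE instead of wrapping is an illustrative choice.
    int getMemory() {
        return (int) Math.min(memoryMb, Integer.MAX_VALUE);
    }

    // New int64 accessor, marked private-audience so it can change later.
    @Private
    long getMemoryLong() {
        return memoryMb;
    }
}
```

With this shape, existing callers keep compiling against getMemory(), while RM-internal code can move to getMemoryLong() at its own pace.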
Thoughts?
> Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
> ------------------------------------------------------------------
>
> Key: YARN-4844
> URL: https://issues.apache.org/jira/browse/YARN-4844
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Priority: Blocker
> Attachments: YARN-4844.1.patch, YARN-4844.2.patch
>
>
> We use int32 for memory now. If a cluster has 10k nodes and each node has 210G
> of memory, we will get a negative total cluster memory.
> Another case that overflows int32 even more easily: we add the pending
> resources of all running apps to the cluster's total pending resources. If a
> problematic app requests too many resources (let's say 1M+ containers, each
> of them 3G), int32 will not be enough.
> Even if we cap each app's pending requests, we cannot handle the case where
> there are many running apps, each with a capped but still significant
> amount of pending resources.
> So we may need to upgrade the int32 memory field (possibly v-cores
> as well) to int64 to avoid integer overflow.
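
The two overflow cases in the description can be checked with a short, self-contained sketch. It assumes memory is counted in MB with 1G = 1024 MB; the class and method names are hypothetical and are not part of YARN.

```java
// Hypothetical demo class; names are illustrative only.
class MemoryOverflowDemo {
    // int32 arithmetic: the product wraps around silently on overflow.
    static int totalMemoryInt(int count, int memoryEachMb) {
        return count * memoryEachMb;
    }

    // int64 arithmetic: widen one operand before multiplying.
    static long totalMemoryLong(int count, int memoryEachMb) {
        return (long) count * memoryEachMb;
    }

    public static void main(String[] args) {
        // Case 1: 10k nodes, 210G each (assumed unit: MB).
        int nodes = 10_000;
        int perNodeMb = 210 * 1024;
        System.out.println(totalMemoryInt(nodes, perNodeMb));   // -2144567296
        System.out.println(totalMemoryLong(nodes, perNodeMb));  // 2150400000

        // Case 2: one app with 1M pending containers of 3G each.
        int containers = 1_000_000;
        int perContainerMb = 3 * 1024;
        System.out.println(totalMemoryInt(containers, perContainerMb));   // -1222967296
        System.out.println(totalMemoryLong(containers, perContainerMb));  // 3072000000
    }
}
```

Both totals exceed Integer.MAX_VALUE (2147483647), so the int32 results come out negative while the int64 results are exact.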