[
https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335418#comment-15335418
]
Wangda Tan commented on YARN-4844:
----------------------------------
[~kasha],
bq. getMemory is deprecated, but getVirtualCores is not
The reason only getMemory is updated is that it is the real problem: memory
values can realistically overflow int32, while virtualCores is unlikely to go
beyond the max value of int in the near future. Considering the size of the
patch, I only updated getMemory.
bq. getMemory is deprecated and recommends using getMemorySize, but
getMemorySize is unstable. Seems like the users are stuck between rock and a
hard place?
Since this is the first release of the new API, we can probably still update
it. I'm open to updating it to an Evolving or even Stable API if you think that
is required.
bq. Is the recommendation to use the long version for everything - individual
resource-requests and variables that are used to capture aggregates? If yes,
shouldn't we update all current usages to the long version?
I've tried to update most of them, except for a few APIs (like
mapreduce.JobStatus). getMemory is used 1k+ times across YARN/MR, so I believe
there are still places I missed. I can address them before the 2.8 release.
bq. Also, do you think we can get this in 2.9 instead so we can be sure other
stuff doesn't break?
I would prefer to leave it in 2.8. This is a real problem that we have seen in
a couple of cases, and the client can basically do nothing except restart
services. I've tried building several YARN downstream projects such as
Spark/Slider/Tez against this patch; all of them build with the API fixes:
https://issues.apache.org/jira/secure/attachment/12810580/YARN-4844-branch-2.8.addendum.2.patch
Considering there are still 15+ pending blocker and critical issues for 2.8,
there are at least a few weeks left before 2.8 is finished, so we can test more
downstream projects if you want.
bq. Also, noticed that some of the helper methods in Resources seem to using
getMemorySize for calculations but typecasting to int as in this example:
I will double check them, as well as the issues you found in YARN-5077. I plan
to create a new JIRA to address these issues instead of overloading this one.
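The typecast issue mentioned above can be sketched as follows. This is an illustrative snippet only — the method names and shapes are hypothetical, not copied from the actual Resources helper code — but it shows how computing an aggregate in long and then narrowing it back to int silently reintroduces the overflow:

```java
import java.util.Arrays;

// Illustrative sketch only: names are hypothetical, not the real
// o.a.h.yarn.util.resource.Resources helpers.
public class NarrowingPitfall {

    // Buggy pattern: sums 64-bit values, then truncates the result to int.
    static int totalMemoryBuggy(long[] nodeMemoriesMb) {
        long sum = 0;
        for (long m : nodeMemoriesMb) {
            sum += m;
        }
        return (int) sum; // silently wraps negative for large clusters
    }

    // Fixed pattern: keep the aggregate as long end to end.
    static long totalMemoryFixed(long[] nodeMemoriesMb) {
        long sum = 0;
        for (long m : nodeMemoriesMb) {
            sum += m;
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] cluster = new long[10_000];
        Arrays.fill(cluster, 215_040L); // 210 GB per node, tracked in MB
        System.out.println("buggy: " + totalMemoryBuggy(cluster)); // negative
        System.out.println("fixed: " + totalMemoryFixed(cluster)); // 2150400000
    }
}
```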
> Add getMemorySize/getVirtualCoresSize to o.a.h.y.api.records.Resource
> ---------------------------------------------------------------------
>
> Key: YARN-4844
> URL: https://issues.apache.org/jira/browse/YARN-4844
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: YARN-4844-branch-2.8.0016_.patch,
> YARN-4844-branch-2.8.addendum.2.patch, YARN-4844-branch-2.addendum.1_.patch,
> YARN-4844-branch-2.addendum.2.patch, YARN-4844.1.patch, YARN-4844.10.patch,
> YARN-4844.11.patch, YARN-4844.12.patch, YARN-4844.13.patch,
> YARN-4844.14.patch, YARN-4844.15.patch, YARN-4844.16.branch-2.patch,
> YARN-4844.16.patch, YARN-4844.2.patch, YARN-4844.3.patch, YARN-4844.4.patch,
> YARN-4844.5.patch, YARN-4844.6.patch, YARN-4844.7.patch,
> YARN-4844.8.branch-2.patch, YARN-4844.8.patch, YARN-4844.9.branch,
> YARN-4844.9.branch-2.patch
>
>
> We use int32 for memory now; if a cluster has 10k nodes, each with 210G of
> memory, we will get a negative total cluster memory.
> Another case that overflows int32 even more easily: we add all pending
> resources of running apps to the cluster's total pending resources. If a
> problematic app requests too many resources (say 1M+ containers, each
> with 3G of memory), int32 will not be enough.
> Even if we cap each app's pending request, we cannot handle the case where
> there are many running apps, each with a capped but still significant
> amount of pending resources.
> So we may possibly need to add getMemoryLong/getVirtualCoreLong to
> o.a.h.y.api.records.Resource.
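The two overflow scenarios in the description above can be checked with quick arithmetic. A minimal sketch, assuming YARN's convention of tracking memory in MB:

```java
public class OverflowDemo {
    public static void main(String[] args) {
        // Case 1: cluster total. 10k nodes x 210 GB each, in MB.
        long clusterTotalMb = 10_000L * 210L * 1024L; // 2,150,400,000 MB
        // Exceeds Integer.MAX_VALUE (2,147,483,647), so an int32 wraps negative.
        System.out.println("cluster total (long):  " + clusterTotalMb);
        System.out.println("cluster total (int32): " + (int) clusterTotalMb);

        // Case 2: pending resources. 1M+ containers of 3 GB each, in MB.
        long pendingMb = 1_000_000L * 3L * 1024L; // 3,072,000,000 MB
        System.out.println("pending (long):  " + pendingMb);
        System.out.println("pending (int32): " + (int) pendingMb);
    }
}
```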
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]