[
https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335418#comment-15335418
]
Wangda Tan commented on YARN-4844:
----------------------------------
[~kasha],
bq. getMemory is deprecated, but getVirtualCores is not
The reason only getMemory is updated is that it is the real problem: memory
values can realistically overflow int32, while virtualCores is unlikely to go
beyond the max value of int in the near future. Considering the size of the
patch, I only updated getMemory.
bq. getMemory is deprecated and recommends using getMemorySize, but
getMemorySize is unstable. Seems like the users are stuck between rock and a
hard place?
Since this is the first release of the new API, we can probably still update
it. I'm open to updating it to an Evolving or even Stable API if you think that
is required.
bq. Is the recommendation to use the long version for everything - individual
resource-requests and variables that are used to capture aggregates? If yes,
shouldn't we update all current usages to the long version?
I've tried to update most of them, except for a few APIs (like
mapreduce.JobStatus). getMemory is used 1k+ times across YARN/MR, so I believe
there are still places I missed. I can address them before the 2.8 release.
bq. Also, do you think we can get this in 2.9 instead so we can be sure other
stuff doesn't break?
I would prefer to leave it in 2.8. This is a real problem that we have seen in
a couple of cases, and the client can basically do nothing except restart
services. I've tried building several YARN downstream projects such as
Spark/Slider/Tez against this patch; all of them build with the API fixes:
https://issues.apache.org/jira/secure/attachment/12810580/YARN-4844-branch-2.8.addendum.2.patch
Considering there are still 15+ pending blocker and critical issues for 2.8,
there are at least a few weeks left before 2.8 is finished, so we can test more
downstream projects if you want.
bq. Also, noticed that some of the helper methods in Resources seem to using
getMemorySize for calculations but typecasting to int as in this example:
I will double check them, as well as the issues you found in YARN-5077. I plan
to create a new JIRA to address these issues instead of overloading this one.
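The typecast issue mentioned above can be sketched as follows. This is an illustrative snippet only — the method names and shapes are hypothetical, not copied from the actual Resources helper code — but it shows how computing an aggregate in long and then narrowing it back to int silently reintroduces the overflow:

```java
import java.util.Arrays;

// Illustrative sketch only: names are hypothetical, not the real
// o.a.h.yarn.util.resource.Resources helpers.
public class NarrowingPitfall {

    // Buggy pattern: sums 64-bit values, then truncates the result to int.
    static int totalMemoryBuggy(long[] nodeMemoriesMb) {
        long sum = 0;
        for (long m : nodeMemoriesMb) {
            sum += m;
        }
        return (int) sum; // silently wraps negative for large clusters
    }

    // Fixed pattern: keep the aggregate as long end to end.
    static long totalMemoryFixed(long[] nodeMemoriesMb) {
        long sum = 0;
        for (long m : nodeMemoriesMb) {
            sum += m;
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] cluster = new long[10_000];
        Arrays.fill(cluster, 215_040L); // 210 GB per node, tracked in MB
        System.out.println("buggy: " + totalMemoryBuggy(cluster)); // negative
        System.out.println("fixed: " + totalMemoryFixed(cluster)); // 2150400000
    }
}
```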
> Add getMemorySize/getVirtualCoresSize to o.a.h.y.api.records.Resource
> ---------------------------------------------------------------------
>
> Key: YARN-4844
> URL: https://issues.apache.org/jira/browse/YARN-4844
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: YARN-4844-branch-2.8.0016_.patch,
> YARN-4844-branch-2.8.addendum.2.patch, YARN-4844-branch-2.addendum.1_.patch,
> YARN-4844-branch-2.addendum.2.patch, YARN-4844.1.patch, YARN-4844.10.patch,
> YARN-4844.11.patch, YARN-4844.12.patch, YARN-4844.13.patch,
> YARN-4844.14.patch, YARN-4844.15.patch, YARN-4844.16.branch-2.patch,
> YARN-4844.16.patch, YARN-4844.2.patch, YARN-4844.3.patch, YARN-4844.4.patch,
> YARN-4844.5.patch, YARN-4844.6.patch, YARN-4844.7.patch,
> YARN-4844.8.branch-2.patch, YARN-4844.8.patch, YARN-4844.9.branch,
> YARN-4844.9.branch-2.patch
>
>
> We use int32 for memory now; if a cluster has 10k nodes, each with 210G of
> memory, we will get a negative total cluster memory.
> Another case that overflows int32 even more easily: we add all pending
> resources of running apps to the cluster's total pending resources. If a
> problematic app requests too many resources (say 1M+ containers, each
> with 3G of memory), int32 will not be enough.
> Even if we cap each app's pending request, we cannot handle the case where
> there are many running apps, each with a capped but still significant
> amount of pending resources.
> So we may possibly need to add getMemoryLong/getVirtualCoreLong to
> o.a.h.y.api.records.Resource.
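The two overflow scenarios in the description above can be checked with quick arithmetic. A minimal sketch, assuming YARN's convention of tracking memory in MB:

```java
public class OverflowDemo {
    public static void main(String[] args) {
        // Case 1: cluster total. 10k nodes x 210 GB each, in MB.
        long clusterTotalMb = 10_000L * 210L * 1024L; // 2,150,400,000 MB
        // Exceeds Integer.MAX_VALUE (2,147,483,647), so an int32 wraps negative.
        System.out.println("cluster total (long):  " + clusterTotalMb);
        System.out.println("cluster total (int32): " + (int) clusterTotalMb);

        // Case 2: pending resources. 1M+ containers of 3 GB each, in MB.
        long pendingMb = 1_000_000L * 3L * 1024L; // 3,072,000,000 MB
        System.out.println("pending (long):  " + pendingMb);
        System.out.println("pending (int32): " + (int) pendingMb);
    }
}
```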
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]