[
https://issues.apache.org/jira/browse/YARN-11964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18085130#comment-18085130
]
ASF GitHub Bot commented on YARN-11964:
---------------------------------------
ryukobayashi opened a new pull request, #8523:
URL: https://github.com/apache/hadoop/pull/8523
### Description of PR
The YARN ResourceManager (RM) can transiently report negative available
resources, for example when `cluster_capacity - allocated` drops below zero
during overload or after node failures.
`DefaultAMSProcessor` now applies `Resources.componentwiseMax` with
`Resources.none()` before setting available resources in `AllocateResponse`,
ensuring the AM always receives a non-negative value.
This is a re-implementation of YARN-11964 (reverted in #8519). The previous
fix applied the clamp inside `Resource.castToIntSafely()`, which was too broad
and caused a regression in `TestRLESparseResourceAllocation`. This fix applies
the clamp at the correct layer: in `DefaultAMSProcessor`, where the
`AllocateResponse` is built.
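
The clamp described above amounts to a component-wise maximum against a zero resource. The sketch below illustrates that idea only; it uses a hypothetical `Res` stand-in rather than Hadoop's `Resource` class, and `ClampSketch`, `Res`, and `clampToNonNegative` are names invented for this example, not code from the patch.

```java
// Illustrative sketch only: Res is a hypothetical stand-in, NOT Hadoop's Resource.
final class ClampSketch {
    static final class Res {
        final long memoryMb;
        final int vcores;

        Res(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }

        @Override
        public String toString() {
            return "<memory:" + memoryMb + ", vCores:" + vcores + ">";
        }
    }

    // Plays the role of a zero resource in this sketch.
    static final Res NONE = new Res(0, 0);

    // Takes the larger value of each component independently.
    static Res componentwiseMax(Res a, Res b) {
        return new Res(Math.max(a.memoryMb, b.memoryMb),
                       Math.max(a.vcores, b.vcores));
    }

    // Clamps every negative component to zero; non-negative components pass through.
    static Res clampToNonNegative(Res available) {
        return componentwiseMax(available, NONE);
    }

    public static void main(String[] args) {
        Res transientlyNegative = new Res(-2048, 3);
        // prints "<memory:0, vCores:3>"
        System.out.println(clampToNonNegative(transientlyNegative));
    }
}
```

Because each component is clamped independently, a resource that is negative in memory but positive in vcores keeps its vcores value unchanged.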
Contains content generated by Claude (Anthropic)
### How was this patch tested?
- Added `TestDefaultAMSProcessor#testAvailableResourcesClampedToNonNegative`,
  which uses a custom scheduler that returns a negative resource limit and
  verifies that the `AllocateResponse` always contains non-negative available
  resources.
- Existing tests pass: `TestApplicationMasterServiceCapacity`,
`TestApplicationMasterServiceFair`, `TestApplicationMasterServiceInterceptor`,
`TestRLESparseResourceAllocation`.
### For code changes:
- [x] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [x] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [x] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [x] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?
### AI Tooling
- [x] The PR includes the phrase "Contains content generated by Claude
(Anthropic)"
- [x] My use of AI contributions follows the ASF legal policy
https://www.apache.org/legal/generative-tooling.html
> Resource.castToIntSafely() should clamp negative values to 0 to prevent
> propagation of invalid resource counts
> --------------------------------------------------------------------------------------------------------------
>
> Key: YARN-11964
> URL: https://issues.apache.org/jira/browse/YARN-11964
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.4.3
> Reporter: Ryu Kobayashi
> Assignee: Ryu Kobayashi
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.5.1, 3.6.0
>
>
> h2. Problem
> Resource.castToIntSafely() clamps values exceeding Integer.MAX_VALUE to
> Integer.MAX_VALUE but silently passes negative values through. The method
> comment states "This method assumes resource value is positive"; however,
> this assumption is not guaranteed in practice.
> When the YARN RM temporarily reports negative available resources
> (e.g. due to overload, node failures, or transient resource calculation
> errors), the negative value is propagated as-is to callers.
> h2. Root Cause
> The method only guards against positive overflow:
> {code:java}
> protected static int castToIntSafely(long value) {
>   if (value > Integer.MAX_VALUE) {
>     return Integer.MAX_VALUE;
>   }
>   return Long.valueOf(value).intValue();
> }
> {code}
> There is no guard for negative values. When a negative long is passed,
> it is returned as a negative int, which can cause unexpected behavior
> in downstream components that assume resource values are non-negative.
> h2. Impact
> Downstream components that rely on this method returning a non-negative int
> may compute invalid results (e.g. negative slot counts, illegal collection
> sizes) when YARN temporarily reports negative available resources.
> h2. Fix
> Return 0 when value < 0, consistent with the existing behavior of
> clamping out-of-range values to a safe boundary:
> {code:java}
> protected static int castToIntSafely(long value) {
>   if (value < 0) {
>     return 0;
>   }
>   if (value > Integer.MAX_VALUE) {
>     return Integer.MAX_VALUE;
>   }
>   return Long.valueOf(value).intValue();
> }
> {code}
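
For illustration, the two variants quoted in the issue can be compared side by side in a standalone sketch. `CastSketch`, `castOriginal`, and `castClamped` are hypothetical names chosen for this example; the bodies mirror the snippets in the issue, with a plain `(int)` cast assumed in place of `Long.valueOf(value).intValue()`, which narrows the same way.

```java
// Illustrative sketch only: mirrors the two snippets quoted in the issue.
final class CastSketch {
    // Variant quoted under "Root Cause": guards only against positive overflow.
    static int castOriginal(long value) {
        if (value > Integer.MAX_VALUE) {
            return Integer.MAX_VALUE;
        }
        return (int) value;
    }

    // Variant quoted under "Fix": additionally clamps negative input to 0.
    static int castClamped(long value) {
        if (value < 0) {
            return 0;
        }
        if (value > Integer.MAX_VALUE) {
            return Integer.MAX_VALUE;
        }
        return (int) value;
    }

    public static void main(String[] args) {
        System.out.println(castOriginal(-5L));      // prints -5
        System.out.println(castClamped(-5L));       // prints 0
        System.out.println(castOriginal(1L << 40)); // prints 2147483647
        System.out.println(castClamped(1L << 40));  // prints 2147483647
    }
}
```

The two variants differ only for negative input; in-range and overflowing values are handled identically.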