[ https://issues.apache.org/jira/browse/YARN-11964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18085130#comment-18085130 ]

ASF GitHub Bot commented on YARN-11964:
---------------------------------------

ryukobayashi opened a new pull request, #8523:
URL: https://github.com/apache/hadoop/pull/8523

   
   ### Description of PR
   
   YARN RM can transiently report negative available resources (for example, 
when `cluster_capacity - allocated` goes negative because of overload or node 
failures). `DefaultAMSProcessor` now applies `Resources.componentwiseMax` with 
`Resources.none()` before setting the available resources in `AllocateResponse`, 
so the AM always receives a non-negative value.
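A self-contained sketch can show the effect of the clamp. It does not use the real Hadoop classes. `Res`, `none()`, `componentwiseMax()` and `availableForAm()` are hypothetical stand-ins that imitate what the description says `Resources.none()` and `Resources.componentwiseMax` are used for. The sketch also assumes a resource with only two components, memory (MB) and vcores.

```java
/** Hypothetical two-component resource (memory in MB, vcores); not YARN's Resource. */
final class Res {
  final long memoryMb;
  final long vcores;

  Res(long memoryMb, long vcores) {
    this.memoryMb = memoryMb;
    this.vcores = vcores;
  }

  /** Stand-in for Resources.none(): every component is zero. */
  static Res none() {
    return new Res(0, 0);
  }

  /** Stand-in for Resources.componentwiseMax: the larger value in each component. */
  static Res componentwiseMax(Res lhs, Res rhs) {
    return new Res(Math.max(lhs.memoryMb, rhs.memoryMb),
        Math.max(lhs.vcores, rhs.vcores));
  }

  /** Available resources as the AM would see them: raw headroom clamped at zero. */
  static Res availableForAm(Res clusterCapacity, Res allocated) {
    Res raw = new Res(clusterCapacity.memoryMb - allocated.memoryMb,
        clusterCapacity.vcores - allocated.vcores);
    return componentwiseMax(raw, none());
  }

  @Override
  public String toString() {
    return "<memory:" + memoryMb + ", vCores:" + vcores + ">";
  }

  public static void main(String[] args) {
    // Memory is over-committed (raw headroom -2048 MB), but vcores still have room.
    Res available = availableForAm(new Res(8192, 8), new Res(10240, 6));
    System.out.println(available); // prints <memory:0, vCores:2>
  }
}
```

The maximum is taken per component. An over-committed dimension is therefore reported as zero, while headroom that still exists in another dimension remains visible.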
   
   This is a re-implementation of YARN-11964 (reverted in #8519). The previous 
fix applied the clamp inside `Resource.castToIntSafely()`. That scope was too 
broad and caused a regression in `TestRLESparseResourceAllocation`. This fix 
applies the clamp at the correct layer instead: in `DefaultAMSProcessor`, where 
the `AllocateResponse` is built.
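A toy example shows why the placement of the clamp matters. For illustration only, it assumes that some callers of a shared conversion helper legitimately pass negative values, such as a signed delta that is applied later. The class and method names are hypothetical. The arithmetic is not taken from `RLESparseResourceAllocation`.

```java
/** Hypothetical illustration of clamping in a shared helper versus at the boundary. */
final class ClampPlacement {

  /** Shared conversion helper with a built-in clamp, in the spirit of the reverted change. */
  static int toIntClamped(long value) {
    if (value < 0) {
      return 0;
    }
    return value > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) value;
  }

  /** Shared conversion helper that only guards against positive overflow. */
  static int toIntUnclamped(long value) {
    return value > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) value;
  }

  /** Applies a signed delta (negative when capacity is released) to a base value. */
  static int applyDelta(int base, long delta, boolean clampInHelper) {
    int d = clampInHelper ? toIntClamped(delta) : toIntUnclamped(delta);
    return base + d;
  }

  /** Boundary clamp: only the value handed to the AM is forced to be non-negative. */
  static int clampAtBoundary(long available) {
    return toIntUnclamped(Math.max(0L, available));
  }

  public static void main(String[] args) {
    // Releasing 1024 MB from a 4096 MB reservation should leave 3072 MB.
    System.out.println(applyDelta(4096, -1024, false)); // 3072
    System.out.println(applyDelta(4096, -1024, true));  // 4096: the release is lost
    // Clamping once, at the response boundary, still hides the negative headroom.
    System.out.println(clampAtBoundary(-512));          // 0
  }
}
```

A clamp in a widely shared helper silently changes every caller that relies on signed arithmetic. A clamp applied where the response is built affects only the value the AM reads.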
   
   Contains content generated by Claude (Anthropic)
   
   ### How was this patch tested?
   
   - Added `TestDefaultAMSProcessor#testAvailableResourcesClampedToNonNegative` 
which uses a custom scheduler that returns a negative resource limit and 
verifies the `AllocateResponse` always contains non-negative available 
resources.
   - Existing tests pass: `TestApplicationMasterServiceCapacity`, 
`TestApplicationMasterServiceFair`, `TestApplicationMasterServiceInterceptor`, 
`TestRLESparseResourceAllocation`.
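The new test can be sketched without a running ResourceManager. In this hypothetical version:
- `HeadroomSource` plays the role of the custom scheduler.
- `buildAvailable` plays the role of the code that fills `AllocateResponse`.
- A two-element `long[]` (memory MB, vcores) stands in for a `Resource`.

None of these names come from the patch.

```java
import java.util.Arrays;

/** Hypothetical, dependency-free analogue of the negative-headroom test. */
final class NegativeHeadroomCheck {

  /** Stand-in for the scheduler's headroom query; returns {memoryMb, vcores}. */
  interface HeadroomSource {
    long[] headroom();
  }

  /** Stand-in for filling the available resources of an allocate response. */
  static long[] buildAvailable(HeadroomSource scheduler) {
    long[] raw = scheduler.headroom();
    long[] clamped = new long[raw.length];
    for (int i = 0; i < raw.length; i++) {
      clamped[i] = Math.max(0L, raw[i]); // component-wise max with zero
    }
    return clamped;
  }

  public static void main(String[] args) {
    // Stub "scheduler" that reports a negative limit, as the new test does.
    HeadroomSource negative = () -> new long[] {-1024, -2};
    long[] available = buildAvailable(negative);
    for (long component : available) {
      if (component < 0) {
        throw new AssertionError("negative available resource: " + Arrays.toString(available));
      }
    }
    System.out.println(Arrays.toString(available));
  }
}
```

The check mirrors the test's stated goal. Whatever limit the stub reports, no component of the built response is negative.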
   
   ### For code changes:
   
   - [x] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [x] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [x] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [x] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   ### AI Tooling
   
   - [x] The PR includes the phrase "Contains content generated by Claude 
(Anthropic)"
   - [x] My use of AI contributions follows the ASF legal policy
         https://www.apache.org/legal/generative-tooling.html




> Resource.castToIntSafely() should clamp negative values to 0 to prevent 
> propagation of invalid resource counts
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-11964
>                 URL: https://issues.apache.org/jira/browse/YARN-11964
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.4.3
>            Reporter: Ryu Kobayashi
>            Assignee: Ryu Kobayashi
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.5.1, 3.6.0
>
>
>   h2. Problem
>   Resource.castToIntSafely() clamps values exceeding Integer.MAX_VALUE to Integer.MAX_VALUE,
>   but silently passes through negative values. The method comment states
>   "This method assumes resource value is positive"; however, this assumption
>   is not guaranteed in practice.
>   When YARN RM temporarily reports negative available resources
>   (e.g. due to overload, node failures, or transient resource calculation errors),
>   the negative value is propagated as-is to callers.
>   h2. Root Cause
>   The method only guards against positive overflow:
>   {code:java}
>   protected static int castToIntSafely(long value) {
>       if (value > Integer.MAX_VALUE) {
>         return Integer.MAX_VALUE;
>       }
>       return Long.valueOf(value).intValue();
>   }
>   {code}
>   There is no guard for negative values. When a negative long is passed,
>   it is returned as a negative int, which can cause unexpected behavior
>   in downstream components that assume resource values are non-negative.
>   h2. Impact
>   Downstream components that rely on this method returning a non-negative int
>   may compute invalid results (e.g. negative slot counts, illegal collection sizes)
>   when YARN temporarily reports negative available resources.
>   h2. Fix
>   Return 0 when value < 0, consistent with the existing behavior of
>   clamping out-of-range values to a safe boundary:
>   {code:java}
>   protected static int castToIntSafely(long value) {
>       if (value < 0) {
>         return 0;
>       }
>       if (value > Integer.MAX_VALUE) {
>         return Integer.MAX_VALUE;
>       }
>       return Long.valueOf(value).intValue();
>   }
>   {code}

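The current and proposed versions of `castToIntSafely` can be compared in isolation. The sketch copies both bodies from the issue into a standalone class. The class name `CastToIntDemo` and the two method names are hypothetical. The methods are package-private statics so that they can be called directly, whereas the original is a `protected static` member of `Resource`.

```java
/** Hypothetical harness comparing the two castToIntSafely variants quoted in the issue. */
final class CastToIntDemo {

  /** Current version, as quoted in the issue: only positive overflow is guarded. */
  static int castToIntSafelyCurrent(long value) {
    if (value > Integer.MAX_VALUE) {
      return Integer.MAX_VALUE;
    }
    return Long.valueOf(value).intValue();
  }

  /** Proposed version, as quoted in the issue: negative values are clamped to 0. */
  static int castToIntSafelyProposed(long value) {
    if (value < 0) {
      return 0;
    }
    if (value > Integer.MAX_VALUE) {
      return Integer.MAX_VALUE;
    }
    return Long.valueOf(value).intValue();
  }

  public static void main(String[] args) {
    long[] samples = {-2048L, -3_000_000_000L, 4096L, 5_000_000_000L};
    for (long v : samples) {
      System.out.println(v + " -> current " + castToIntSafelyCurrent(v)
          + ", proposed " + castToIntSafelyProposed(v));
    }
  }
}
```

The current version returns -2048 unchanged, and the proposed version returns 0. For -3,000,000,000 the current version returns the positive value 1294967296, because `Long.intValue()` keeps only the low 32 bits. The proposed version returns 0.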


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
