[jira] [Updated] (YARN-11664) Remove HDFS Binaries/Jars Dependency From YARN

2024-03-20 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated YARN-11664:
-
Description: 
In principle, Hadoop YARN is independent of HDFS and can work with any 
filesystem. Currently, however, some YARN code depends on HDFS classes, and 
this dependency forces YARN to bring some of the HDFS binaries/jars onto its 
classpath. The idea behind this Jira is to remove that dependency so that YARN 
can run without the HDFS binaries/jars.

*Scope*
1. Non-test classes are considered
2. Some test classes that come in as transitive dependencies are considered


*Out of scope*
1. Test classes in the YARN module are not considered

 




A quick search in the YARN module revealed the following HDFS dependencies:


1. Constants
{code:java}
import org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
 

 
2. Exception


{code:java}
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;{code}

3. Utility
{code:java}
import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}
 

Both YARN and HDFS depend on the *hadoop-common* module, so:

* Constant variables and utility classes can be moved to *hadoop-common*
* Instead of DSQuotaExceededException, use its parent exception 
ClusterStorageCapacityExceededException from *hadoop-common* (see the sketch 
below)
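
For illustration, a hedged sketch of YARN-side code that handles a quota 
failure through the hadoop-common parent type. The class and method names here 
are hypothetical, not part of the patch; only the exception type comes from 
this Jira:

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.ClusterStorageCapacityExceededException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper: shows the catch clause YARN can use once it only
// references the hadoop-common parent exception type.
public final class QuotaAwareWriter {
  private QuotaAwareWriter() {}

  public static void write(FileSystem fs, Path path, byte[] data) throws IOException {
    try (FSDataOutputStream out = fs.create(path)) {
      out.write(data);
    } catch (ClusterStorageCapacityExceededException e) {
      // DSQuotaExceededException extends this hadoop-common type, so HDFS quota
      // failures are still caught without hadoop-hdfs-client on the classpath.
      throw new IOException("Storage capacity/quota exceeded for " + path, e);
    }
  }
}
{code}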

  was:
In principle, Hadoop YARN is independent of HDFS and can work with any 
filesystem. Currently, however, some YARN code depends on HDFS classes, and 
this dependency forces YARN to bring some of the HDFS binaries/jars onto its 
classpath. The idea behind this Jira is to remove that dependency so that YARN 
can run without the HDFS binaries/jars.

*Scope*
1. Non-test classes are considered
2. Some test classes that come in as transitive dependencies are considered


*Out of scope*
1. Test classes in the YARN module are not considered

 




A quick search in the YARN module revealed the following HDFS dependencies:


1. Constants
{code:java}
import org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
 

 
2. Exception


{code:java}
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;
import org.apache.hadoop.hdfs.protocol.QuotaExceededException;  (comes as a transitive dependency of DSQuotaExceededException){code}
 

3. Utility
{code:java}
import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}
 

Both YARN and HDFS depend on the *hadoop-common* module. One straightforward 
approach is to move all these dependencies into *hadoop-common*, from which 
both HDFS and YARN can pick them up.


> Remove HDFS Binaries/Jars Dependency From YARN
> --
>
> Key: YARN-11664
> URL: https://issues.apache.org/jira/browse/YARN-11664
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>
> In principle, Hadoop YARN is independent of HDFS and can work with any 
> filesystem. Currently, however, some YARN code depends on HDFS classes, and 
> this dependency forces YARN to bring some of the HDFS binaries/jars onto 
> its classpath. The idea behind this Jira is to remove that dependency so 
> that YARN can run without the HDFS binaries/jars.
> *Scope*
> 1. Non-test classes are considered
> 2. Some test classes that come in as transitive dependencies are considered
> *Out of scope*
> 1. Test classes in the YARN module are not considered
>  
> 
> A quick search in the YARN module revealed the following HDFS dependencies:
> 1. Constants
> {code:java}
> import org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
> import org.apache.hadoop.hdfs.DFSConfigKeys;{code}
>  
>  
> 2. Exception
> {code:java}
> import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;{code}
>  
> 3. Utility
> {code:java}
> import org.apache.hadoop.hdfs.protocol.datatransfer.IOStreamPair;{code}
>  
> Both YARN and HDFS depend on the *hadoop-common* module, so:
> * Constant variables and utility classes can be moved to *hadoop-common*
> * Instead of DSQuotaExceededException, use its parent exception 
> ClusterStorageCapacityExceededException from *hadoop-common*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (YARN-11667) Federation: ResourceRequestComparator throws NPE when using a lower version of Hadoop to submit an application

2024-03-20 Thread qiuliang (Jira)
qiuliang created YARN-11667:
---

 Summary: Federation: ResourceRequestComparator throws NPE when 
using a lower version of Hadoop to submit an application
 Key: YARN-11667
 URL: https://issues.apache.org/jira/browse/YARN-11667
 Project: Hadoop YARN
  Issue Type: Bug
  Components: amrmproxy
Affects Versions: 3.4.0
Reporter: qiuliang


When an application is submitted using a lower version of Hadoop, the 
ResourceRequest built by the AM has no ExecutionTypeRequest. After the 
ResourceRequest is submitted to AMRMProxy, an NPE occurs when AMRMProxy 
rebuilds the AllocateRequest to add the ResourceRequest to its ask.
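
A null-safe comparison would avoid the NPE. The sketch below is illustrative 
only (it is not the committed fix) and assumes the standard ResourceRequest 
and ExecutionTypeRequest accessors:

{code:java}
import java.util.Comparator;

import org.apache.hadoop.yarn.api.records.ExecutionTypeRequest;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

// Illustrative comparator fragment: order requests by execution type while
// tolerating requests from older clients that never set the field.
public class NullSafeExecutionTypeComparator implements Comparator<ResourceRequest> {
  @Override
  public int compare(ResourceRequest r1, ResourceRequest r2) {
    ExecutionTypeRequest e1 = r1.getExecutionTypeRequest();
    ExecutionTypeRequest e2 = r2.getExecutionTypeRequest();
    if (e1 == null && e2 == null) {
      return 0;  // both built by a client that predates ExecutionTypeRequest
    }
    if (e1 == null) {
      return -1; // sort null (unset) before any explicit execution type
    }
    if (e2 == null) {
      return 1;
    }
    return e1.getExecutionType().compareTo(e2.getExecutionType());
  }
}
{code}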



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Commented] (YARN-11664) Remove HDFS Binaries/Jars Dependency From YARN

2024-03-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829283#comment-17829283
 ] 

ASF GitHub Bot commented on YARN-11664:
---

hadoop-yetus commented on PR #6631:
URL: https://github.com/apache/hadoop/pull/6631#issuecomment-2010451899

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 31s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 4 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 23s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  31m 49s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  18m 14s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |  16m 42s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   4m 20s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   6m 59s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   5m 47s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   6m  1s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | -1 :x: |  spotbugs  |   2m 54s | 
[/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6631/4/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html)
 |  hadoop-hdfs-project/hadoop-hdfs-client in trunk has 1 extant spotbugs 
warnings.  |
   | -1 :x: |  spotbugs  |   1m  9s | 
[/branch-spotbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-core-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6631/4/artifact/out/branch-spotbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-core-warnings.html)
 |  
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
 in trunk has 1 extant spotbugs warnings.  |
   | +1 :green_heart: |  shadedclient  |  34m 29s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 31s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   4m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 51s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |  16m 51s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 55s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |  16m 55s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   4m 29s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6631/4/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 5 new + 528 unchanged - 2 fixed = 533 total (was 
530)  |
   | +1 :green_heart: |  mvnsite  |   7m 22s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   5m 52s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   6m 20s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |  14m 54s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  34m 50s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m 16s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  unit  |   2m 41s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | +1 :green_heart: |  unit  | 225m  0s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  unit  |   6m  7s |  |  hadoop-yarn-common in the patch 
passed.  |
   | +1 :green_heart: |  unit  |  24m 39s |  |  hadoop-yarn-server-nodemanager 
in the patch passed.  |
   | +1 :green_heart: |  unit  |  21m 26s |  |  hadoop-yarn-services-core in 
the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 14s |  

[jira] [Commented] (YARN-11626) Optimization of the safeDelete operation in ZKRMStateStore

2024-03-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829262#comment-17829262
 ] 

ASF GitHub Bot commented on YARN-11626:
---

hadoop-yetus commented on PR #6616:
URL: https://github.com/apache/hadoop/pull/6616#issuecomment-2010221243

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   6m 33s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 32s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 30s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 29s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 33s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 35s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m  7s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m  1s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 27s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 27s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 21s | 
[/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6616/4/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt)
 |  
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 3 new + 5 unchanged - 0 fixed = 8 total (was 5)  |
   | +1 :green_heart: |  mvnsite  |   0m 26s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 23s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m  6s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m  8s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  89m 34s |  |  
hadoop-yarn-server-resourcemanager in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 24s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 179m 40s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6616/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6616 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle |
   | uname | Linux 00b3366602f7 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 725bb7fd54d8c2d821e7b38df2a3358678c71b9c |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 

[jira] [Commented] (YARN-11626) Optimization of the safeDelete operation in ZKRMStateStore

2024-03-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829195#comment-17829195
 ] 

ASF GitHub Bot commented on YARN-11626:
---

XbaoWu commented on code in PR #6616:
URL: https://github.com/apache/hadoop/pull/6616#discussion_r1532220247


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java:
##
@@ -1441,6 +1441,29 @@ void delete(final String path) throws Exception {
 zkManager.delete(path);
   }
 
+  /**
+   * Deletes the path more safe.
+   * When NNE is encountered, if the node does not exist,

Review Comment:
   > Could you expand NNE in the javadoc for clarity?
   
   Okay, thank you for your reminder





> Optimization of the safeDelete operation in ZKRMStateStore
> --
>
> Key: YARN-11626
> URL: https://issues.apache.org/jira/browse/YARN-11626
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha4, 3.1.1, 3.3.0
>Reporter: wangzhihui
>Priority: Minor
>  Labels: pull-request-available
>
> h1. Description 
>  * It can be observed that removing the app info started at 06:17:20, but 
> the NoNodeException was received at 06:17:35. 
>  * During the 15s interval, Curator was retrying the metadata operation. Due 
> to the non-idempotent nature of the ZooKeeper delete operation, in one of 
> the retry attempts the operation succeeded but no response was received. The 
> next retry then resulted in a NoNodeException, triggering the 
> STATE_STORE_FENCED event and ultimately causing the current ResourceManager 
> to switch to standby.
> {code:java}
> 2023-10-28 06:17:20,359 INFO  recovery.RMStateStore 
> (RMStateStore.java:transition(333)) - Removing info for app: 
> application_1697410508608_140368
> 2023-10-28 06:17:20,359 INFO  resourcemanager.RMAppManager 
> (RMAppManager.java:checkAppNumCompletedLimit(303)) - Application should be 
> expired, max number of completed apps kept in memory met: 
> maxCompletedAppsInMemory = 1000, removing app 
> application_1697410508608_140368 from memory:
> 2023-10-28 06:17:35,665 ERROR recovery.RMStateStore 
> (RMStateStore.java:transition(337)) - Error removing app: 
> application_1697410508608_140368
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> 2023-10-28 06:17:35,666 INFO  recovery.RMStateStore 
> (RMStateStore.java:handleStoreEvent(1147)) - RMStateStore state change from 
> ACTIVE to FENCED
> 2023-10-28 06:17:35,666 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:handle(898)) - Received RMFatalEvent of type 
> STATE_STORE_FENCED, caused by 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
> 2023-10-28 06:17:35,666 INFO  resourcemanager.ResourceManager 
> (ResourceManager.java:transitionToStandby(1309)) - Transitioning to standby 
> state
>  {code}
> h1. Solution
> The NoNodeException clearly indicates that the znode no longer exists, so we 
> can safely ignore this exception and avoid the larger cluster impact of a 
> ResourceManager failover.
> h1. Other
> We also need to discuss and optimize the same issue in safeCreate.
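
For reference, a minimal sketch of the proposed safeDelete behaviour (the body 
is illustrative and assumes the existing zkManager and LOG fields of 
ZKRMStateStore; it is not the merged patch):

{code:java}
import org.apache.zookeeper.KeeperException;

// Inside ZKRMStateStore (zkManager and LOG are the existing fields):
void safeDelete(final String path) throws Exception {
  try {
    zkManager.delete(path);
  } catch (KeeperException.NoNodeException e) {
    // An earlier (successful but unacknowledged) retry may already have
    // removed the znode. The node being gone is exactly the post-condition
    // delete wants, so swallow the exception instead of fencing the state
    // store and forcing an RM failover.
    LOG.debug("Path {} already deleted, ignoring NoNodeException", path);
  }
}
{code}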



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Commented] (YARN-11626) Optimization of the safeDelete operation in ZKRMStateStore

2024-03-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829191#comment-17829191
 ] 

ASF GitHub Bot commented on YARN-11626:
---

dineshchitlangia commented on code in PR #6616:
URL: https://github.com/apache/hadoop/pull/6616#discussion_r1532190511


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java:
##
@@ -1441,6 +1441,29 @@ void delete(final String path) throws Exception {
 zkManager.delete(path);
   }
 
+  /**
+   * Deletes the path more safe.
+   * When NNE is encountered, if the node does not exist,

Review Comment:
   Could you expand NNE in the javadoc for clarity?





> Optimization of the safeDelete operation in ZKRMStateStore
> --
>
> Key: YARN-11626
> URL: https://issues.apache.org/jira/browse/YARN-11626
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha4, 3.1.1, 3.3.0
>Reporter: wangzhihui
>Priority: Minor
>  Labels: pull-request-available
>
> h1. Description 
>  * It can be observed that removing the app info started at 06:17:20, but 
> the NoNodeException was received at 06:17:35. 
>  * During the 15s interval, Curator was retrying the metadata operation. Due 
> to the non-idempotent nature of the ZooKeeper delete operation, in one of 
> the retry attempts the operation succeeded but no response was received. The 
> next retry then resulted in a NoNodeException, triggering the 
> STATE_STORE_FENCED event and ultimately causing the current ResourceManager 
> to switch to standby.
> {code:java}
> 2023-10-28 06:17:20,359 INFO  recovery.RMStateStore 
> (RMStateStore.java:transition(333)) - Removing info for app: 
> application_1697410508608_140368
> 2023-10-28 06:17:20,359 INFO  resourcemanager.RMAppManager 
> (RMAppManager.java:checkAppNumCompletedLimit(303)) - Application should be 
> expired, max number of completed apps kept in memory met: 
> maxCompletedAppsInMemory = 1000, removing app 
> application_1697410508608_140368 from memory:
> 2023-10-28 06:17:35,665 ERROR recovery.RMStateStore 
> (RMStateStore.java:transition(337)) - Error removing app: 
> application_1697410508608_140368
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> 2023-10-28 06:17:35,666 INFO  recovery.RMStateStore 
> (RMStateStore.java:handleStoreEvent(1147)) - RMStateStore state change from 
> ACTIVE to FENCED
> 2023-10-28 06:17:35,666 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:handle(898)) - Received RMFatalEvent of type 
> STATE_STORE_FENCED, caused by 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
> 2023-10-28 06:17:35,666 INFO  resourcemanager.ResourceManager 
> (ResourceManager.java:transitionToStandby(1309)) - Transitioning to standby 
> state
>  {code}
> h1. Solution
> The NoNodeException clearly indicates that the znode no longer exists, so we 
> can safely ignore this exception and avoid the larger cluster impact of a 
> ResourceManager failover.
> h1. Other
> We also need to discuss and optimize the same issue in safeCreate.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Resolved] (YARN-5305) Yarn Application Log Aggregation fails because NM cannot get the correct HDFS delegation token III

2024-03-20 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke resolved YARN-5305.
-
Fix Version/s: 3.5.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Yarn Application Log Aggregation fails because NM cannot get the correct 
> HDFS delegation token III
> --
>
> Key: YARN-5305
> URL: https://issues.apache.org/jira/browse/YARN-5305
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Xianyin Xin
>Assignee: Peter Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Different from YARN-5098 and YARN-5302, this problem happens when the AM 
> submits a startContainer request with a new HDFS token (say, tokenB) which 
> is not managed by YARN, so two tokens exist in the user's credentials on the 
> NM: one is tokenB, the other is the one renewed on the RM (tokenA). If 
> tokenB is selected when connecting to HDFS and tokenB expires, an exception 
> occurs.
> Supplementary: this problem happens because the AM didn't use the service 
> name as the token alias in the credentials, so two tokens for the same 
> service can co-exist in one Credentials object. TokenSelector simply selects 
> the first matched token; it doesn't care whether the token is valid or not.
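
For illustration, a hedged sketch (the helper name is hypothetical, and this 
is not the committed fix) of adding a token under its service-name alias so 
that a newer token for the same service replaces, rather than coexists with, 
the old one:

{code:java}
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

// Hypothetical helper: key the token by its service name.
public final class TokenAliasUtil {
  private TokenAliasUtil() {}

  public static void addWithServiceAlias(Credentials creds, Token<?> token) {
    // Credentials is a map keyed by alias. Using the service name as the
    // alias makes a second token for the same service overwrite the first,
    // so a TokenSelector can no longer pick up a stale duplicate.
    creds.addToken(token.getService(), token);
  }
}
{code}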



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Commented] (YARN-5305) Yarn Application Log Aggregation fails because NM cannot get the correct HDFS delegation token III

2024-03-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829190#comment-17829190
 ] 

ASF GitHub Bot commented on YARN-5305:
--

brumi1024 merged PR #6625:
URL: https://github.com/apache/hadoop/pull/6625




> Yarn Application Log Aggregation fails because NM cannot get the correct 
> HDFS delegation token III
> --
>
> Key: YARN-5305
> URL: https://issues.apache.org/jira/browse/YARN-5305
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Xianyin Xin
>Assignee: Peter Szucs
>Priority: Major
>  Labels: pull-request-available
>
> Different from YARN-5098 and YARN-5302, this problem happens when the AM 
> submits a startContainer request with a new HDFS token (say, tokenB) which 
> is not managed by YARN, so two tokens exist in the user's credentials on the 
> NM: one is tokenB, the other is the one renewed on the RM (tokenA). If 
> tokenB is selected when connecting to HDFS and tokenB expires, an exception 
> occurs.
> Supplementary: this problem happens because the AM didn't use the service 
> name as the token alias in the credentials, so two tokens for the same 
> service can co-exist in one Credentials object. TokenSelector simply selects 
> the first matched token; it doesn't care whether the token is valid or not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Commented] (YARN-5305) Yarn Application Log Aggregation fails because NM cannot get the correct HDFS delegation token III

2024-03-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829189#comment-17829189
 ] 

ASF GitHub Bot commented on YARN-5305:
--

brumi1024 commented on PR #6625:
URL: https://github.com/apache/hadoop/pull/6625#issuecomment-2009719040

   Thanks @p-szucs for the patch and @K0K0V0K for the review. The spotbugs 
warning seems unrelated; merging to trunk.




> Yarn Application Log Aggregation fails because NM cannot get the correct 
> HDFS delegation token III
> --
>
> Key: YARN-5305
> URL: https://issues.apache.org/jira/browse/YARN-5305
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Xianyin Xin
>Assignee: Peter Szucs
>Priority: Major
>  Labels: pull-request-available
>
> Different from YARN-5098 and YARN-5302, this problem happens when the AM 
> submits a startContainer request with a new HDFS token (say, tokenB) which 
> is not managed by YARN, so two tokens exist in the user's credentials on the 
> NM: one is tokenB, the other is the one renewed on the RM (tokenA). If 
> tokenB is selected when connecting to HDFS and tokenB expires, an exception 
> occurs.
> Supplementary: this problem happens because the AM didn't use the service 
> name as the token alias in the credentials, so two tokens for the same 
> service can co-exist in one Credentials object. TokenSelector simply selects 
> the first matched token; it doesn't care whether the token is valid or not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
