[jira] [Commented] (YARN-11663) [Federation] Add Cache Entity Nums Limit.

2024-03-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830186#comment-17830186
 ] 

ASF GitHub Bot commented on YARN-11663:
---

slfan1989 commented on code in PR #6662:
URL: https://github.com/apache/hadoop/pull/6662#discussion_r1536710928


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/cache/FederationGuavaCache.java:
##
@@ -60,7 +63,7 @@ public void initCache(Configuration pConf, 
FederationStateStore pStateStore) {
 
 // Initialize Cache.
 cache = CacheBuilder.newBuilder().expireAfterWrite(cacheTimeToLive,
-TimeUnit.MILLISECONDS).build();
+TimeUnit.MILLISECONDS).maximumSize(cacheEntityNums).build();

Review Comment:
   Thanks for your suggestion! I will fix it.
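   
   For context, a minimal self-contained sketch of the bounded-cache pattern 
   being discussed (the class name and literal values are illustrative 
   assumptions, not the actual patch). Guava cleans up expired entries only as 
   a side effect of reads and writes near them, so when finished jobs' keys 
   are never read again, only a size bound actually caps memory:
   
   ```java
   import java.util.concurrent.TimeUnit;
   import com.google.common.cache.Cache;
   import com.google.common.cache.CacheBuilder;
   
   public class BoundedCacheSketch {
     public static void main(String[] args) {
       long cacheTimeToLive = 300;   // seconds; placeholder for the configured TTL
       long cacheEntityNums = 1000;  // placeholder for the new entry limit
   
       // expireAfterWrite alone never reclaims entries that are not accessed
       // again; maximumSize adds eviction on write, bounding memory regardless.
       Cache<String, Object> cache = CacheBuilder.newBuilder()
           .expireAfterWrite(cacheTimeToLive, TimeUnit.SECONDS)
           .maximumSize(cacheEntityNums)
           .build();
   
       cache.put("app_1", new Object());
       System.out.println("entries: " + cache.size());
     }
   }
   ```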





> [Federation] Add Cache Entity Nums Limit.
> -
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation, yarn
>Affects Versions: 3.4.0
>Reporter: Yuan Luo
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-03-14-18-12-28-426.png, 
> image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png
>
>
> !image-2024-03-14-18-12-28-426.png!
> !image-2024-03-14-18-12-49-950.png!
> Hi [~slfan1989], after applying this feature to our prod env, I found that 
> the Router's memory keeps growing over time. This is because, once jobs 
> finish, the expired keys are never accessed again, so the cleanup mechanism 
> is never triggered. Would it be better to add a maximum cache size limit?






[jira] [Commented] (YARN-11663) [Federation] Add Cache Entity Nums Limit.

2024-03-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830106#comment-17830106
 ] 

ASF GitHub Bot commented on YARN-11663:
---

hadoop-yetus commented on PR #6662:
URL: https://github.com/apache/hadoop/pull/6662#issuecomment-2016545141

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 21s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  13m 59s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  19m 50s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   8m 57s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   8m  9s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   2m  7s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 44s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 38s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +0 :ok: |  spotbugs  |   0m 32s |  |  branch/hadoop-project no spotbugs 
output file (spotbugsXml.xml)  |
   | +1 :green_heart: |  shadedclient  |  20m 32s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 35s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   0m 50s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m 38s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   8m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m  8s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   8m  8s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   2m  3s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6662/1/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 2 new + 164 unchanged - 0 fixed = 166 total (was 
164)  |
   | +1 :green_heart: |  mvnsite  |   1m 45s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 36s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +0 :ok: |  spotbugs  |   0m 26s |  |  hadoop-project has no data from 
spotbugs  |
   | -1 :x: |  shadedclient  |  21m  4s |  |  patch has errors when building 
and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   0m 26s |  |  hadoop-project in the patch 
passed.  |
   | -1 :x: |  unit  |   0m 49s | 
[/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6662/1/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api.txt)
 |  hadoop-yarn-api in the patch passed.  |
   | +1 :green_heart: |  unit  |   2m 52s |  |  hadoop-yarn-server-common in 
the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 42s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 140m  5s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.yarn.conf.TestYarnConfigurationFields |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6662/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6662 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle |
   | uname | Linux 475c51c6156e 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |

[jira] [Commented] (YARN-11663) [Federation] Add Cache Entity Nums Limit.

2024-03-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830088#comment-17830088
 ] 

ASF GitHub Bot commented on YARN-11663:
---

luoyuan3471 commented on code in PR #6662:
URL: https://github.com/apache/hadoop/pull/6662#discussion_r1536644135


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/cache/FederationGuavaCache.java:
##
@@ -60,7 +63,7 @@ public void initCache(Configuration pConf, 
FederationStateStore pStateStore) {
 
 // Initialize Cache.
 cache = CacheBuilder.newBuilder().expireAfterWrite(cacheTimeToLive,
-TimeUnit.MILLISECONDS).build();
+TimeUnit.MILLISECONDS).maximumSize(cacheEntityNums).build();

Review Comment:
   The TTL here is read in seconds:
   
       cacheTimeToLive =
           pConf.getInt(YarnConfiguration.FEDERATION_CACHE_TIME_TO_LIVE_SECS,
               YarnConfiguration.DEFAULT_FEDERATION_CACHE_TIME_TO_LIVE_SECS);
   
   so TimeUnit.MILLISECONDS should be TimeUnit.SECONDS.
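   
   In other words, the builder call should roughly become the following (a 
   sketch of the suggested correction, assuming the surrounding initCache 
   fields; not the final patch):
   
   ```java
   // cacheTimeToLive comes from FEDERATION_CACHE_TIME_TO_LIVE_SECS, a value
   // in seconds, so the expiry unit must match:
   cache = CacheBuilder.newBuilder()
       .expireAfterWrite(cacheTimeToLive, TimeUnit.SECONDS)
       .maximumSize(cacheEntityNums)
       .build();
   ```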





> [Federation] Add Cache Entity Nums Limit.
> -
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation, yarn
>Affects Versions: 3.4.0
>Reporter: Yuan Luo
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-03-14-18-12-28-426.png, 
> image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png
>
>
> !image-2024-03-14-18-12-28-426.png!
> !image-2024-03-14-18-12-49-950.png!
> Hi [~slfan1989], after applying this feature to our prod env, I found that 
> the Router's memory keeps growing over time. This is because, once jobs 
> finish, the expired keys are never accessed again, so the cleanup mechanism 
> is never triggered. Would it be better to add a maximum cache size limit?






[jira] [Updated] (YARN-11663) [Federation] Add Cache Entity Nums Limit.

2024-03-23 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11663:
--
Issue Type: Improvement  (was: Bug)

> [Federation] Add Cache Entity Nums Limit.
> -
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation, yarn
>Affects Versions: 3.4.0
>Reporter: Yuan Luo
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-03-14-18-12-28-426.png, 
> image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png
>
>
> !image-2024-03-14-18-12-28-426.png!
> !image-2024-03-14-18-12-49-950.png!
> Hi [~slfan1989], after applying this feature to our prod env, I found that 
> the Router's memory keeps growing over time. This is because, once jobs 
> finish, the expired keys are never accessed again, so the cleanup mechanism 
> is never triggered. Would it be better to add a maximum cache size limit?






[jira] [Updated] (YARN-11663) [Federation] Add Cache Entity Nums Limit.

2024-03-23 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11663:
--
Summary: [Federation] Add Cache Entity Nums Limit.  (was: Router cache 
expansion issue)

> [Federation] Add Cache Entity Nums Limit.
> -
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.4.0
>Reporter: Yuan Luo
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-03-14-18-12-28-426.png, 
> image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png
>
>
> !image-2024-03-14-18-12-28-426.png!
> !image-2024-03-14-18-12-49-950.png!
> Hi [~slfan1989], after applying this feature to our prod env, I found that 
> the Router's memory keeps growing over time. This is because, once jobs 
> finish, the expired keys are never accessed again, so the cleanup mechanism 
> is never triggered. Would it be better to add a maximum cache size limit?






[jira] [Updated] (YARN-11663) Router cache expansion issue

2024-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-11663:
--
Labels: pull-request-available  (was: )

> Router cache expansion issue
> 
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.4.0
>Reporter: Yuan Luo
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-03-14-18-12-28-426.png, 
> image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png
>
>
> !image-2024-03-14-18-12-28-426.png!
> !image-2024-03-14-18-12-49-950.png!
> Hi [~slfan1989], after applying this feature to our prod env, I found that 
> the Router's memory keeps growing over time. This is because, once jobs 
> finish, the expired keys are never accessed again, so the cleanup mechanism 
> is never triggered. Would it be better to add a maximum cache size limit?






[jira] [Commented] (YARN-11663) Router cache expansion issue

2024-03-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830084#comment-17830084
 ] 

ASF GitHub Bot commented on YARN-11663:
---

slfan1989 opened a new pull request, #6662:
URL: https://github.com/apache/hadoop/pull/6662

   
   
   ### Description of PR
   
   JIRA: YARN-11663. [Federation] Add Cache Entity Nums Limit.
   
   ### How was this patch tested?
   
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> Router cache expansion issue
> 
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.4.0
>Reporter: Yuan Luo
>Priority: Major
> Attachments: image-2024-03-14-18-12-28-426.png, 
> image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png
>
>
> !image-2024-03-14-18-12-28-426.png!
> !image-2024-03-14-18-12-49-950.png!
> Hi [~slfan1989], after applying this feature to our prod env, I found that 
> the Router's memory keeps growing over time. This is because, once jobs 
> finish, the expired keys are never accessed again, so the cleanup mechanism 
> is never triggered. Would it be better to add a maximum cache size limit?






[jira] [Commented] (YARN-11387) [GPG] YARN GPG mistakenly deleted applicationid

2024-03-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830050#comment-17830050
 ] 

ASF GitHub Bot commented on YARN-11387:
---

hadoop-yetus commented on PR #6660:
URL: https://github.com/apache/hadoop/pull/6660#issuecomment-2016433769

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |  18m  5s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  43m 38s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 26s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 27s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 32s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 34s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   0m 47s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  33m  3s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 20s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 18s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 18s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 14s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 20s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 19s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   0m 46s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  32m 50s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m  1s |  |  
hadoop-yarn-server-globalpolicygenerator in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 37s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 140m 49s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6660/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6660 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 45f6a5950e77 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 2b2084718031bda6966917176f9b171356cbf459 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6660/1/testReport/ |
   | Max. process+thread count | 558 (vs. ulimit of 5500) |
   | modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-globalpolicygenerator
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-globalpolicygenerator
 |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6660/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | 

[jira] [Commented] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.

2024-03-23 Thread zeekling (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830047#comment-17830047
 ] 

zeekling commented on YARN-2024:


I have the same problem in Hadoop 3.1.1.

 

!image-2024-03-23-17-22-00-057.png!

 

 

 

> IOException in AppLogAggregatorImpl does not give stacktrace and leaves 
> aggregated TFile in a bad state.
> 
>
> Key: YARN-2024
> URL: https://issues.apache.org/jira/browse/YARN-2024
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Eric Payne
>Assignee: Xuan Gong
>Priority: Major
>
> Multiple issues were encountered when AppLogAggregatorImpl hit an 
> IOException in AppLogAggregatorImpl#uploadLogsForContainer while aggregating 
> yarn-logs for an application that had very large (>150G each) error logs.
> - An IOException was encountered during the LogWriter#append call, and a 
> message was printed, but no stacktrace was provided. Message: "ERROR: 
> Couldn't upload logs for container_n_nnn_nn_nn. Skipping 
> this container."
> - After the IOException, the TFile is in a bad state, so subsequent calls to 
> LogWriter#append fail with the following stacktrace:
> 2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[LogAggregationService #17907,5,main] threw an Exception.
> java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE
> at 
> org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528)
> at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:262)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:128)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:164)
> ...
> - At this point, the yarn-logs cleaner still thinks the thread is 
> aggregating, so the huge yarn-logs never get cleaned up for that application.
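The failure mode described above is generic to streaming container formats: 
once an append fails partway through, the writer's in-memory state no longer 
matches the bytes already on disk, so every later append fails too. A minimal 
sketch of the usual defensive pattern (hypothetical names, not the actual 
AppLogAggregatorImpl code): log the full stack trace on the first IOException, 
mark the writer broken, and refuse further appends:

```java
import java.io.IOException;

// Hypothetical wrapper illustrating the pattern; LogWriter stands in for
// AggregatedLogFormat.LogWriter and is not the real class.
class FailFastWriter {
  interface LogWriter { void append(String key, byte[] value) throws IOException; }

  private final LogWriter delegate;
  private boolean broken = false;

  FailFastWriter(LogWriter delegate) { this.delegate = delegate; }

  void append(String key, byte[] value) throws IOException {
    if (broken) {
      // Fail fast rather than corrupting the container state further.
      throw new IOException("writer unusable after an earlier failure");
    }
    try {
      delegate.append(key, value);
    } catch (IOException e) {
      broken = true;
      e.printStackTrace(); // preserve the stack trace, not just a one-line message
      throw e;
    }
  }
}
```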






[jira] [Comment Edited] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.

2024-03-23 Thread zeekling (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830047#comment-17830047
 ] 

zeekling edited comment on YARN-2024 at 3/23/24 9:23 AM:
-

I have the same problem in Hadoop 3.1.1.

 

2024-02-17 01:09:21,112 | INFO  | SchedulerEventDispatcher:Event Processor | 
container_e65_1707884856539_27553_01_66 Container Transitioned from NEW to 
COMPLETED | RMContainerImpl.java:480
2024-02-17 01:09:21,112 | FATAL | RM ApplicationHistory dispatcher | Error in 
dispatcher thread | AsyncDispatcher.java:233
java.lang.IllegalStateException: Incorrect state to start a new key: END_KEY
    at 
org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:530)
    at 
org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileWriter.writeHistoryData(FileSystemApplicationHistoryStore.java:756)
    at 
org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.containerStarted(FileSystemApplicationHistoryStore.java:523)
    at 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter.handleWritingApplicationHistoryEvent(RMApplicationHistoryWriter.java:198)
    at 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:304)
    at 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:299)
    at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:227)
    at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:144)
    at java.lang.Thread.run(Thread.java:748)

 


was (Author: JIRAUSER299659):
I have the same problem in Hadoop 3.1.1.

 

2024-02-17 01:09:21,112 | INFO  | SchedulerEventDispatcher:Event Processor | 
container_e65_1707884856539_27553_01_66 Container Transitioned from NEW to 
COMPLETED | RMContainerImpl.java:480
2024-02-17 01:09:21,112 | FATAL | RM ApplicationHistory dispatcher | Error in 
dispatcher thread | AsyncDispatcher.java:233
java.lang.IllegalStateException: Incorrect state to start a new key: END_KEY
    at 
org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:530)
    at 
org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileWriter.writeHistoryData(FileSystemApplicationHistoryStore.java:756)
    at 
org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.containerStarted(FileSystemApplicationHistoryStore.java:523)
    at 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter.handleWritingApplicationHistoryEvent(RMApplicationHistoryWriter.java:198)
    at 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:304)
    at 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:299)
    at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:227)
    at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:144)
    at java.lang.Thread.run(Thread.java:748)

 

 

 

> IOException in AppLogAggregatorImpl does not give stacktrace and leaves 
> aggregated TFile in a bad state.
> 
>
> Key: YARN-2024
> URL: https://issues.apache.org/jira/browse/YARN-2024
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Eric Payne
>Assignee: Xuan Gong
>Priority: Major
>
> Multiple issues were encountered when AppLogAggregatorImpl hit an 
> IOException in AppLogAggregatorImpl#uploadLogsForContainer while aggregating 
> yarn-logs for an application that had very large (>150G each) error logs.
> - An IOException was encountered during the LogWriter#append call, and a 
> message was printed, but no stacktrace was provided. Message: "ERROR: 
> Couldn't upload logs for container_n_nnn_nn_nn. Skipping 
> this container."
> - After the IOException, the TFile is in a bad state, so subsequent calls to 
> LogWriter#append fail with the following stacktrace:
> 2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[LogAggregationService #17907,5,main] threw an Exception.
> java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE
> at 
> org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528)
> at 
> 

[jira] [Comment Edited] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.

2024-03-23 Thread zeekling (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830047#comment-17830047
 ] 

zeekling edited comment on YARN-2024 at 3/23/24 9:23 AM:
-

I have the same problem in Hadoop 3.1.1.

 

2024-02-17 01:09:21,112 | INFO  | SchedulerEventDispatcher:Event Processor | 
container_e65_1707884856539_27553_01_66 Container Transitioned from NEW to 
COMPLETED | RMContainerImpl.java:480
2024-02-17 01:09:21,112 | FATAL | RM ApplicationHistory dispatcher | Error in 
dispatcher thread | AsyncDispatcher.java:233
java.lang.IllegalStateException: Incorrect state to start a new key: END_KEY
    at 
org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:530)
    at 
org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileWriter.writeHistoryData(FileSystemApplicationHistoryStore.java:756)
    at 
org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.containerStarted(FileSystemApplicationHistoryStore.java:523)
    at 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter.handleWritingApplicationHistoryEvent(RMApplicationHistoryWriter.java:198)
    at 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:304)
    at 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:299)
    at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:227)
    at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:144)
    at java.lang.Thread.run(Thread.java:748)

 

 

 


was (Author: JIRAUSER299659):
I have the same problem in Hadoop 3.1.1.

 

!image-2024-03-23-17-22-00-057.png!

 

 

 

> IOException in AppLogAggregatorImpl does not give stacktrace and leaves 
> aggregated TFile in a bad state.
> 
>
> Key: YARN-2024
> URL: https://issues.apache.org/jira/browse/YARN-2024
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Eric Payne
>Assignee: Xuan Gong
>Priority: Major
>
> Multiple issues were encountered when AppLogAggregatorImpl hit an 
> IOException in AppLogAggregatorImpl#uploadLogsForContainer while aggregating 
> yarn-logs for an application that had very large (>150G each) error logs.
> - An IOException was encountered during the LogWriter#append call, and a 
> message was printed, but no stacktrace was provided. Message: "ERROR: 
> Couldn't upload logs for container_n_nnn_nn_nn. Skipping 
> this container."
> - After the IOException, the TFile is in a bad state, so subsequent calls to 
> LogWriter#append fail with the following stacktrace:
> 2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[LogAggregationService #17907,5,main] threw an Exception.
> java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE
> at 
> org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528)
> at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:262)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:128)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:164)
> ...
> - At this point, the yarn-logs cleaner still thinks the thread is 
> aggregating, so the huge yarn-logs never get cleaned up for that application.






[jira] [Commented] (YARN-11387) [GPG] YARN GPG mistakenly deleted applicationid

2024-03-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830035#comment-17830035
 ] 

ASF GitHub Bot commented on YARN-11387:
---

slfan1989 opened a new pull request, #6660:
URL: https://github.com/apache/hadoop/pull/6660

   
   
   ### Description of PR
   
   JIRA: YARN-11387. [GPG] YARN GPG mistakenly deleted applicationid.
   
   ### How was this patch tested?
   
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> [GPG] YARN GPG mistakenly deleted applicationid
> ---
>
> Key: YARN-11387
> URL: https://issues.apache.org/jira/browse/YARN-11387
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.2.1, 3.4.0
>Reporter: zhangjunj
>Assignee: Shilun Fan
>Priority: Major
>  Labels: federation, gpg, pull-request-available
> Attachments: YARN-11387-YARN-11387.v1.patch, 
> yarn-gpg-mistakenly-deleted-applicationid.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> In [YARN-7599|https://issues.apache.org/jira/browse/YARN-7599], Federation 
> can delete expired applicationids, but YARN GPG uses the getRouter() method 
> to obtain application information for multiple clusters. If there are too 
> many applicationids (more than 200,000), it is not possible to pull all of 
> the applicationid information at once, so applications may be deleted by 
> mistake. The following error is reported for the Spark component.
>  
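The safe pattern the report implies: never treat a possibly truncated listing 
as the complete set when deciding what to delete. A self-contained sketch 
under that assumption (every type here is a hypothetical stand-in, not the 
real Router/GPG API): page through the listing until it is provably complete, 
and only then delete candidates that are absent from it.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical interfaces standing in for the Router listing call and the
// state-store cleanup; the real GPG code paths differ.
class SafeCleanerSketch {
  interface AppLister { List<String> listApps(int offset, int limit); }
  interface Cleaner { void delete(String appId); }

  static void cleanExpired(AppLister lister, Cleaner cleaner, Set<String> candidates) {
    Set<String> live = new HashSet<>();
    int offset = 0;
    final int limit = 1000;
    while (true) {
      List<String> page = lister.listApps(offset, limit);
      live.addAll(page);
      if (page.size() < limit) {
        break; // short page: the listing is complete
      }
      offset += limit;
    }
    // Delete only candidates provably absent from the full listing.
    for (String appId : candidates) {
      if (!live.contains(appId)) {
        cleaner.delete(appId);
      }
    }
  }
}
```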


