[jira] [Commented] (YARN-11663) [Federation] Add Cache Entity Nums Limit.
[ https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830186#comment-17830186 ] ASF GitHub Bot commented on YARN-11663: --- slfan1989 commented on code in PR #6662: URL: https://github.com/apache/hadoop/pull/6662#discussion_r1536710928
## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/cache/FederationGuavaCache.java:
## @@ -60,7 +63,7 @@ public void initCache(Configuration pConf, FederationStateStore pStateStore) {
     // Initialize Cache.
     cache = CacheBuilder.newBuilder().expireAfterWrite(cacheTimeToLive,
-        TimeUnit.MILLISECONDS).build();
+        TimeUnit.MILLISECONDS).maximumSize(cacheEntityNums).build();
Review Comment: Thanks for your suggestion! I will fix it.
> [Federation] Add Cache Entity Nums Limit. > - > > Key: YARN-11663 > URL: https://issues.apache.org/jira/browse/YARN-11663 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation, yarn >Affects Versions: 3.4.0 >Reporter: Yuan Luo >Priority: Major > Labels: pull-request-available > Attachments: image-2024-03-14-18-12-28-426.png, > image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png > > > !image-2024-03-14-18-12-28-426.png! > !image-2024-03-14-18-12-49-950.png! > hi [~slfan1989] After apply this feature to our prod env, I found the memory > of the router keeps growing over time. This is because after jobs finished, > we won't access the expired key to trigger cleanup mechanism. Is it better to > add cache maximum number limit? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
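For readers skimming the digest: the diff above bounds the federation cache by entry count as well as by TTL, so the Router's memory stays bounded even when expired keys are never read again after jobs finish. A minimal, self-contained sketch of the same Guava pattern (class name and values are illustrative; this is not the FederationGuavaCache code itself):

```java
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class BoundedCacheSketch {
  public static void main(String[] args) {
    long cacheTimeToLive = 300;   // TTL in seconds (illustrative value)
    long cacheEntityNums = 1000;  // maximum number of cached entries (illustrative value)

    // With maximumSize, Guava evicts by count as entries are written, so the
    // cache stays bounded even if expired keys are never touched again.
    Cache<String, String> cache = CacheBuilder.newBuilder()
        .expireAfterWrite(cacheTimeToLive, TimeUnit.SECONDS)
        .maximumSize(cacheEntityNums)
        .build();

    for (int i = 0; i < 5000; i++) {
      cache.put("app_" + i, "homeSubCluster_" + (i % 4));
    }
    // Stays at (or very near) the configured maximum instead of growing to 5000.
    System.out.println("cached entries: " + cache.size());
  }
}
```

Size-based eviction runs as entries are written, which matches the situation described in the issue: new applications keep adding keys while the old, expired ones are never read again.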
[jira] [Commented] (YARN-11663) [Federation] Add Cache Entity Nums Limit.
[ https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830106#comment-17830106 ] ASF GitHub Bot commented on YARN-11663: --- hadoop-yetus commented on PR #6662: URL: https://github.com/apache/hadoop/pull/6662#issuecomment-2016545141
:broken_heart: **-1 overall**
| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 0m 21s | | Docker mode activated. |
_ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
_ trunk Compile Tests _ |
| +0 :ok: | mvndep | 13m 59s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 19m 50s | | trunk passed |
| +1 :green_heart: | compile | 8m 57s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | compile | 8m 9s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | checkstyle | 2m 7s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 44s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 38s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 1m 30s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +0 :ok: | spotbugs | 0m 32s | | branch/hadoop-project no spotbugs output file (spotbugsXml.xml) |
| +1 :green_heart: | shadedclient | 20m 32s | | branch has no errors when building and testing our client artifacts. |
_ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 35s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 0m 50s | | the patch passed |
| +1 :green_heart: | compile | 8m 38s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javac | 8m 38s | | the patch passed |
| +1 :green_heart: | compile | 8m 8s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | javac | 8m 8s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 2m 3s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6662/1/artifact/out/results-checkstyle-root.txt) | root: The patch generated 2 new + 164 unchanged - 0 fixed = 166 total (was 164) |
| +1 :green_heart: | mvnsite | 1m 45s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 30s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 1m 36s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +0 :ok: | spotbugs | 0m 26s | | hadoop-project has no data from spotbugs |
| -1 :x: | shadedclient | 21m 4s | | patch has errors when building and testing our client artifacts. |
_ Other Tests _ |
| +1 :green_heart: | unit | 0m 26s | | hadoop-project in the patch passed. |
| -1 :x: | unit | 0m 49s | [/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6662/1/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api.txt) | hadoop-yarn-api in the patch passed. |
| +1 :green_heart: | unit | 2m 52s | | hadoop-yarn-server-common in the patch passed. |
| +1 :green_heart: | asflicense | 0m 42s | | The patch does not generate ASF License warnings. |
| | | 140m 5s | | |
| Reason | Tests |
|---:|:--|
| Failed junit tests | hadoop.yarn.conf.TestYarnConfigurationFields |
| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6662/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6662 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle |
| uname | Linux 475c51c6156e 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64
[jira] [Commented] (YARN-11663) [Federation] Add Cache Entity Nums Limit.
[ https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830088#comment-17830088 ] ASF GitHub Bot commented on YARN-11663: --- luoyuan3471 commented on code in PR #6662: URL: https://github.com/apache/hadoop/pull/6662#discussion_r1536644135
## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/cache/FederationGuavaCache.java:
## @@ -60,7 +63,7 @@ public void initCache(Configuration pConf, FederationStateStore pStateStore) {
     // Initialize Cache.
     cache = CacheBuilder.newBuilder().expireAfterWrite(cacheTimeToLive,
-        TimeUnit.MILLISECONDS).build();
+        TimeUnit.MILLISECONDS).maximumSize(cacheEntityNums).build();
Review Comment:
    cacheTimeToLive = pConf.getInt(YarnConfiguration.FEDERATION_CACHE_TIME_TO_LIVE_SECS,
        YarnConfiguration.DEFAULT_FEDERATION_CACHE_TIME_TO_LIVE_SECS);
    TimeUnit.MILLISECONDS -> TimeUnit.SECONDS
> [Federation] Add Cache Entity Nums Limit. > - > > Key: YARN-11663 > URL: https://issues.apache.org/jira/browse/YARN-11663 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation, yarn >Affects Versions: 3.4.0 >Reporter: Yuan Luo >Priority: Major > Labels: pull-request-available > Attachments: image-2024-03-14-18-12-28-426.png, > image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png > > > !image-2024-03-14-18-12-28-426.png! > !image-2024-03-14-18-12-49-950.png! > hi [~slfan1989] After apply this feature to our prod env, I found the memory > of the router keeps growing over time. This is because after jobs finished, > we won't access the expired key to trigger cleanup mechanism. Is it better to > add cache maximum number limit? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
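The review comment above flags a unit mismatch: the TTL is read from FEDERATION_CACHE_TIME_TO_LIVE_SECS, i.e. a value expressed in seconds, but is handed to expireAfterWrite with TimeUnit.MILLISECONDS, which would make entries expire roughly a thousand times sooner than configured. A small standalone Guava example showing the difference (class name and values are illustrative, not the actual Hadoop code):

```java
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class TtlUnitSketch {
  public static void main(String[] args) throws InterruptedException {
    int cacheTimeToLive = 300; // intended TTL in seconds, as the *_SECS config key suggests

    // Mis-scaled: treats the configured 300 *seconds* as 300 *milliseconds*.
    Cache<String, String> wrongUnit = CacheBuilder.newBuilder()
        .expireAfterWrite(cacheTimeToLive, TimeUnit.MILLISECONDS)
        .build();

    // Matching unit, as the review comment suggests.
    Cache<String, String> rightUnit = CacheBuilder.newBuilder()
        .expireAfterWrite(cacheTimeToLive, TimeUnit.SECONDS)
        .build();

    wrongUnit.put("app_1", "subCluster1");
    rightUnit.put("app_1", "subCluster1");
    Thread.sleep(500);

    // The mis-scaled cache has already expired the entry; the other still has it.
    System.out.println("MILLISECONDS cache: " + wrongUnit.getIfPresent("app_1")); // null
    System.out.println("SECONDS cache:      " + rightUnit.getIfPresent("app_1")); // subCluster1
  }
}
```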
[jira] [Updated] (YARN-11663) [Federation] Add Cache Entity Nums Limit.
[ https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11663: -- Issue Type: Improvement (was: Bug) > [Federation] Add Cache Entity Nums Limit. > - > > Key: YARN-11663 > URL: https://issues.apache.org/jira/browse/YARN-11663 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation, yarn >Affects Versions: 3.4.0 >Reporter: Yuan Luo >Priority: Major > Labels: pull-request-available > Attachments: image-2024-03-14-18-12-28-426.png, > image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png > > > !image-2024-03-14-18-12-28-426.png! > !image-2024-03-14-18-12-49-950.png! > hi [~slfan1989] After apply this feature to our prod env, I found the memory > of the router keeps growing over time. This is because after jobs finished, > we won't access the expired key to trigger cleanup mechanism. Is it better to > add cache maximum number limit? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11663) [Federation] Add Cache Entity Nums Limit.
[ https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11663: -- Summary: [Federation] Add Cache Entity Nums Limit. (was: Router cache expansion issue) > [Federation] Add Cache Entity Nums Limit. > - > > Key: YARN-11663 > URL: https://issues.apache.org/jira/browse/YARN-11663 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, yarn >Affects Versions: 3.4.0 >Reporter: Yuan Luo >Priority: Major > Labels: pull-request-available > Attachments: image-2024-03-14-18-12-28-426.png, > image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png > > > !image-2024-03-14-18-12-28-426.png! > !image-2024-03-14-18-12-49-950.png! > hi [~slfan1989] After apply this feature to our prod env, I found the memory > of the router keeps growing over time. This is because after jobs finished, > we won't access the expired key to trigger cleanup mechanism. Is it better to > add cache maximum number limit? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11663) Router cache expansion issue
[ https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YARN-11663: -- Labels: pull-request-available (was: ) > Router cache expansion issue > > > Key: YARN-11663 > URL: https://issues.apache.org/jira/browse/YARN-11663 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, yarn >Affects Versions: 3.4.0 >Reporter: Yuan Luo >Priority: Major > Labels: pull-request-available > Attachments: image-2024-03-14-18-12-28-426.png, > image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png > > > !image-2024-03-14-18-12-28-426.png! > !image-2024-03-14-18-12-49-950.png! > hi [~slfan1989] After apply this feature to our prod env, I found the memory > of the router keeps growing over time. This is because after jobs finished, > we won't access the expired key to trigger cleanup mechanism. Is it better to > add cache maximum number limit? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-11663) Router cache expansion issue
[ https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830084#comment-17830084 ] ASF GitHub Bot commented on YARN-11663: --- slfan1989 opened a new pull request, #6662: URL: https://github.com/apache/hadoop/pull/6662 ### Description of PR JIRA: YARN-11663. [Federation] Add Cache Entity Nums Limit. ### How was this patch tested? ### For code changes: - [ ] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')? - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files? > Router cache expansion issue > > > Key: YARN-11663 > URL: https://issues.apache.org/jira/browse/YARN-11663 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, yarn >Affects Versions: 3.4.0 >Reporter: Yuan Luo >Priority: Major > Attachments: image-2024-03-14-18-12-28-426.png, > image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png > > > !image-2024-03-14-18-12-28-426.png! > !image-2024-03-14-18-12-49-950.png! > hi [~slfan1989] After apply this feature to our prod env, I found the memory > of the router keeps growing over time. This is because after jobs finished, > we won't access the expired key to trigger cleanup mechanism. Is it better to > add cache maximum number limit? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-11387) [GPG] YARN GPG mistakenly deleted applicationid
[ https://issues.apache.org/jira/browse/YARN-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830050#comment-17830050 ] ASF GitHub Bot commented on YARN-11387: --- hadoop-yetus commented on PR #6660: URL: https://github.com/apache/hadoop/pull/6660#issuecomment-2016433769
:confetti_ball: **+1 overall**
| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 18m 5s | | Docker mode activated. |
_ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
_ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 43m 38s | | trunk passed |
| +1 :green_heart: | compile | 0m 27s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | compile | 0m 26s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | checkstyle | 0m 27s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 32s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 34s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 0m 28s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | spotbugs | 0m 47s | | trunk passed |
| +1 :green_heart: | shadedclient | 33m 3s | | branch has no errors when building and testing our client artifacts. |
_ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 20s | | the patch passed |
| +1 :green_heart: | compile | 0m 20s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javac | 0m 20s | | the patch passed |
| +1 :green_heart: | compile | 0m 18s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | javac | 0m 18s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 14s | | the patch passed |
| +1 :green_heart: | mvnsite | 0m 20s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 20s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 0m 19s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | spotbugs | 0m 46s | | the patch passed |
| +1 :green_heart: | shadedclient | 32m 50s | | patch has no errors when building and testing our client artifacts. |
_ Other Tests _ |
| +1 :green_heart: | unit | 1m 1s | | hadoop-yarn-server-globalpolicygenerator in the patch passed. |
| +1 :green_heart: | asflicense | 0m 37s | | The patch does not generate ASF License warnings. |
| | | 140m 49s | | |
| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6660/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6660 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 45f6a5950e77 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 2b2084718031bda6966917176f9b171356cbf459 |
| Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6660/1/testReport/ |
| Max. process+thread count | 558 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-globalpolicygenerator U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-globalpolicygenerator |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6660/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by |
[jira] [Commented] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.
[ https://issues.apache.org/jira/browse/YARN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830047#comment-17830047 ] zeekling commented on YARN-2024: I have the same problem in Hadoop 3.1.1 !image-2024-03-23-17-22-00-057.png! > IOException in AppLogAggregatorImpl does not give stacktrace and leaves > aggregated TFile in a bad state. > > > Key: YARN-2024 > URL: https://issues.apache.org/jira/browse/YARN-2024 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation >Affects Versions: 0.23.10, 2.4.0 >Reporter: Eric Payne >Assignee: Xuan Gong >Priority: Major > > Multiple issues were encountered when AppLogAggregatorImpl encountered an > IOException in AppLogAggregatorImpl#uploadLogsForContainer while aggregating > yarn-logs for an application that had very large (>150G each) error logs. > - An IOException was encountered during the LogWriter#append call, and a > message was printed, but no stacktrace was provided. Message: "ERROR: > Couldn't upload logs for container_n_nnn_nn_nn. Skipping > this container." > - After the IOExceptin, the TFile is in a bad state, so subsequent calls to > LogWriter#append fail with the following stacktrace: > 2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread > Thread[LogAggregationService #17907,5,main] threw an Exception. > java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE > at > org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528) > at > org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:262) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:164) > ... > - At this point, the yarn-logs cleaner still thinks the thread is > aggregating, so the huge yarn-logs never get cleaned up for that application. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
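Both this report and the original issue describe the same underlying failure: once a write into a TFile fails partway through, the writer's internal state machine no longer accepts a new key, so every later append throws IllegalStateException ("Incorrect state to start a new key"). Below is a hypothetical defensive sketch of the pattern the issue asks for (log the full stack trace and stop reusing the writer after its first failed append); the Appender interface is a stand-in, not the real AggregatedLogFormat.LogWriter API:

```java
import java.io.Closeable;
import java.io.IOException;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Hypothetical wrapper: after any append fails, the underlying writer is
 * treated as unusable instead of being fed further keys, which is what
 * produces the "Incorrect state to start a new key" errors above.
 */
public class FailFastLogWriter implements Closeable {
  private static final Logger LOG = LoggerFactory.getLogger(FailFastLogWriter.class);

  /** Stand-in for a TFile-backed log writer; assumed interface, not the real API. */
  public interface Appender extends Closeable {
    void append(String key, byte[] value) throws IOException;
  }

  private final Appender delegate;
  private boolean broken = false;

  public FailFastLogWriter(Appender delegate) {
    this.delegate = delegate;
  }

  /** Returns true if the value was written, false if it was skipped. */
  public boolean append(String key, byte[] value) {
    if (broken) {
      LOG.warn("Skipping {}: writer already failed, further appends would corrupt the file.", key);
      return false;
    }
    try {
      delegate.append(key, value);
      return true;
    } catch (IOException e) {
      // Log the full stack trace (the original report only printed a message).
      LOG.error("Append failed for " + key + "; abandoning this writer.", e);
      broken = true;
      return false;
    }
  }

  public boolean isBroken() {
    return broken;
  }

  @Override
  public void close() throws IOException {
    delegate.close();
  }
}
```

A caller that sees isBroken() return true can close the writer, discard or mark the partial file, and let the log cleaner proceed instead of assuming aggregation is still in progress.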
[jira] [Comment Edited] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.
[ https://issues.apache.org/jira/browse/YARN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830047#comment-17830047 ] zeekling edited comment on YARN-2024 at 3/23/24 9:23 AM: -
I have the same problem in Hadoop 3.1.1
2024-02-17 01:09:21,112 | INFO | SchedulerEventDispatcher:Event Processor | container_e65_1707884856539_27553_01_66 Container Transitioned from NEW to COMPLETED | RMContainerImpl.java:480
2024-02-17 01:09:21,112 | FATAL | RM ApplicationHistory dispatcher | Error in dispatcher thread | AsyncDispatcher.java:233
java.lang.IllegalStateException: Incorrect state to start a new key: END_KEY
    at org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:530)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileWriter.writeHistoryData(FileSystemApplicationHistoryStore.java:756)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.containerStarted(FileSystemApplicationHistoryStore.java:523)
    at org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter.handleWritingApplicationHistoryEvent(RMApplicationHistoryWriter.java:198)
    at org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:304)
    at org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:299)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:227)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:144)
    at java.lang.Thread.run(Thread.java:748)
was (Author: JIRAUSER299659):
I have the same problem in Hadoop 3.1.1
2024-02-17 01:09:21,112 | INFO | SchedulerEventDispatcher:Event Processor | container_e65_1707884856539_27553_01_66 Container Transitioned from NEW to COMPLETED | RMContainerImpl.java:480
2024-02-17 01:09:21,112 | FATAL | RM ApplicationHistory dispatcher | Error in dispatcher thread | AsyncDispatcher.java:233
java.lang.IllegalStateException: Incorrect state to start a new key: END_KEY
    at org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:530)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileWriter.writeHistoryData(FileSystemApplicationHistoryStore.java:756)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.containerStarted(FileSystemApplicationHistoryStore.java:523)
    at org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter.handleWritingApplicationHistoryEvent(RMApplicationHistoryWriter.java:198)
    at org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:304)
    at org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:299)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:227)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:144)
    at java.lang.Thread.run(Thread.java:748)
> IOException in AppLogAggregatorImpl does not give stacktrace and leaves > aggregated TFile in a bad state.
> > > Key: YARN-2024 > URL: https://issues.apache.org/jira/browse/YARN-2024 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation >Affects Versions: 0.23.10, 2.4.0 >Reporter: Eric Payne >Assignee: Xuan Gong >Priority: Major > > Multiple issues were encountered when AppLogAggregatorImpl encountered an > IOException in AppLogAggregatorImpl#uploadLogsForContainer while aggregating > yarn-logs for an application that had very large (>150G each) error logs. > - An IOException was encountered during the LogWriter#append call, and a > message was printed, but no stacktrace was provided. Message: "ERROR: > Couldn't upload logs for container_n_nnn_nn_nn. Skipping > this container." > - After the IOExceptin, the TFile is in a bad state, so subsequent calls to > LogWriter#append fail with the following stacktrace: > 2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread > Thread[LogAggregationService #17907,5,main] threw an Exception. > java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE > at > org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528) > at >
[jira] [Comment Edited] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.
[ https://issues.apache.org/jira/browse/YARN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830047#comment-17830047 ] zeekling edited comment on YARN-2024 at 3/23/24 9:23 AM: -
I have the same problem in Hadoop 3.1.1
2024-02-17 01:09:21,112 | INFO | SchedulerEventDispatcher:Event Processor | container_e65_1707884856539_27553_01_66 Container Transitioned from NEW to COMPLETED | RMContainerImpl.java:480
2024-02-17 01:09:21,112 | FATAL | RM ApplicationHistory dispatcher | Error in dispatcher thread | AsyncDispatcher.java:233
java.lang.IllegalStateException: Incorrect state to start a new key: END_KEY
    at org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:530)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileWriter.writeHistoryData(FileSystemApplicationHistoryStore.java:756)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.containerStarted(FileSystemApplicationHistoryStore.java:523)
    at org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter.handleWritingApplicationHistoryEvent(RMApplicationHistoryWriter.java:198)
    at org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:304)
    at org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:299)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:227)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:144)
    at java.lang.Thread.run(Thread.java:748)
was (Author: JIRAUSER299659):
I have the same problem in Hadoop 3.1.1
!image-2024-03-23-17-22-00-057.png!
> IOException in AppLogAggregatorImpl does not give stacktrace and leaves > aggregated TFile in a bad state. > > > Key: YARN-2024 > URL: https://issues.apache.org/jira/browse/YARN-2024 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation >Affects Versions: 0.23.10, 2.4.0 >Reporter: Eric Payne >Assignee: Xuan Gong >Priority: Major > > Multiple issues were encountered when AppLogAggregatorImpl encountered an > IOException in AppLogAggregatorImpl#uploadLogsForContainer while aggregating > yarn-logs for an application that had very large (>150G each) error logs. > - An IOException was encountered during the LogWriter#append call, and a > message was printed, but no stacktrace was provided. Message: "ERROR: > Couldn't upload logs for container_n_nnn_nn_nn. Skipping > this container." > - After the IOExceptin, the TFile is in a bad state, so subsequent calls to > LogWriter#append fail with the following stacktrace: > 2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread > Thread[LogAggregationService #17907,5,main] threw an Exception. > java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE > at > org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528) > at > org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:262) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:164) > ...
> - At this point, the yarn-logs cleaner still thinks the thread is > aggregating, so the huge yarn-logs never get cleaned up for that application. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-11387) [GPG] YARN GPG mistakenly deleted applicationid
[ https://issues.apache.org/jira/browse/YARN-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830035#comment-17830035 ] ASF GitHub Bot commented on YARN-11387: --- slfan1989 opened a new pull request, #6660: URL: https://github.com/apache/hadoop/pull/6660 ### Description of PR JIRA: YARN-11387. [GPG] YARN GPG mistakenly deleted applicationid. ### How was this patch tested? ### For code changes: - [ ] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')? - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files? > [GPG] YARN GPG mistakenly deleted applicationid > --- > > Key: YARN-11387 > URL: https://issues.apache.org/jira/browse/YARN-11387 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Affects Versions: 3.2.1, 3.4.0 >Reporter: zhangjunj >Assignee: Shilun Fan >Priority: Major > Labels: federation, gpg, pull-request-available > Attachments: YARN-11387-YARN-11387.v1.patch, > yarn-gpg-mistakenly-deleted-applicationid.png > > Original Estimate: 168h > Remaining Estimate: 168h > > In [YARN-7599|https://issues.apache.org/jira/browse/YARN-7599], the > Federation can delete expired applicationid, but YARN GPG uses getRouter() > method to obtain application information for multiple clusters. If there are > too many applicationids that more than 200,000 , it will not be possible to > pull all the applicationid information at one time, resulting in the > possibility of accidental deletion. The following error is reported for spark > component. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
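The quoted description points at the root cause: the GPG cleaner derives its delete candidates from a single Router listing, and with more than 200,000 applications that listing can be incomplete, so live applications look "missing" and get removed. One possible safeguard, sketched below purely as an illustration (the class and method names are hypothetical and this is not the actual GPG ApplicationCleaner), is to require an application to be absent from several consecutive listings before it is treated as deletable:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: an application id is only reported as safe to delete
 * after it has been missing from N consecutive Router listings, so a single
 * truncated or partial listing cannot cause a running application's record
 * to be removed from the state store.
 */
public class CautiousAppCleaner {
  private final int requiredMisses;
  private final Map<String, Integer> missCounts = new HashMap<>();

  public CautiousAppCleaner(int requiredMisses) {
    this.requiredMisses = requiredMisses;
  }

  /**
   * @param knownInStateStore application ids currently kept in the state store
   * @param listedByRouter    application ids returned by this round's Router query
   * @return ids that have now been absent for {@code requiredMisses} consecutive rounds
   */
  public Set<String> applicationsSafeToDelete(Set<String> knownInStateStore,
      Set<String> listedByRouter) {
    Set<String> safe = new HashSet<>();
    for (String appId : knownInStateStore) {
      if (listedByRouter.contains(appId)) {
        missCounts.remove(appId); // seen again: reset its counter
        continue;
      }
      int misses = missCounts.merge(appId, 1, Integer::sum);
      if (misses >= requiredMisses) {
        safe.add(appId);
        missCounts.remove(appId);
      }
    }
    return safe;
  }
}
```

With requiredMisses set to, say, 3, one capped or failed listing can no longer wipe out the records of applications that are still running.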