[jira] [Updated] (YARN-10160) Add auto queue creation related configs to RMWebService#CapacitySchedulerQueueInfo
[ https://issues.apache.org/jira/browse/YARN-10160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated YARN-10160:
---------------------------------
    Attachment: YARN-10160-006.patch

> Add auto queue creation related configs to RMWebService#CapacitySchedulerQueueInfo
> ----------------------------------------------------------------------------------
>
>                 Key: YARN-10160
>                 URL: https://issues.apache.org/jira/browse/YARN-10160
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 3.3.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>         Attachments: Screen Shot 2020-02-25 at 9.06.52 PM.png, YARN-10160-001.patch, YARN-10160-002.patch, YARN-10160-003.patch, YARN-10160-004.patch, YARN-10160-005.patch, YARN-10160-006.patch
>
> Add auto queue creation related configs to RMWebService#CapacitySchedulerQueueInfo.
> {code}
> yarn.scheduler.capacity.<queue-path>.auto-create-child-queue.enabled
> yarn.scheduler.capacity.<queue-path>.leaf-queue-template.<property>
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
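For context, the two config families above live in capacity-scheduler.xml. A minimal, hedged sketch of what they look like in practice, assuming a hypothetical parent queue `root.parent` (the queue path and the template property chosen here are illustrative, not from the Jira):

```xml
<!-- Sketch only: enable auto-created leaf queues under a hypothetical
     parent queue "root.parent". -->
<property>
  <name>yarn.scheduler.capacity.root.parent.auto-create-child-queue.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- leaf-queue-template.* properties are applied to each auto-created
       child queue; "capacity" is one example of a templated property. -->
  <name>yarn.scheduler.capacity.root.parent.leaf-queue-template.capacity</name>
  <value>50</value>
</property>
```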
[jira] [Comment Edited] (YARN-10208) Add metric in CapacityScheduler for evaluating the time difference between node heartbeats
[ https://issues.apache.org/jira/browse/YARN-10208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066611#comment-17066611 ]

Pranjal Protim Borah edited comment on YARN-10208 at 3/26/20, 5:34 AM:
-----------------------------------------------------------------------

Additional metric to measure the time difference between node heartbeats.

was (Author: lapjarn): [~bibinchundatt] Jira for metric schedulerHeartBeatIntervalAverage

> Add metric in CapacityScheduler for evaluating the time difference between node heartbeats
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-10208
>                 URL: https://issues.apache.org/jira/browse/YARN-10208
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Pranjal Protim Borah
>            Priority: Trivial
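The proposed metric (an average interval between consecutive node heartbeats) can be sketched as below. This is an illustrative stand-alone class, not the actual YARN-10208 implementation; all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of a heartbeat-interval metric: remember the last heartbeat
 * timestamp per node and accumulate the deltas between consecutive
 * heartbeats so an average interval can be reported.
 */
public class HeartbeatIntervalMetric {
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();
    private long totalIntervalMs = 0;
    private long intervalCount = 0;

    /** Record a heartbeat from a node at the given timestamp (ms). */
    public synchronized void recordHeartbeat(String nodeId, long nowMs) {
        Long prev = lastHeartbeatMs.put(nodeId, nowMs);
        if (prev != null) {            // first heartbeat has no interval
            totalIntervalMs += nowMs - prev;
            intervalCount++;
        }
    }

    /** Average interval across all recorded heartbeats; 0 if none yet. */
    public synchronized long averageIntervalMs() {
        return intervalCount == 0 ? 0 : totalIntervalMs / intervalCount;
    }

    public static void main(String[] args) {
        HeartbeatIntervalMetric m = new HeartbeatIntervalMetric();
        m.recordHeartbeat("node1", 0L);
        m.recordHeartbeat("node1", 1000L);
        m.recordHeartbeat("node1", 3000L);
        // intervals are 1000 ms and 2000 ms, so the average is 1500 ms
        System.out.println(m.averageIntervalMs());
    }
}
```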
[jira] [Updated] (YARN-10194) YARN RMWebServices /scheduler-conf/validate leaks ZK Connections
[ https://issues.apache.org/jira/browse/YARN-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated YARN-10194:
---------------------------------
    Attachment: YARN-10194-004.patch

> YARN RMWebServices /scheduler-conf/validate leaks ZK Connections
>
>                 Key: YARN-10194
>                 URL: https://issues.apache.org/jira/browse/YARN-10194
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.3.0
>            Reporter: Akhil PB
>            Assignee: Prabhu Joseph
>            Priority: Critical
>         Attachments: YARN-10194-001.patch, YARN-10194-002.patch, YARN-10194-003.patch, YARN-10194-004.patch
>
> YARN RMWebServices /scheduler-conf/validate leaks ZK connections. The validation API creates a new CapacityScheduler and fails to close it after validation. Every CapacityScheduler#init opens a MutableCSConfigurationProvider, which opens a ZKConfigurationStore and creates a ZK connection.
>
> *ZK LOGS*
> {code}
> -03-12 16:45:51,881 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: [2 times] Error accepting new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:52,449 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:52,710 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:52,876 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: [4 times] Error accepting new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:53,068 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: [2 times] Error accepting new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:53,391 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: [2 times] Error accepting new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:54,008 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:54,287 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:54,483 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: [4 times] Error accepting new connection: Too many connections from /172.27.99.64 - max is 60
> {code}
>
> There is another bug in ZKConfigurationStore as well: it does not close its ZKCuratorManager.
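The leak pattern described above (a throwaway scheduler created for validation but never closed) has a standard remedy: release the resource on every path, success or failure. A minimal, hedged sketch under assumed names (ThrowawayScheduler stands in for the scheduler whose init opens the ZK connection; this is not the actual YARN-10194 patch):

```java
import java.io.Closeable;

/**
 * Sketch of the fix pattern: a validation endpoint that constructs a
 * short-lived scheduler must close it (releasing its ZK connection)
 * whether or not validation succeeds.
 */
public class ValidationSketch {

    /** Stand-in for a scheduler whose init would open a ZK connection. */
    static class ThrowawayScheduler implements Closeable {
        void validate(String conf) {
            if (conf == null) {
                throw new IllegalArgumentException("null configuration");
            }
        }
        @Override
        public void close() {
            // In the real scheduler this would release the ZK connection.
        }
    }

    /** try-with-resources guarantees close() even when validation throws. */
    static boolean validateConfig(String conf) {
        try (ThrowawayScheduler scheduler = new ThrowawayScheduler()) {
            scheduler.validate(conf);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

Without the try-with-resources (or an equivalent finally block), every failed validation strands one connection, which matches the "Too many connections ... max is 60" pattern in the ZK logs.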
[jira] [Commented] (YARN-10194) YARN RMWebServices /scheduler-conf/validate leaks ZK Connections
[ https://issues.apache.org/jira/browse/YARN-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067364#comment-17067364 ]

Prabhu Joseph commented on YARN-10194:
--------------------------------------

[~sunilg] Have attached [^YARN-10194-004.patch] after rebasing. Thanks.

> YARN RMWebServices /scheduler-conf/validate leaks ZK Connections
>
>                 Key: YARN-10194
>                 URL: https://issues.apache.org/jira/browse/YARN-10194
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.3.0
>            Reporter: Akhil PB
>            Assignee: Prabhu Joseph
>            Priority: Critical
>         Attachments: YARN-10194-001.patch, YARN-10194-002.patch, YARN-10194-003.patch, YARN-10194-004.patch
[jira] [Commented] (YARN-10194) YARN RMWebServices /scheduler-conf/validate leaks ZK Connections
[ https://issues.apache.org/jira/browse/YARN-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067333#comment-17067333 ]

Sunil G commented on YARN-10194:
--------------------------------

[~prabhujoseph] pls rebase to trunk

> YARN RMWebServices /scheduler-conf/validate leaks ZK Connections
>
>                 Key: YARN-10194
>                 URL: https://issues.apache.org/jira/browse/YARN-10194
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.3.0
>            Reporter: Akhil PB
>            Assignee: Prabhu Joseph
>            Priority: Critical
>         Attachments: YARN-10194-001.patch, YARN-10194-002.patch, YARN-10194-003.patch
[jira] [Commented] (YARN-10194) YARN RMWebServices /scheduler-conf/validate leaks ZK Connections
[ https://issues.apache.org/jira/browse/YARN-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067318#comment-17067318 ]

Hadoop QA commented on YARN-10194:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 9s | YARN-10194 does not apply to trunk. Rebase required? Wrong branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-10194 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12997423/YARN-10194-003.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25750/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> YARN RMWebServices /scheduler-conf/validate leaks ZK Connections
>
>                 Key: YARN-10194
>                 URL: https://issues.apache.org/jira/browse/YARN-10194
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.3.0
>            Reporter: Akhil PB
>            Assignee: Prabhu Joseph
>            Priority: Critical
>         Attachments: YARN-10194-001.patch, YARN-10194-002.patch, YARN-10194-003.patch
[jira] [Created] (YARN-10211) [YARN UI2] Queue selection is not highlighted on first time in queues page
Akhil PB created YARN-10211:
-------------------------------

             Summary: [YARN UI2] Queue selection is not highlighted on first time in queues page
                 Key: YARN-10211
                 URL: https://issues.apache.org/jira/browse/YARN-10211
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Akhil PB
            Assignee: Akhil PB
[jira] [Commented] (YARN-10200) Add number of containers to RMAppManager summary
[ https://issues.apache.org/jira/browse/YARN-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066941#comment-17066941 ]

Hudson commented on YARN-10200:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18089 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18089/])
YARN-10200. Add number of containers to RMAppManager summary (jhung: rev 6ce189c62132706d9aaee5abf020ae4dc783ba26)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestCombinedSystemMetricsPublisher.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestAppPage.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppMetrics.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisherForV2.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebAppFairScheduler.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationAttemptStateDataPBImpl.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationAttemptStateData.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestContainerResourceUsage.java

> Add number of containers to RMAppManager summary
>
>                 Key: YARN-10200
>                 URL: https://issues.apache.org/jira/browse/YARN-10200
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Jonathan Hung
>            Assignee: Jonathan Hung
>            Priority: Major
>             Fix For: 3.3.0, 3.2.2, 3.1.4, 2.10.1
>
>         Attachments: YARN-10200.001.patch, YARN-10200.002.patch, YARN-10200.003.patch
>
> It would be useful to persist this so we can track containers processed by the RM.
[jira] [Commented] (YARN-10043) FairOrderingPolicy Improvements
[ https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066894#comment-17066894 ]

Szilard Nemeth commented on YARN-10043:
---------------------------------------

Hi [~maniraj...@gmail.com]! Sorry for the late response, I was busy with other things in the last couple of weeks. I can take a look at this tomorrow. Next time you have anything important like this, please reach out to other committers as well to get feedback more quickly :)

> FairOrderingPolicy Improvements
>
>                 Key: YARN-10043
>                 URL: https://issues.apache.org/jira/browse/YARN-10043
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Manikandan R
>            Assignee: Manikandan R
>            Priority: Major
>         Attachments: YARN-10043.001.patch, YARN-10043.002.patch, YARN-10043.003.patch, YARN-10043.004.patch
>
> FairOrderingPolicy can be improved by using some of the (relevant) approaches implemented in FairSharePolicy of FS. This improvement is significant in the FS to CS migration context.
[jira] [Commented] (YARN-10043) FairOrderingPolicy Improvements
[ https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066864#comment-17066864 ]

Manikandan R commented on YARN-10043:
-------------------------------------

[~snemeth] I am waiting on this. Can we please take it forward?

> FairOrderingPolicy Improvements
>
>                 Key: YARN-10043
>                 URL: https://issues.apache.org/jira/browse/YARN-10043
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Manikandan R
>            Assignee: Manikandan R
>            Priority: Major
>         Attachments: YARN-10043.001.patch, YARN-10043.002.patch, YARN-10043.003.patch, YARN-10043.004.patch
[jira] [Commented] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources
[ https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066859#comment-17066859 ]

Manikandan R commented on YARN-10154:
-------------------------------------

[~sunilg] Have you had a chance to review the patch? Thank you.

> CS Dynamic Queues cannot be configured with absolute resources
>
>                 Key: YARN-10154
>                 URL: https://issues.apache.org/jira/browse/YARN-10154
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.1.3
>            Reporter: Sunil G
>            Assignee: Manikandan R
>            Priority: Major
>         Attachments: YARN-10154.001.patch, YARN-10154.002.patch
>
> In CS, a ManagedParent queue and its template cannot take an absolute resource value like [memory=8192,vcores=8].
> This Jira is to track and improve the configuration reading module of DynamicQueue to support absolute resource values.
[jira] [Commented] (YARN-10003) YarnConfigurationStore#checkVersion throws exception that belongs to RMStateStore
[ https://issues.apache.org/jira/browse/YARN-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066762#comment-17066762 ]

Hadoop QA commented on YARN-10003:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 9m 3s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || branch-3.2 Compile Tests ||
| +1 | mvninstall | 24m 55s | branch-3.2 passed |
| +1 | compile | 0m 38s | branch-3.2 passed |
| +1 | checkstyle | 0m 32s | branch-3.2 passed |
| +1 | mvnsite | 0m 43s | branch-3.2 passed |
| +1 | shadedclient | 13m 40s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 8s | branch-3.2 passed |
| +1 | javadoc | 0m 31s | branch-3.2 passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 44s | the patch passed |
| +1 | compile | 0m 35s | the patch passed |
| +1 | javac | 0m 35s | the patch passed |
| +1 | checkstyle | 0m 27s | the patch passed |
| +1 | mvnsite | 0m 37s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 35s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 14s | the patch passed |
| +1 | javadoc | 0m 25s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 308m 39s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 32s | The patch does not generate ASF License warnings. |
| | | 377m 56s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestNodeBlacklistingOnAMFailures |
| | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2 |
| | hadoop.yarn.server.resourcemanager.TestApplicationACLs |
| | hadoop.yarn.server.resourcemanager.TestWorkPreservingUnmanagedAM |
| | hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector |
| | hadoop.yarn.server.resourcemanager.placement.TestPlacementManager |
| | hadoop.yarn.server.resourcemanager.metrics.TestCombinedSystemMetricsPublisher |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:0f25cbbb251 |
| JIRA Issue | YARN-10003 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12997642/YARN-10003.branch-3.2.POC003.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux a696b1f8944b 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |
[jira] [Commented] (YARN-10210) Add a RMFailoverProxyProvider that does DNS resolution on failover
[ https://issues.apache.org/jira/browse/YARN-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066746#comment-17066746 ]

Íñigo Goiri commented on YARN-10210:
------------------------------------

HADOOP-16938 is already merged. I also moved this to YARN since the change is isolated there. [~roliu] do you mind rebasing the PR?

> Add a RMFailoverProxyProvider that does DNS resolution on failover
>
>                 Key: YARN-10210
>                 URL: https://issues.apache.org/jira/browse/YARN-10210
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 3.1.2
>            Reporter: Roger Liu
>            Assignee: Roger Liu
>            Priority: Major
>
> In Kubernetes, a node may go down and then come back later with a different IP address. YARN clients which are already running will be unable to rediscover the node after it comes back up because they cached the original IP address. This is problematic for cases such as Spark HA on Kubernetes: the node containing the resource manager may go down and come back up, meaning existing node managers must then also be restarted.
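The core idea behind a DNS-resolving failover provider is that a cached `InetSocketAddress` keeps its originally resolved IP forever, while constructing a fresh one performs a new DNS lookup. A minimal sketch of that behavior, with hypothetical class name, hostname, and port (this is not the actual YARN-10210 proxy provider):

```java
import java.net.InetSocketAddress;

/**
 * Sketch: keep only the hostname and port, and build a fresh
 * InetSocketAddress on every failover so that a host whose IP changed
 * (e.g. an RM pod rescheduled in Kubernetes) is re-resolved.
 */
public class ReResolvingAddress {
    private final String host;
    private final int port;

    public ReResolvingAddress(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /** Constructing a new InetSocketAddress triggers a fresh DNS lookup. */
    public InetSocketAddress resolve() {
        InetSocketAddress addr = new InetSocketAddress(host, port);
        if (addr.isUnresolved()) {
            throw new IllegalStateException("Cannot resolve host: " + host);
        }
        return addr;
    }

    public static void main(String[] args) {
        // "localhost" resolves without network access; 8032 is the
        // conventional RM port, used here purely as an example.
        InetSocketAddress a = new ReResolvingAddress("localhost", 8032).resolve();
        System.out.println(a.getAddress().getHostAddress() + ":" + a.getPort());
    }
}
```

A failover proxy provider built on this pattern would call `resolve()` each time it switches RMs, instead of reusing the address cached at client start-up.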
[jira] [Updated] (YARN-10210) Add a RMFailoverProxyProvider that does DNS resolution on failover
[ https://issues.apache.org/jira/browse/YARN-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Íñigo Goiri updated YARN-10210:
---
    Summary: Add a RMFailoverProxyProvider that does DNS resolution on failover (was: Cached DNS name resolution error)

> Add a RMFailoverProxyProvider that does DNS resolution on failover
>
>                 Key: YARN-10210
>                 URL: https://issues.apache.org/jira/browse/YARN-10210
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 3.1.2
>            Reporter: Roger Liu
>            Assignee: Roger Liu
>            Priority: Major
>
> In Kubernetes, a node may go down and then come back later with a different IP address. YARN clients which are already running will be unable to rediscover the node after it comes back up because they cached the original IP address. This is problematic for cases such as Spark HA on Kubernetes: the node containing the resource manager may go down and come back up, meaning existing node managers must then also be restarted.
[jira] [Updated] (YARN-10210) Cached DNS name resolution error
[ https://issues.apache.org/jira/browse/YARN-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Íñigo Goiri updated YARN-10210:
---
    Description: In Kubernetes, a node may go down and then come back later with a different IP address. YARN clients which are already running will be unable to rediscover the node after it comes back up due to caching the original IP address. This is problematic for cases such as Spark HA on Kubernetes, as the node containing the resource manager may go down and come back up, meaning existing node managers must then also be restarted.
(was: In Kubernetes, the a node may go down and then come back later with a different IP address. Yarn clients which are already running will be unable to rediscover the node after it comes back up due to caching the original IP address. This is problematic for cases such as Spark HA on Kubernetes, as the node containing the resource manager may go down and come back up, meaning existing node managers must then also be restarted.)

> Cached DNS name resolution error
>
>                 Key: YARN-10210
>                 URL: https://issues.apache.org/jira/browse/YARN-10210
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 3.1.2
>            Reporter: Roger Liu
>            Assignee: Roger Liu
>            Priority: Major
>
> In Kubernetes, a node may go down and then come back later with a different IP address. YARN clients which are already running will be unable to rediscover the node after it comes back up because they cached the original IP address. This is problematic for cases such as Spark HA on Kubernetes, as the node containing the resource manager may go down and come back up, meaning existing node managers must then also be restarted.
[jira] [Assigned] (YARN-10210) Cached DNS name resolution error
[ https://issues.apache.org/jira/browse/YARN-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri reassigned YARN-10210: -- Assignee: Roger Liu > Cached DNS name resolution error > > > Key: YARN-10210 > URL: https://issues.apache.org/jira/browse/YARN-10210 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.1.2 >Reporter: Roger Liu >Assignee: Roger Liu >Priority: Major > > In Kubernetes, a node may go down and then come back later with a > different IP address. YARN clients which are already running will be unable > to rediscover the node after it comes back up due to caching the original IP > address. This is problematic for cases such as Spark HA on Kubernetes, as the > node containing the resource manager may go down and come back up, meaning > existing node managers must then also be restarted. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10210) Cached DNS name resolution error
[ https://issues.apache.org/jira/browse/YARN-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri reassigned YARN-10210: -- Key: YARN-10210 (was: HADOOP-16543) Affects Version/s: (was: 3.1.2) 3.1.2 Assignee: (was: Roger Liu) Issue Type: Improvement (was: Bug) Project: Hadoop YARN (was: Hadoop Common) > Cached DNS name resolution error > > > Key: YARN-10210 > URL: https://issues.apache.org/jira/browse/YARN-10210 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.1.2 >Reporter: Roger Liu >Priority: Major > > In Kubernetes, a node may go down and then come back later with a > different IP address. YARN clients which are already running will be unable > to rediscover the node after it comes back up due to caching the original IP > address. This is problematic for cases such as Spark HA on Kubernetes, as the > node containing the resource manager may go down and come back up, meaning > existing node managers must then also be restarted. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
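The failure mode behind YARN-10210 comes down to caching the result of a single DNS lookup. The class below is a minimal, hypothetical sketch of the proposed direction, not the actual RMFailoverProxyProvider API: re-resolve the stable hostname on every failover so a ResourceManager pod that restarts with a new IP address is rediscovered.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch of the idea behind YARN-10210 (not the real
// RMFailoverProxyProvider API): rather than holding on to the InetAddress
// resolved at startup, look the RM hostname up again on every failover so a
// pod that restarts with a new IP address is rediscovered.
public class ReResolvingEndpoint {
    private final String host;   // stable DNS name of the ResourceManager

    public ReResolvingEndpoint(String host) {
        this.host = host;
    }

    // Called from the failover path: a fresh lookup, never a cached address.
    public InetAddress resolveForFailover() throws UnknownHostException {
        return InetAddress.getByName(host);
    }

    // Convenience helper: true when the freshly resolved address is loopback.
    public static boolean resolvesToLoopback(String host) {
        try {
            return new ReResolvingEndpoint(host).resolveForFailover().isLoopbackAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "localhost" always maps to a loopback address
        System.out.println(resolvesToLoopback("localhost")); // true
    }
}
```

Note that the JVM itself caches successful lookups according to the `networkaddress.cache.ttl` security property, so in practice a re-resolving proxy provider also depends on that TTL being finite.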
[jira] [Created] (YARN-10209) DistributedShell should initialize TimelineClient conditionally
Benjamin Teke created YARN-10209: Summary: DistributedShell should initialize TimelineClient conditionally Key: YARN-10209 URL: https://issues.apache.org/jira/browse/YARN-10209 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Benjamin Teke YarnConfiguration was changed along with the introduction of newer Timeline Service versions to include configuration describing the version in use. In Hadoop 2.6.0 the distributed shell instantiates the Timeline Client whether or not it is enabled in the configuration. Running this distributed shell on newer Hadoop versions (where the new Timeline Service is available) causes an exception, because the bundled YarnConfiguration doesn't have the necessary version configuration property. Making the Timeline Client initialization conditional would let the distributed shell run at least with the Timeline Service disabled. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
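The guard YARN-10209 asks for can be sketched as follows. `yarn.timeline-service.enabled` is the standard flag for the Timeline Service; everything else here is illustrative, with a plain map standing in for YarnConfiguration and a string standing in for the client, so the class and method names are assumptions rather than the actual DistributedShell code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the fix direction for YARN-10209 (illustrative names, not the
// actual DistributedShell code): only create a timeline client when the
// service is enabled, so a configuration lacking the newer version
// properties is never consulted.
public class ConditionalTimelineInit {
    static final String TIMELINE_ENABLED = "yarn.timeline-service.enabled";

    // Stand-in for YarnConfiguration: a plain key/value map, disabled by default.
    static boolean timelineEnabled(Map<String, String> conf) {
        return Boolean.parseBoolean(conf.getOrDefault(TIMELINE_ENABLED, "false"));
    }

    // Returns a client description, or null when the service is disabled.
    static String maybeCreateTimelineClient(Map<String, String> conf) {
        if (!timelineEnabled(conf)) {
            return null; // skip initialization entirely: no version lookup happens
        }
        return "TimelineClient(version=" + conf.get("yarn.timeline-service.version") + ")";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(maybeCreateTimelineClient(conf)); // null: disabled by default
        conf.put(TIMELINE_ENABLED, "true");
        conf.put("yarn.timeline-service.version", "2.0");
        System.out.println(maybeCreateTimelineClient(conf)); // TimelineClient(version=2.0)
    }
}
```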
[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066628#comment-17066628 ] Hudson commented on YARN-9879: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18085 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18085/]) YARN-9879. Allow multiple leaf queues with the same name in (sunilg: rev cdb2107066a2d8557270888c0a9a75f29a6853bf) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesSchedulerActivities.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesForCSWithPartitions.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueStore.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerQueueMappingFactory.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/QueueMappingEntity.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimitsByPartition.java * (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestAbsoluteResourceConfiguration.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerPerf.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCSQueueStore.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/FifoIntraQueuePreemptionPlugin.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/activities/ActivitiesManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/QueuePlacementRuleUtils.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestReservationSystem.java * (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/WorkflowPriorityMappingsManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerNodeLabelUpdate.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * (edit)
[jira] [Updated] (YARN-9879) Allow multiple leaf queues with the same name in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-9879: -- Summary: Allow multiple leaf queues with the same name in CapacityScheduler (was: Allow multiple leaf queues with the same name in CS) > Allow multiple leaf queues with the same name in CapacityScheduler > -- > > Key: YARN-9879 > URL: https://issues.apache.org/jira/browse/YARN-9879 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Labels: fs2cs > Attachments: CSQueue.getQueueUsage.txt, DesignDoc_v1.pdf, > YARN-9879.014.patch, YARN-9879.015.patch, YARN-9879.015.patch, > YARN-9879.POC001.patch, YARN-9879.POC002.patch, YARN-9879.POC003.patch, > YARN-9879.POC004.patch, YARN-9879.POC005.patch, YARN-9879.POC006.patch, > YARN-9879.POC007.patch, YARN-9879.POC008.patch, YARN-9879.POC009.patch, > YARN-9879.POC010.patch, YARN-9879.POC011.patch, YARN-9879.POC012.patch, > YARN-9879.POC013.patch > > > Currently the leaf queue's name must be unique regardless of its position in > the queue hierarchy. > A design doc and first proposal are being made; I'll attach them as soon as > they're done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
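The constraint YARN-9879 lifts is that a leaf queue's short name had to be globally unique. One way to allow duplicates, sketched below with hypothetical names rather than the actual CSQueueStore API, is to key queues by full path and let a short name resolve only while it is unambiguous.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch (names hypothetical, not the actual CSQueueStore API)
// of how leaf queues with the same short name can coexist: queues are keyed
// by full path, and a short name resolves only while it is unambiguous.
public class QueueStoreSketch {
    private final Set<String> fullPaths = new HashSet<>();
    private final Map<String, String> byShortName = new HashMap<>();
    private static final String AMBIGUOUS = "\u0000AMBIGUOUS";

    public void add(String fullPath) {
        fullPaths.add(fullPath);
        String shortName = fullPath.substring(fullPath.lastIndexOf('.') + 1);
        // A second queue with the same short name makes short-name lookups ambiguous.
        byShortName.merge(shortName, fullPath, (a, b) -> AMBIGUOUS);
    }

    // Full-path lookups always work; short-name lookups only when unique.
    public String get(String name) {
        if (fullPaths.contains(name)) {
            return name;
        }
        String hit = byShortName.get(name);
        return AMBIGUOUS.equals(hit) ? null : hit;
    }
}
```

With this shape, `get("alpha")` returns the single queue named alpha until a second one is added, after which callers must use the full path such as `root.dev.alpha`.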
[jira] [Commented] (YARN-10200) Add number of containers to RMAppManager summary
[ https://issues.apache.org/jira/browse/YARN-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066615#comment-17066615 ] Adam Antal commented on YARN-10200: --- Reviewed the patch, LGTM (non-binding). > Add number of containers to RMAppManager summary > > > Key: YARN-10200 > URL: https://issues.apache.org/jira/browse/YARN-10200 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-10200.001.patch, YARN-10200.002.patch, > YARN-10200.003.patch > > > It would be useful to persist this so we can track containers processed by RM. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10208) Add metric in CapacityScheduler for evaluating the time difference between node heartbeats
[ https://issues.apache.org/jira/browse/YARN-10208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066611#comment-17066611 ] Pranjal Protim Borah commented on YARN-10208: - [~bibinchundatt] Jira for metric schedulerHeartBeatIntervalAverage > Add metric in CapacityScheduler for evaluating the time difference between > node heartbeats > -- > > Key: YARN-10208 > URL: https://issues.apache.org/jira/browse/YARN-10208 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Pranjal Protim Borah >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10208) Add metric in CapacityScheduler for evaluating the time difference between node heartbeats
[ https://issues.apache.org/jira/browse/YARN-10208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pranjal Protim Borah updated YARN-10208: Summary: Add metric in CapacityScheduler for evaluating the time difference between node heartbeats (was: Add CapacityScheduler metrics for evaluating the time difference between node heartbeats) > Add metric in CapacityScheduler for evaluating the time difference between > node heartbeats > -- > > Key: YARN-10208 > URL: https://issues.apache.org/jira/browse/YARN-10208 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Pranjal Protim Borah >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10208) Add CapacityScheduler metrics for evaluating the time difference between node heartbeats
Pranjal Protim Borah created YARN-10208: --- Summary: Add CapacityScheduler metrics for evaluating the time difference between node heartbeats Key: YARN-10208 URL: https://issues.apache.org/jira/browse/YARN-10208 Project: Hadoop YARN Issue Type: Improvement Reporter: Pranjal Protim Borah -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
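The metric proposed in YARN-10208 (referred to as schedulerHeartBeatIntervalAverage in the comments) boils down to averaging the time deltas between successive node heartbeats. The class below is a self-contained sketch of that bookkeeping; the names are hypothetical, not the patch's API.

```java
// Illustrative sketch of the proposed YARN-10208 metric (names are
// hypothetical, not the patch's API): track the time between successive
// heartbeats and expose a running average a scheduler could publish.
public class HeartbeatIntervalTracker {
    private long lastHeartbeatMs = -1;
    private long totalIntervalMs = 0;
    private long intervalCount = 0;

    // Record a heartbeat arriving at the given timestamp (milliseconds).
    public void onHeartbeat(long nowMs) {
        if (lastHeartbeatMs >= 0) {
            totalIntervalMs += nowMs - lastHeartbeatMs;
            intervalCount++;
        }
        lastHeartbeatMs = nowMs;
    }

    // Average interval between heartbeats seen so far; 0 if fewer than two.
    public long averageIntervalMs() {
        return intervalCount == 0 ? 0 : totalIntervalMs / intervalCount;
    }

    public static void main(String[] args) {
        HeartbeatIntervalTracker t = new HeartbeatIntervalTracker();
        t.onHeartbeat(1000);
        t.onHeartbeat(2000);
        t.onHeartbeat(4000);
        System.out.println(t.averageIntervalMs()); // (1000 + 2000) / 2 = 1500
    }
}
```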
[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066607#comment-17066607 ] Sunil G commented on YARN-9879: --- Thanks [~shuzirra]. Let's get this in now. +1 to the latest patch. > Allow multiple leaf queues with the same name in CS > --- > > Key: YARN-9879 > URL: https://issues.apache.org/jira/browse/YARN-9879 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Labels: fs2cs > Attachments: CSQueue.getQueueUsage.txt, DesignDoc_v1.pdf, > YARN-9879.014.patch, YARN-9879.015.patch, YARN-9879.015.patch, > YARN-9879.POC001.patch, YARN-9879.POC002.patch, YARN-9879.POC003.patch, > YARN-9879.POC004.patch, YARN-9879.POC005.patch, YARN-9879.POC006.patch, > YARN-9879.POC007.patch, YARN-9879.POC008.patch, YARN-9879.POC009.patch, > YARN-9879.POC010.patch, YARN-9879.POC011.patch, YARN-9879.POC012.patch, > YARN-9879.POC013.patch > > > Currently the leaf queue's name must be unique regardless of its position in > the queue hierarchy. > A design doc and first proposal are being made; I'll attach them as soon as > they're done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066561#comment-17066561 ] Gergely Pollak commented on YARN-9879: -- [~sunilg] yes, that is correct, it is unrelated; SLS tests fail quite often for no real reason. To be sure, I executed the test case a few times manually and it passed each time, so this one seems to be flaky. > Allow multiple leaf queues with the same name in CS > --- > > Key: YARN-9879 > URL: https://issues.apache.org/jira/browse/YARN-9879 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Labels: fs2cs > Attachments: CSQueue.getQueueUsage.txt, DesignDoc_v1.pdf, > YARN-9879.014.patch, YARN-9879.015.patch, YARN-9879.015.patch, > YARN-9879.POC001.patch, YARN-9879.POC002.patch, YARN-9879.POC003.patch, > YARN-9879.POC004.patch, YARN-9879.POC005.patch, YARN-9879.POC006.patch, > YARN-9879.POC007.patch, YARN-9879.POC008.patch, YARN-9879.POC009.patch, > YARN-9879.POC010.patch, YARN-9879.POC011.patch, YARN-9879.POC012.patch, > YARN-9879.POC013.patch > > > Currently the leaf queue's name must be unique regardless of its position in > the queue hierarchy. > A design doc and first proposal are being made; I'll attach them as soon as > they're done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10003) YarnConfigurationStore#checkVersion throws exception that belongs to RMStateStore
[ https://issues.apache.org/jira/browse/YARN-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Teke updated YARN-10003: - Attachment: YARN-10003.branch-3.2.POC003.patch > YarnConfigurationStore#checkVersion throws exception that belongs to > RMStateStore > - > > Key: YARN-10003 > URL: https://issues.apache.org/jira/browse/YARN-10003 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Benjamin Teke >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-10003.001.patch, YARN-10003.002.patch, > YARN-10003.003.patch, YARN-10003.004.patch, YARN-10003.005.patch, > YARN-10003.branch-3.2.001.patch, YARN-10003.branch-3.2.POC001.patch, > YARN-10003.branch-3.2.POC002.patch, YARN-10003.branch-3.2.POC003.patch > > > RMStateVersionIncompatibleException is thrown from method "checkVersion". > Moreover, there's a TODO here saying this method is copied from RMStateStore. > We should revise this method a bit. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9997) Code cleanup in ZKConfigurationStore
[ https://issues.apache.org/jira/browse/YARN-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066471#comment-17066471 ] Andras Gyori commented on YARN-9997: The backport is ready to be merged; however, I am waiting for an update on [YARN-10002|https://issues.apache.org/jira/browse/YARN-10002] to see whether it can be backported as well. If so, YARN-10002 should be merged first to avoid conflicts. > Code cleanup in ZKConfigurationStore > > > Key: YARN-9997 > URL: https://issues.apache.org/jira/browse/YARN-9997 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Andras Gyori >Priority: Minor > Fix For: 3.3.0 > > Attachments: YARN-9997.001.patch, YARN-9997.002.patch, > YARN-9997.003.patch, YARN-9997.004.patch, YARN-9997.005.patch, > YARN-9997.006.patch > > > Many things can be improved: > * znodeParentPath could be a local variable > * zkManager could be private, VisibleForTesting annotation is not needed > anymore > * Do something with unchecked casts > * zkManager.safeSetData calls almost all take the same set of parameters: > simplify this > * Extract zkManager calls to their own methods: They are repeated > * Remove TODOs -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9354) Resources should be created with ResourceTypesTestHelper instead of TestUtils
[ https://issues.apache.org/jira/browse/YARN-9354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066468#comment-17066468 ] Andras Gyori commented on YARN-9354: A new patch has been submitted for the branch-3.2 backport; the failing unit tests are unrelated. > Resources should be created with ResourceTypesTestHelper instead of TestUtils > - > > Key: YARN-9354 > URL: https://issues.apache.org/jira/browse/YARN-9354 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Andras Gyori >Priority: Trivial > Labels: newbie, newbie++ > Fix For: 3.3.0 > > Attachments: YARN-9354.001.patch, YARN-9354.002.patch, > YARN-9354.003.patch, YARN-9354.004.patch, YARN-9354.branch-3.2.001.patch, > YARN-9354.branch-3.2.002.patch, YARN-9354.branch-3.2.003.patch > > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestUtils#createResource > has a very similar, though not identical, implementation to > org.apache.hadoop.yarn.resourcetypes.ResourceTypesTestHelper#newResource. > Since these two methods essentially do the same thing and > ResourceTypesTestHelper is newer and more widely used, TestUtils#createResource > should be replaced with ResourceTypesTestHelper#newResource in all > occurrences. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10207) CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI
[ https://issues.apache.org/jira/browse/YARN-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja updated YARN-10207: --- Description: File descriptor leaks are observed coming from the JobHistoryServer process while it tries to render a "corrupted" aggregated log on the JHS Web UI. Issue reproduced using the following steps: # Ran a sample Hadoop MR Pi job, it had the id - application_1582676649923_0026. # Copied an aggregated log file from HDFS to local FS: {code} hdfs dfs -get /tmp/logs/systest/logs/application_1582676649923_0026/_8041 {code} # Updated the TFile metadata at the bottom of this file with some junk to corrupt the file : *Before:* {code} ^@^GVERSION*(^@_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáP {code} *After:* {code} ^@^GVERSION*(^@_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáPblah {code} Notice "blah" (junk) added at the very end. # Remove the existing aggregated log file that will need to be replaced by our modified copy from step 3 (as otherwise HDFS will prevent it from placing the file with the same name as it already exists): {code} hdfs dfs -rm -r -f /tmp/logs/systest/logs/application_1582676649923_0026/_8041 {code} # Upload the corrupted aggregated file back to HDFS: {code} hdfs dfs -put _8041 /tmp/logs/systest/logs/application_1582676649923_0026 {code} # Visit HistoryServer Web UI # Click on job_1582676649923_0026 # Click on "logs" link against the AM (assuming the AM ran on nm_hostname) # Review the JHS logs, following exception will be seen: {code} 2020-03-24 20:03:48,484 ERROR org.apache.hadoop.yarn.webapp.View: Error getting logs for job_1582676649923_0026 java.io.IOException: Not a valid BCFile. 
at org.apache.hadoop.io.file.tfile.BCFile$Magic.readAndVerify(BCFile.java:927) at org.apache.hadoop.io.file.tfile.BCFile$Reader.(BCFile.java:628) at org.apache.hadoop.io.file.tfile.TFile$Reader.(TFile.java:804) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.(AggregatedLogFormat.java:588) at org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.TFileAggregatedLogsBlock.render(TFileAggregatedLogsBlock.java:111) at org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController.renderAggregatedLogsBlock(LogAggregationTFileController.java:341) at org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.render(AggregatedLogsBlock.java:117) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at org.apache.hadoop.mapreduce.v2.hs.webapp.HsController.logs(HsController.java:202) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:162) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287) at 
com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) at
[jira] [Updated] (YARN-10207) CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI
[ https://issues.apache.org/jira/browse/YARN-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja updated YARN-10207: --- Description: Issue reproduced using the following steps: # Ran a sample Hadoop MR Pi job, it had the id - application_1582676649923_0026. # Copied an aggregated log file from HDFS to local FS: {code} hdfs dfs -get /tmp/logs/systest/logs/application_1582676649923_0026/_8041 {code} # Updated the TFile metadata at the bottom of this file with some junk to corrupt the file : *Before:* {code} ^@^GVERSION*(^@_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáP {code} *After:* {code} ^@^GVERSION*(^@_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáPblah {code} Notice "blah" (junk) added at the very end. # Remove the existing aggregated log file that will need to be replaced by our modified copy from step 3 (as otherwise HDFS will prevent it from placing the file with the same name as it already exists): {code} hdfs dfs -rm -r -f /tmp/logs/systest/logs/application_1582676649923_0026/_8041 {code} # Upload the corrupted aggregated file back to HDFS: {code} hdfs dfs -put _8041 /tmp/logs/systest/logs/application_1582676649923_0026 {code} # Visit HistoryServer Web UI # Click on job_1582676649923_0026 # Click on "logs" link against the AM (assuming the AM ran on nm_hostname) # Review the JHS logs, following exception will be seen: {code} 2020-03-24 20:03:48,484 ERROR org.apache.hadoop.yarn.webapp.View: Error getting logs for job_1582676649923_0026 java.io.IOException: Not a valid BCFile. 
at org.apache.hadoop.io.file.tfile.BCFile$Magic.readAndVerify(BCFile.java:927) at org.apache.hadoop.io.file.tfile.BCFile$Reader.(BCFile.java:628) at org.apache.hadoop.io.file.tfile.TFile$Reader.(TFile.java:804) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.(AggregatedLogFormat.java:588) at org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.TFileAggregatedLogsBlock.render(TFileAggregatedLogsBlock.java:111) at org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController.renderAggregatedLogsBlock(LogAggregationTFileController.java:341) at org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.render(AggregatedLogsBlock.java:117) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at org.apache.hadoop.mapreduce.v2.hs.webapp.HsController.logs(HsController.java:202) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:162) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287) at 
com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875) at
[jira] [Updated] (YARN-10207) CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI
[ https://issues.apache.org/jira/browse/YARN-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja updated YARN-10207:
---
Description:

Issue reproduced using the following steps:
# Ran a sample Hadoop MR Pi job; it had the id application_1582676649923_0026.
# Copied an aggregated log file from HDFS to the local FS:
{code}
hdfs dfs -get /tmp/logs/systest/logs/application_1582676649923_0026/_8041
{code}
# Updated the TFile metadata at the bottom of this file with some junk to corrupt the file:
*Before:*
{code}
^@^GVERSION*(^@_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáP
{code}
*After:*
{code}
^@^GVERSION*(^@_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáPblah
{code}
Notice "blah" (junk) added at the very end.
# Remove the existing aggregated log file from HDFS so it can be replaced by the modified copy from step 3 (otherwise HDFS will refuse to place a file with the same name as one that already exists):
{code}
hdfs dfs -rm -r -f /tmp/logs/systest/logs/application_1582676649923_0026/_8041
{code}
# Upload the corrupted aggregated file back to HDFS:
{code}
hdfs dfs -put _8041 /tmp/logs/systest/logs/application_1582676649923_0026
{code}
# Visit the HistoryServer Web UI.
# Click on job_1582676649923_0026.
# Click on the "logs" link against the AM (assuming the AM ran on nm_hostname).
# Review the JHS logs; the following exception will be seen:
{code}
2020-03-24 20:03:48,484 ERROR org.apache.hadoop.yarn.webapp.View: Error getting logs for job_1582676649923_0026
java.io.IOException: Not a valid BCFile.
	at org.apache.hadoop.io.file.tfile.BCFile$Magic.readAndVerify(BCFile.java:927)
	at org.apache.hadoop.io.file.tfile.BCFile$Reader.<init>(BCFile.java:628)
	at org.apache.hadoop.io.file.tfile.TFile$Reader.<init>(TFile.java:804)
	at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.<init>(AggregatedLogFormat.java:588)
	at org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.TFileAggregatedLogsBlock.render(TFileAggregatedLogsBlock.java:111)
	at org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController.renderAggregatedLogsBlock(LogAggregationTFileController.java:341)
	at org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.render(AggregatedLogsBlock.java:117)
	at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
	at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
	at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
	at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
	at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
	at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
	at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
	at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
	at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
	at org.apache.hadoop.mapreduce.v2.hs.webapp.HsController.logs(HsController.java:202)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:162)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
	at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287)
	at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277)
	at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)
	at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
	at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
{code}
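The corruption in step 3 can be reproduced without a hex editor by simply appending junk bytes to the downloaded file. A minimal local sketch, assuming only that any trailing bytes break the TFile trailer verification (the "blah" suffix matches the report; the sample payload and temp file are illustrative stand-ins, not the real aggregated log):

```shell
# Sketch of step 3: append junk so the TFile trailer no longer verifies.
corrupt_log() {
  printf 'blah' >> "$1"   # same 4-byte junk suffix used in the report
}

tmp=$(mktemp)
printf 'VERSION-payload' > "$tmp"   # stand-in for the real aggregated log bytes
before=$(wc -c < "$tmp")
corrupt_log "$tmp"
after=$(wc -c < "$tmp")
echo "appended $((after - before)) junk bytes"   # prints: appended 4 junk bytes
rm -f "$tmp"
```

After the corrupted copy is put back into HDFS (steps 4-5), the TFile magic check fails on read, which is what surfaces as "Not a valid BCFile." above.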
[jira] [Updated] (YARN-10207) CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI
[ https://issues.apache.org/jira/browse/YARN-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja updated YARN-10207:
---
Description: Issue reproduced using the steps above (verbatim repeat of the previous update's description).
[jira] [Assigned] (YARN-10207) CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI
[ https://issues.apache.org/jira/browse/YARN-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja reassigned YARN-10207:
--
Assignee: Siddharth Ahuja

> CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated
> logs on the JobHistoryServer Web UI
> -
>
> Key: YARN-10207
> URL: https://issues.apache.org/jira/browse/YARN-10207
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Siddharth Ahuja
> Assignee: Siddharth Ahuja
> Priority: Major
[jira] [Created] (YARN-10207) CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI
Siddharth Ahuja created YARN-10207:
--
Summary: CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI
Key: YARN-10207
URL: https://issues.apache.org/jira/browse/YARN-10207
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Reporter: Siddharth Ahuja

Issue reproduced using the steps above (the description is identical to the one in the updates above).
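The CLOSE_WAIT build-up named in the summary can be watched on the JHS host while the corrupted-log page is reloaded. A rough Linux-only sketch; parsing `/proc/net/tcp` (where state code `08` is CLOSE_WAIT) is an assumption about the environment, not something taken from the report:

```shell
# Count sockets currently in CLOSE_WAIT (state code 08, column 4 of
# /proc/net/tcp and /proc/net/tcp6). Run repeatedly while hitting the
# corrupted-log page; a count that only grows indicates the leak.
close_wait_count() {
  awk 'FNR > 1 && $4 == "08"' /proc/net/tcp /proc/net/tcp6 2>/dev/null | wc -l
}
close_wait_count
```

Filtering by the JobHistoryServer's ports (e.g. with `ss` or `lsof` against the JHS pid) would narrow this to the leaking process; the raw count above is the coarsest possible signal.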
[jira] [Commented] (YARN-10160) Add auto queue creation related configs to RMWebService#CapacitySchedulerQueueInfo
[ https://issues.apache.org/jira/browse/YARN-10160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066406#comment-17066406 ] Hadoop QA commented on YARN-10160:
--

| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 20m 53s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 47s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 57s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 34s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 22s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 18s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 9 new + 78 unchanged - 0 fixed = 87 total (was 78) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 16s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 3s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 84m 36s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 45s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}191m 36s{color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:4454c6d14b7 |
| JIRA Issue | YARN-10160 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12997422/YARN-10160-005.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 33dacd869cba 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d353b30 |
| maven | version: Apache Maven