[jira] [Commented] (YARN-1678) Fair scheduler gabs incessantly about reservations
[ https://issues.apache.org/jira/browse/YARN-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911481#comment-13911481 ] Hudson commented on YARN-1678: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #492 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/492/]) YARN-1678. Fair scheduler gabs incessantly about reservations (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1571468) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java Fair scheduler gabs incessantly about reservations -- Key: YARN-1678 URL: https://issues.apache.org/jira/browse/YARN-1678 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.5.0 Attachments: YARN-1678-1.patch, YARN-1678-1.patch, YARN-1678.patch Come on FS. We really don't need to know every time a node with a reservation on it heartbeats. {code} 2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Trying to fulfill reservation for application appattempt_1390547864213_0347_01 on node: host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 used=memory:8192, vCores:8 2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: Making reservation: node=a2330.halxg.cloudera.com app_id=application_1390547864213_0347 2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: Application application_1390547864213_0347 reserved container container_1390547864213_0347_01_03 on node host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 used=memory:8192, vCores:8, currently has 6 at priority 0; currentReservation 6144 2014-01-29 03:48:16,044 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Updated reserved container container_1390547864213_0347_01_03 on node host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 used=memory:8192, vCores:8 for application org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@1cb01d20 {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
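The patch contents are not reproduced in the notification above, but the change being described amounts to demoting these per-heartbeat reservation messages from INFO to DEBUG in FairScheduler and AppSchedulable. A minimal sketch of that idea (the class wrapper, method name, and accessors below are illustrative assumptions, not the exact code in YARN-1678-1.patch):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode;

// Sketch only: quiet the per-heartbeat reservation message by guarding it
// with a DEBUG check instead of logging at INFO on every node heartbeat.
public class ReservationLoggingSketch {
  private static final Log LOG = LogFactory.getLog(ReservationLoggingSketch.class);

  void tryFulfillReservation(FSSchedulerNode node) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Trying to fulfill reservation for application "
          + node.getReservedContainer().getApplicationAttemptId()
          + " on node: " + node);
    }
    // ... existing scheduling logic unchanged ...
  }
}
{code}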
[jira] [Commented] (YARN-1686) NodeManager.resyncWithRM() does not handle exceptions, which causes NodeManager to hang.
[ https://issues.apache.org/jira/browse/YARN-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911478#comment-13911478 ] Hudson commented on YARN-1686: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #492 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/492/]) YARN-1686. Fixed NodeManager to properly handle any errors during re-registration after a RESYNC and thus avoid hanging. Contributed by Rohith Sharma. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1571474) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java NodeManager.resyncWithRM() does not handle exceptions, which causes NodeManager to hang. Key: YARN-1686 URL: https://issues.apache.org/jira/browse/YARN-1686 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0 Reporter: Rohith Assignee: Rohith Fix For: 2.4.0 Attachments: YARN-1686.1.patch, YARN-1686.2.patch, YARN-1686.3.patch During NodeManager startup, if registration with the ResourceManager throws an exception, the NodeManager shuts down. Now consider the case where NM-1 is already registered with the RM and the RM issues a RESYNC to the NM. If any exception is thrown in resyncWithRM (which starts a new thread that does not handle exceptions) during the RESYNC event, that thread is lost and the NodeManager hangs. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911482#comment-13911482 ] Hudson commented on YARN-1734: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #492 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/492/]) YARN-1734. Fixed ResourceManager to update the configurations when it transits from standby to active mode so as to assimilate any changes that happened while it was in standby mode. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1571539) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java RM should get the updated Configurations when it transits from Standby to Active Key: YARN-1734 URL: https://issues.apache.org/jira/browse/YARN-1734 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Critical Fix For: 2.4.0 Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, YARN-1734.4.patch, YARN-1734.5.patch, YARN-1734.6.patch, YARN-1734.7.patch Currently, we have ConfigurationProvider which can support LocalConfiguration, and FileSystemBasedConfiguration. When HA is enabled, and FileSystemBasedConfiguration is enabled, RM can not get the updated Configurations when it transits from Standby to Active -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1686) NodeManager.resyncWithRM() does not handle exceptions, which causes NodeManager to hang.
[ https://issues.apache.org/jira/browse/YARN-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911578#comment-13911578 ] Hudson commented on YARN-1686: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1684 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1684/]) YARN-1686. Fixed NodeManager to properly handle any errors during re-registration after a RESYNC and thus avoid hanging. Contributed by Rohith Sharma. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1571474) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java NodeManager.resyncWithRM() does not handle exceptions, which causes NodeManager to hang. Key: YARN-1686 URL: https://issues.apache.org/jira/browse/YARN-1686 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0 Reporter: Rohith Assignee: Rohith Fix For: 2.4.0 Attachments: YARN-1686.1.patch, YARN-1686.2.patch, YARN-1686.3.patch During NodeManager startup, if registration with the ResourceManager throws an exception, the NodeManager shuts down. Now consider the case where NM-1 is already registered with the RM and the RM issues a RESYNC to the NM. If any exception is thrown in resyncWithRM (which starts a new thread that does not handle exceptions) during the RESYNC event, that thread is lost and the NodeManager hangs. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911582#comment-13911582 ] Hudson commented on YARN-1734: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1684 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1684/]) YARN-1734. Fixed ResourceManager to update the configurations when it transits from standby to active mode so as to assimilate any changes that happened while it was in standby mode. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1571539) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java RM should get the updated Configurations when it transits from Standby to Active Key: YARN-1734 URL: https://issues.apache.org/jira/browse/YARN-1734 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Critical Fix For: 2.4.0 Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, YARN-1734.4.patch, YARN-1734.5.patch, YARN-1734.6.patch, YARN-1734.7.patch Currently, we have ConfigurationProvider which can support LocalConfiguration, and FileSystemBasedConfiguration. When HA is enabled, and FileSystemBasedConfiguration is enabled, RM can not get the updated Configurations when it transits from Standby to Active -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1678) Fair scheduler gabs incessantly about reservations
[ https://issues.apache.org/jira/browse/YARN-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911581#comment-13911581 ] Hudson commented on YARN-1678: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1684 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1684/]) YARN-1678. Fair scheduler gabs incessantly about reservations (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1571468) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java Fair scheduler gabs incessantly about reservations -- Key: YARN-1678 URL: https://issues.apache.org/jira/browse/YARN-1678 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.5.0 Attachments: YARN-1678-1.patch, YARN-1678-1.patch, YARN-1678.patch Come on FS. We really don't need to know every time a node with a reservation on it heartbeats. {code} 2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Trying to fulfill reservation for application appattempt_1390547864213_0347_01 on node: host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 used=memory:8192, vCores:8 2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: Making reservation: node=a2330.halxg.cloudera.com app_id=application_1390547864213_0347 2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: Application application_1390547864213_0347 reserved container container_1390547864213_0347_01_03 on node host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 used=memory:8192, vCores:8, currently has 6 at priority 0; currentReservation 6144 2014-01-29 03:48:16,044 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Updated reserved container container_1390547864213_0347_01_03 on node host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 used=memory:8192, vCores:8 for application org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@1cb01d20 {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1678) Fair scheduler gabs incessantly about reservations
[ https://issues.apache.org/jira/browse/YARN-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911637#comment-13911637 ] Hudson commented on YARN-1678: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1709 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1709/]) YARN-1678. Fair scheduler gabs incessantly about reservations (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1571468) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java Fair scheduler gabs incessantly about reservations -- Key: YARN-1678 URL: https://issues.apache.org/jira/browse/YARN-1678 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.5.0 Attachments: YARN-1678-1.patch, YARN-1678-1.patch, YARN-1678.patch Come on FS. We really don't need to know every time a node with a reservation on it heartbeats. {code} 2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Trying to fulfill reservation for application appattempt_1390547864213_0347_01 on node: host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 used=memory:8192, vCores:8 2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: Making reservation: node=a2330.halxg.cloudera.com app_id=application_1390547864213_0347 2014-01-29 03:48:16,043 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: Application application_1390547864213_0347 reserved container container_1390547864213_0347_01_03 on node host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 used=memory:8192, vCores:8, currently has 6 at priority 0; currentReservation 6144 2014-01-29 03:48:16,044 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Updated reserved container container_1390547864213_0347_01_03 on node host: a2330.halxg.cloudera.com:8041 #containers=8 available=memory:0, vCores:8 used=memory:8192, vCores:8 for application org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@1cb01d20 {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1686) NodeManager.resyncWithRM() does not handle exceptions, which causes NodeManager to hang.
[ https://issues.apache.org/jira/browse/YARN-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911634#comment-13911634 ] Hudson commented on YARN-1686: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1709 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1709/]) YARN-1686. Fixed NodeManager to properly handle any errors during re-registration after a RESYNC and thus avoid hanging. Contributed by Rohith Sharma. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1571474) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java NodeManager.resyncWithRM() does not handle exceptions, which causes NodeManager to hang. Key: YARN-1686 URL: https://issues.apache.org/jira/browse/YARN-1686 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.3.0 Reporter: Rohith Assignee: Rohith Fix For: 2.4.0 Attachments: YARN-1686.1.patch, YARN-1686.2.patch, YARN-1686.3.patch During NodeManager startup, if registration with the ResourceManager throws an exception, the NodeManager shuts down. Now consider the case where NM-1 is already registered with the RM and the RM issues a RESYNC to the NM. If any exception is thrown in resyncWithRM (which starts a new thread that does not handle exceptions) during the RESYNC event, that thread is lost and the NodeManager hangs. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1734) RM should get the updated Configurations when it transits from Standby to Active
[ https://issues.apache.org/jira/browse/YARN-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911638#comment-13911638 ] Hudson commented on YARN-1734: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1709 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1709/]) YARN-1734. Fixed ResourceManager to update the configurations when it transits from standby to active mode so as to assimilate any changes that happened while it was in standby mode. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1571539) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java RM should get the updated Configurations when it transits from Standby to Active Key: YARN-1734 URL: https://issues.apache.org/jira/browse/YARN-1734 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Critical Fix For: 2.4.0 Attachments: YARN-1734.1.patch, YARN-1734.2.patch, YARN-1734.3.patch, YARN-1734.4.patch, YARN-1734.5.patch, YARN-1734.6.patch, YARN-1734.7.patch Currently, we have ConfigurationProvider which can support LocalConfiguration, and FileSystemBasedConfiguration. When HA is enabled, and FileSystemBasedConfiguration is enabled, RM can not get the updated Configurations when it transits from Standby to Active -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1730) Leveldb timeline store needs simple write locking
[ https://issues.apache.org/jira/browse/YARN-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-1730: - Attachment: YARN-1730.3.patch Rebased patch 1 against trunk. Leveldb timeline store needs simple write locking - Key: YARN-1730 URL: https://issues.apache.org/jira/browse/YARN-1730 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1730.1.patch, YARN-1730.2.patch, YARN-1730.3.patch The actual data writes are performed atomically in a batch, but a lock should be held while identifying a start time for the entity, which precedes every write. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1757) Auxiliary service support for nodemanager recovery
[ https://issues.apache.org/jira/browse/YARN-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1757: - Attachment: YARN-1757.patch Patch to have the nodemanager create an aux-service-specific path under the specified NM recovery directory where an aux service can store recoverable state. The presence or absence of this path indicates whether NM recovery is enabled (or aux service could check conf directly). Auxiliary service support for nodemanager recovery -- Key: YARN-1757 URL: https://issues.apache.org/jira/browse/YARN-1757 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1757.patch There needs to be a mechanism for communicating to auxiliary services whether nodemanager recovery is enabled and where they should store their state. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
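As a rough illustration of the consumer side of this mechanism (the config key, directory layout, and helper names below are assumptions for the sketch, not the API introduced by the attached patch), an aux service could derive its own path under the NM recovery directory and treat its presence as the recovery-enabled signal:

{code}
import java.io.File;
import org.apache.hadoop.conf.Configuration;

// Sketch only: derive an aux-service-specific recovery path and use its
// presence/absence as the "NM recovery enabled" signal described above.
public class AuxServiceRecoveryPaths {
  // Assumed key; the real key comes from the NM recovery work (YARN-1336).
  private static final String NM_RECOVERY_DIR = "yarn.nodemanager.recovery.dir";

  public static File recoveryPathFor(Configuration conf, String serviceName) {
    String root = conf.get(NM_RECOVERY_DIR);
    if (root == null) {
      return null; // NM recovery not configured
    }
    return new File(new File(root, "aux-services"), serviceName);
  }

  public static boolean recoveryEnabled(Configuration conf, String serviceName) {
    File path = recoveryPathFor(conf, serviceName);
    return path != null && path.isDirectory();
  }
}
{code}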
[jira] [Updated] (YARN-1757) Auxiliary service support for nodemanager recovery
[ https://issues.apache.org/jira/browse/YARN-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1757: - Target Version/s: 2.5.0 (was: 2.4.0) Auxiliary service support for nodemanager recovery -- Key: YARN-1757 URL: https://issues.apache.org/jira/browse/YARN-1757 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1757.patch There needs to be a mechanism for communicating to auxiliary services whether nodemanager recovery is enabled and where they should store their state. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1760) TestRMAdminService assumes the use of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911797#comment-13911797 ] Sandy Ryza commented on YARN-1760: -- +1 TestRMAdminService assumes the use of CapacityScheduler --- Key: YARN-1760 URL: https://issues.apache.org/jira/browse/YARN-1760 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Trivial Labels: test Attachments: yarn-1760-1.patch, yarn-1760-2.patch, yarn-1760-3.patch YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. {noformat} java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
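The attached patches are not shown in this thread, but one way a test can avoid a cast failure like the one above is to pin the scheduler explicitly in its configuration before starting the RM. A minimal sketch of that option (whether yarn-1760 fixes the test this way or by dropping the CapacityScheduler assumption is not stated here):

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;

// Sketch: force CapacityScheduler for a test that later casts the RM's
// scheduler to CapacityScheduler, regardless of the default in yarn-site.xml.
YarnConfiguration conf = new YarnConfiguration();
conf.setClass(YarnConfiguration.RM_SCHEDULER,
    CapacityScheduler.class, ResourceScheduler.class);
// ... start the ResourceManager / MockRM with this conf ...
{code}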
[jira] [Updated] (YARN-1760) TestRMAdminService assumes CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1760: --- Summary: TestRMAdminService assumes CapacityScheduler (was: TestRMAdminService assumes the use of CapacityScheduler) TestRMAdminService assumes CapacityScheduler Key: YARN-1760 URL: https://issues.apache.org/jira/browse/YARN-1760 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Trivial Labels: test Attachments: yarn-1760-1.patch, yarn-1760-2.patch, yarn-1760-3.patch YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. {noformat} java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911817#comment-13911817 ] Xuan Gong commented on YARN-1410: - Sounds good to me. For 1) RM fails over after getApplicationID() and *before* submitApplication(). The changes we will make is to let RM accept the “old” applicationId which includes: * make RM accept the applicationId in the context * If there is no applicationId specified in the context, RM will assign a new ApplicationId For 2) RM fail overs *during* the submitApplication call. We have many discussions for this scenario. We can open a separate ticket for it. For 3) RM fails over *after* the submitApplication call and before the subsequent getApplicationReport(). We can mark getApplicationReport() as Idempotent, and need to handle two different cases: * Failover happens after SubmitApplicationResponse is received, but RMStateStore does not save the applicationState. In this case, when the getApplicationReport() is called, we will get an ApplicationNotFoundException. So, we need to catch this exception and submit this application again * Failover happens after SubmitApplicationResponse is received, and RMStateStore saves the applicationState. Nothing need to be changed. Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating appId 2) using that appId to submit an ApplicationSubmissionContext to the user. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side. The same may happen for other 2 step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
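For scenario 3, the client-side handling described above amounts to something like the following sketch against the YarnClient API (the surrounding retry/failover wiring is omitted; this is an illustration of the proposal, not the final patch):

{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;

// Sketch of scenario 3: the RM failed over after submitApplication() returned
// but before the state store saved the app, so getApplicationReport() throws
// ApplicationNotFoundException and the client resubmits the same context.
ApplicationReport submitAndGetReport(YarnClient yarnClient,
    ApplicationSubmissionContext appContext) throws Exception {
  ApplicationId appId = yarnClient.submitApplication(appContext);
  try {
    return yarnClient.getApplicationReport(appId);
  } catch (ApplicationNotFoundException e) {
    // The RM lost the submission across the failover; submit it again.
    appId = yarnClient.submitApplication(appContext);
    return yarnClient.getApplicationReport(appId);
  }
}
{code}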
[jira] [Commented] (YARN-1730) Leveldb timeline store needs simple write locking
[ https://issues.apache.org/jira/browse/YARN-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911824#comment-13911824 ] Hadoop QA commented on YARN-1730: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630988/YARN-1730.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3176//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3176//console This message is automatically generated. Leveldb timeline store needs simple write locking - Key: YARN-1730 URL: https://issues.apache.org/jira/browse/YARN-1730 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1730.1.patch, YARN-1730.2.patch, YARN-1730.3.patch The actual data writes are performed atomically in a batch, but a lock should be held while identifying a start time for the entity, which precedes every write. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1758) MiniYARNCluster broken post YARN-1666
[ https://issues.apache.org/jira/browse/YARN-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911827#comment-13911827 ] Tsuyoshi OZAWA commented on YARN-1758: -- Thank you for reporting, [~hitesh]. I cannot reproduce the NPE with a mvn test run under hadoop-yarn-project locally. Could you tell me the case in which this problem occurs? MiniYARNCluster broken post YARN-1666 - Key: YARN-1758 URL: https://issues.apache.org/jira/browse/YARN-1758 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah NPE seen when trying to use MiniYARNCluster -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1760) TestRMAdminService assumes CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911826#comment-13911826 ] Hudson commented on YARN-1760: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5222 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5222/]) YARN-1760. TestRMAdminService assumes CapacityScheduler. (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1571777) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java TestRMAdminService assumes CapacityScheduler Key: YARN-1760 URL: https://issues.apache.org/jira/browse/YARN-1760 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Trivial Labels: test Attachments: yarn-1760-1.patch, yarn-1760-2.patch, yarn-1760-3.patch YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. {noformat} java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911842#comment-13911842 ] Bikas Saha commented on YARN-1410: -- bq. getApplicationReport() is called, we will get an ApplicationNotFoundException. So, we need to catch this exception and submit this application again It would be good if, via HAUtil, we could get an indication of whether a failover has occurred or not. If it has occurred then it's OK to get this exception, but if it has not then it's a bug. We can defer that to a separate JIRA if it's too much work. Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating appId 2) using that appId to submit an ApplicationSubmissionContext to the user. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side. The same may happen for other 2 step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911845#comment-13911845 ] Karthik Kambatla commented on YARN-1410: As Vinod suggested, can we limit this JIRA to 1 and open separate JIRAs for 2 and 3. I don't see 3 to be as straight-forward, and suspect would require revisiting the state machine. Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating appId 2) using that appId to submit an ApplicationSubmissionContext to the user. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side. The same may happen for other 2 step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1758) MiniYARNCluster broken post YARN-1666
[ https://issues.apache.org/jira/browse/YARN-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911863#comment-13911863 ] Hitesh Shah commented on YARN-1758: --- [~ozawa] The problem is due to the loading of yarn-site.xml and other resources in ResourceManager::serviceInit(). It does not show up in yarn tests as the yarn-site.xml is added into yarn-resourcemanager-tests.jar I believe. However, for all downstream projects, they depend on hadoop-yarn-server-tests-tests.jar for MiniYarnCluster which itself does not have the necessary yarn-site, etc. I think the fix might be as simple as moving the required configs from hadoop-yarn-server-resourcemanager/src/test/resources/ to hadoop-yarn-server-tests/src/test/resources/ so that the required confs are bundled in the same tests jar. MiniYARNCluster broken post YARN-1666 - Key: YARN-1758 URL: https://issues.apache.org/jira/browse/YARN-1758 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah NPE seen when trying to use MiniYARNCluster -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1757) Auxiliary service support for nodemanager recovery
[ https://issues.apache.org/jira/browse/YARN-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911867#comment-13911867 ] Hadoop QA commented on YARN-1757: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630989/YARN-1757.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServer org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3175//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3175//console This message is automatically generated. Auxiliary service support for nodemanager recovery -- Key: YARN-1757 URL: https://issues.apache.org/jira/browse/YARN-1757 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1757.patch There needs to be a mechanism for communicating to auxiliary services whether nodemanager recovery is enabled and where they should store their state. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1577) Unmanaged AM is broken because of YARN-1493
[ https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911876#comment-13911876 ] Naren Koneru commented on YARN-1577: Hi Jian, are you working on this issue? If not, I would like to take a look. Can you please comment. Unmanaged AM is broken because of YARN-1493 --- Key: YARN-1577 URL: https://issues.apache.org/jira/browse/YARN-1577 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Jian He Assignee: Jian He Priority: Blocker Today unmanaged AM client is waiting for app state to be Accepted to launch the AM. This is broken since we changed in YARN-1493 to start the attempt after the application is Accepted. We may need to introduce an attempt state report that client can rely on to query the attempt state and choose to launch the unmanaged AM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1764) Handle RM fail overs after the submitApplication call.
Xuan Gong created YARN-1764: --- Summary: Handle RM fail overs after the submitApplication call. Key: YARN-1764 URL: https://issues.apache.org/jira/browse/YARN-1764 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (YARN-1758) MiniYARNCluster broken post YARN-1666
[ https://issues.apache.org/jira/browse/YARN-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1758: --- Assignee: Xuan Gong MiniYARNCluster broken post YARN-1666 - Key: YARN-1758 URL: https://issues.apache.org/jira/browse/YARN-1758 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Xuan Gong NPE seen when trying to use MiniYARNCluster -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1713) Implement getnewapplication as part of RM web service
[ https://issues.apache.org/jira/browse/YARN-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-1713: Attachment: (was: yarn-1713.patch) Implement getnewapplication as part of RM web service - Key: YARN-1713 URL: https://issues.apache.org/jira/browse/YARN-1713 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-1713.cumulative.patch, apache-yarn-1713.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1713) Implement getnewapplication as part of RM web service
[ https://issues.apache.org/jira/browse/YARN-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-1713: Attachment: apache-yarn-1713.patch Implement getnewapplication as part of RM web service - Key: YARN-1713 URL: https://issues.apache.org/jira/browse/YARN-1713 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-1713.cumulative.patch, apache-yarn-1713.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1758) MiniYARNCluster broken post YARN-1666
[ https://issues.apache.org/jira/browse/YARN-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911902#comment-13911902 ] Xuan Gong commented on YARN-1758: - Do not move the configs out of hadoop-yarn-server-resourcemanager/src/test/resources/. TestRMAdminService does not use MiniYARNCluster and needs these configs. MiniYARNCluster broken post YARN-1666 - Key: YARN-1758 URL: https://issues.apache.org/jira/browse/YARN-1758 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Xuan Gong NPE seen when trying to use MiniYARNCluster -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1577) Unmanaged AM is broken because of YARN-1493
[ https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911903#comment-13911903 ] Jian He commented on YARN-1577: --- Hi [~naren.koneru], sure, you can work on it. Tx Unmanaged AM is broken because of YARN-1493 --- Key: YARN-1577 URL: https://issues.apache.org/jira/browse/YARN-1577 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Jian He Assignee: Jian He Priority: Blocker Today unmanaged AM client is waiting for app state to be Accepted to launch the AM. This is broken since we changed in YARN-1493 to start the attempt after the application is Accepted. We may need to introduce an attempt state report that client can rely on to query the attempt state and choose to launch the unmanaged AM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1713) Implement getnewapplication and submitapp as part of RM web service
[ https://issues.apache.org/jira/browse/YARN-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911910#comment-13911910 ] Varun Vasudev commented on YARN-1713: - Attached two patch files. Testing the functionality requires running parametrized tests, which are also part of the patch for the kill app functionality (issue 1702). Without the parametrized testing, the submit app testing would be incomplete. apache-yarn-1713.patch contains changes just for the submit app functionality, and apache-yarn-1713.cumulative.patch contains changes for both the kill app and submit app functionality so that it can be applied and tested. Implement getnewapplication and submitapp as part of RM web service --- Key: YARN-1713 URL: https://issues.apache.org/jira/browse/YARN-1713 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-1713.cumulative.patch, apache-yarn-1713.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1713) Implement getnewapplication and submitapp as part of RM web service
[ https://issues.apache.org/jira/browse/YARN-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-1713: Summary: Implement getnewapplication and submitapp as part of RM web service (was: Implement getnewapplication as part of RM web service) Implement getnewapplication and submitapp as part of RM web service --- Key: YARN-1713 URL: https://issues.apache.org/jira/browse/YARN-1713 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-1713.cumulative.patch, apache-yarn-1713.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (YARN-1577) Unmanaged AM is broken because of YARN-1493
[ https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naren Koneru reassigned YARN-1577: -- Assignee: Naren Koneru (was: Jian He) Unmanaged AM is broken because of YARN-1493 --- Key: YARN-1577 URL: https://issues.apache.org/jira/browse/YARN-1577 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Jian He Assignee: Naren Koneru Priority: Blocker Today unmanaged AM client is waiting for app state to be Accepted to launch the AM. This is broken since we changed in YARN-1493 to start the attempt after the application is Accepted. We may need to introduce an attempt state report that client can rely on to query the attempt state and choose to launch the unmanaged AM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1730) Leveldb timeline store needs simple write locking
[ https://issues.apache.org/jira/browse/YARN-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911947#comment-13911947 ] Zhijie Shen commented on YARN-1730: --- bq. The hold count only returns the number of holds that have been obtained by the current thread. So as soon as the current thread is done with the lock, it would drop the lock from the lock map, which is not what we want. Makes sense. One minor comment: how about making CountReentrantLock extend ReentrantLock, so that the code would be a bit cleaner? Leveldb timeline store needs simple write locking - Key: YARN-1730 URL: https://issues.apache.org/jira/browse/YARN-1730 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1730.1.patch, YARN-1730.2.patch, YARN-1730.3.patch The actual data writes are performed atomically in a batch, but a lock should be held while identifying a start time for the entity, which precedes every write. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
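The shape being discussed is roughly this: a ReentrantLock subclass that additionally counts how many threads want the lock for a given entity, so the map entry can be dropped once that count (unlike ReentrantLock's per-thread hold count) reaches zero. A sketch under those assumptions; the names and details below are illustrative, not the actual patch:

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: per-entity locks with a cross-thread user count, so an entry is
// removed from the map only when no thread holds or is waiting for it.
class CountReentrantLock extends ReentrantLock {
  int users; // guarded by EntityLockMap's monitor, not by this lock
}

class EntityLockMap {
  private final Map<String, CountReentrantLock> locks =
      new HashMap<String, CountReentrantLock>();

  synchronized CountReentrantLock getLock(String entityId) {
    CountReentrantLock lock = locks.get(entityId);
    if (lock == null) {
      lock = new CountReentrantLock();
      locks.put(entityId, lock);
    }
    lock.users++;
    return lock;
  }

  synchronized void returnLock(String entityId, CountReentrantLock lock) {
    if (--lock.users == 0) {
      locks.remove(entityId);
    }
  }
}

// Usage around the "identify start time, then write batch" critical section:
//   CountReentrantLock lock = map.getLock(entityId);
//   lock.lock();
//   try { /* resolve start time, perform batched write */ }
//   finally { lock.unlock(); map.returnLock(entityId, lock); }
{code}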
[jira] [Commented] (YARN-1713) Implement getnewapplication and submitapp as part of RM web service
[ https://issues.apache.org/jira/browse/YARN-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911981#comment-13911981 ] Hadoop QA commented on YARN-1713: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12631012/apache-yarn-1713.cumulative.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3177//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3177//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3177//console This message is automatically generated. Implement getnewapplication and submitapp as part of RM web service --- Key: YARN-1713 URL: https://issues.apache.org/jira/browse/YARN-1713 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-1713.cumulative.patch, apache-yarn-1713.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1730) Leveldb timeline store needs simple write locking
[ https://issues.apache.org/jira/browse/YARN-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-1730: - Attachment: YARN-1730.4.patch Sounds fine to me. Leveldb timeline store needs simple write locking - Key: YARN-1730 URL: https://issues.apache.org/jira/browse/YARN-1730 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1730.1.patch, YARN-1730.2.patch, YARN-1730.3.patch, YARN-1730.4.patch The actual data writes are performed atomically in a batch, but a lock should be held while identifying a start time for the entity, which precedes every write. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1757) Auxiliary service support for nodemanager recovery
[ https://issues.apache.org/jira/browse/YARN-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1757: - Attachment: YARN-1757.patch Test failures are of the Bind address already in use variety, and the NM tests run clean for me locally. Uploading the same patch again to see if it was a sporadic failure. Auxiliary service support for nodemanager recovery -- Key: YARN-1757 URL: https://issues.apache.org/jira/browse/YARN-1757 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1757.patch, YARN-1757.patch There needs to be a mechanism for communicating to auxiliary services whether nodemanager recovery is enabled and where they should store their state. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912071#comment-13912071 ] Xuan Gong commented on YARN-1410: - Create https://issues.apache.org/jira/browse/YARN-1763 to track 2) RM fail overs *during* the submitApplication call. Create https://issues.apache.org/jira/browse/YARN-1764 to track 3) RM fails over *after* the submitApplication call and before the subsequent getApplicationReport(). This ticket is used to track 1) RM fails over after getApplicationID() and *before* submitApplication(). Create a patch which includes : * make RM accept the applicationId in the context. Nothing need to be changed here. * If there is no applicationId specified in the context, RM will assign a new ApplicationId. Also added two testcases to test AppSubmissionWithApplicationId and AppSubmissionWithoutApplicationId Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch, YARN-1410.6.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating appId 2) using that appId to submit an ApplicationSubmissionContext to the user. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side. The same may happen for other 2 step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
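On the RM side, the two bullets above reduce to a check of the submission context; the snippet below is only an illustrative sketch (the id-minting helper and the surrounding submission path are assumptions, not the actual ClientRMService code in the patch):

{code}
// Sketch of scenario 1: honor an ApplicationId the client already obtained
// from the previous active RM, and only mint a new one when none is supplied.
ApplicationId appId = submissionContext.getApplicationId();
if (appId == null) {
  appId = newApplicationId();          // hypothetical helper for a fresh id
  submissionContext.setApplicationId(appId);
}
// ... continue the normal submission path with appId ...
{code}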
[jira] [Updated] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1410: Attachment: YARN-1410.6.patch Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch, YARN-1410.6.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating appId 2) using that appId to submit an ApplicationSubmissionContext to the user. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side. The same may happen for other 2 step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1730) Leveldb timeline store needs simple write locking
[ https://issues.apache.org/jira/browse/YARN-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912090#comment-13912090 ] Hadoop QA commented on YARN-1730: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12631034/YARN-1730.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3179//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3179//console This message is automatically generated. Leveldb timeline store needs simple write locking - Key: YARN-1730 URL: https://issues.apache.org/jira/browse/YARN-1730 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1730.1.patch, YARN-1730.2.patch, YARN-1730.3.patch, YARN-1730.4.patch The actual data writes are performed atomically in a batch, but a lock should be held while identifying a start time for the entity, which precedes every write. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1757) Auxiliary service support for nodemanager recovery
[ https://issues.apache.org/jira/browse/YARN-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912107#comment-13912107 ] Hadoop QA commented on YARN-1757: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12631035/YARN-1757.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3178//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3178//console This message is automatically generated. Auxiliary service support for nodemanager recovery -- Key: YARN-1757 URL: https://issues.apache.org/jira/browse/YARN-1757 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1757.patch, YARN-1757.patch There needs to be a mechanism for communicating to auxiliary services whether nodemanager recovery is enabled and where they should store their state. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912144#comment-13912144 ] Karthik Kambatla commented on YARN-1492: Thanks for sharing this, [~ctrezzo]. The document is nicely written. A few comments: * Would SCM be a single point of failure? If yes, would any one of the following approaches make sense? ** Make SCM an AM. From YARN-896, the only sub-task that affects this would be the delegation tokens. ** Add an SCMMonitorService to the RM. If SCM is enabled, this service would start the SCM on one of the nodes and monitor it. * SCM Cleaner Service - the doc mentions the tension between the cleaner's frequency and the load on the RM. Can you elaborate? I was of the opinion that the RM is not involved in the caching at all. * Cleaner protocol doesn't mention when the cleaner lock is cleared. I assume it is cleared on each exit path. * Nit: ZK-based store - we can maybe do this in the JIRA corresponding to the sub-task - what would this look like? * More nit-picking: The rationale for not using in-memory state and reconstructing it seems to come from long-running applications. Given long-running applications don't benefit from the shared cache as much as the shorter ones, is this a huge concern? truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to mention defeating the purpose of bringing compute to where the data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss the feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912167#comment-13912167 ] Hadoop QA commented on YARN-1410: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12631041/YARN-1410.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestRMFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3180//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3180//console This message is automatically generated. Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch, YARN-1410.6.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating appId 2) using that appId to submit an ApplicationSubmissionContext to the user. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side. The same may happen for other 2 step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
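One way a client could cope with the two-step submission problem, shown purely as a hedged sketch (RmClient, StaleApplicationIdException and the retry loop are invented for illustration and are neither the YarnClient API nor necessarily the approach of the attached patches): if the new RM rejects an application id minted by the old RM, obtain a fresh id and resubmit.
{code}
// Hypothetical client-side retry sketch.
interface RmClient {
  String getNewApplicationId();
  void submitApplication(String appId) throws StaleApplicationIdException;
}

class StaleApplicationIdException extends Exception {}

class FailoverAwareSubmitter {
  private final RmClient client;

  FailoverAwareSubmitter(RmClient client) { this.client = client; }

  String submit(int maxAttempts) throws Exception {
    String appId = client.getNewApplicationId();
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        client.submitApplication(appId);
        return appId;
      } catch (StaleApplicationIdException e) {
        // The RM that issued appId failed over; its successor uses a different
        // cluster timestamp, so ask the current RM for a fresh id and retry.
        appId = client.getNewApplicationId();
      }
    }
    throw new Exception("giving up after " + maxAttempts + " attempts");
  }
}
{code}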
[jira] [Updated] (YARN-1740) Redirection from AM-URL is broken with HTTPS_ONLY policy
[ https://issues.apache.org/jira/browse/YARN-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1740: -- Attachment: YARN-1740.2.patch Redirection from AM-URL is broken with HTTPS_ONLY policy Key: YARN-1740 URL: https://issues.apache.org/jira/browse/YARN-1740 Project: Hadoop YARN Issue Type: Sub-task Reporter: Yesha Vora Assignee: Jian He Attachments: YARN-1740.1.patch, YARN-1740.2.patch Steps to reproduce: 1) Run a sleep job 2) Run: yarn application -list command to find AM URL. root@host1:~# yarn application -list Total number of applications (application-types: [] and states: SUBMITTED, ACCEPTED, RUNNING):1 Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL application_1383251398986_0003 Sleep job MAPREDUCE hdfs default RUNNING UNDEFINED 5% http://host1:40653 3) Try to access http://host1:40653/ws/v1/mapreduce/info; url. This URL redirects to http://RM_host:RM_https_port/proxy/application_1383251398986_0003/ws/v1/mapreduce/info Here, Http protocol is used with HTTPS port for RM. The expected Url is https://RM_host:RM_https_port/proxy/application_1383251398986_0003/ws/v1/mapreduce/info -- This message was sent by Atlassian JIRA (v6.1.5#6160)
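For reference, a simplified sketch of the scheme-aware URL construction the YARN-1740 report implies is needed; the boolean flag here is only a stand-in for the actual Hadoop http-policy configuration lookup.
{code}
// Simplified illustration of scheme-aware proxy URL construction.
public final class ProxyUrl {
  public static String trackingUrl(boolean httpsOnly, String rmHost, int rmPort,
      String appId, String path) {
    String scheme = httpsOnly ? "https://" : "http://";
    return scheme + rmHost + ":" + rmPort + "/proxy/" + appId + path;
  }

  public static void main(String[] args) {
    // With HTTPS_ONLY, the redirect target must use https together with the RM https port.
    System.out.println(trackingUrl(true, "RM_host", 8090,
        "application_1383251398986_0003", "/ws/v1/mapreduce/info"));
  }
}
{code}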
[jira] [Commented] (YARN-1740) Redirection from AM-URL is broken with HTTPS_ONLY policy
[ https://issues.apache.org/jira/browse/YARN-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912176#comment-13912176 ] Jian He commented on YARN-1740: --- Added a test case verifying that the MR web app explicitly disables SSL. The AM-URL redirection by amIpFilter to the web proxy is not easily tested, since there is another issue: amIpFilter is not able to differentiate a request coming from the web proxy from one coming from localhost when the cluster is set up on a single local machine. Redirection from AM-URL is broken with HTTPS_ONLY policy Key: YARN-1740 URL: https://issues.apache.org/jira/browse/YARN-1740 Project: Hadoop YARN Issue Type: Sub-task Reporter: Yesha Vora Assignee: Jian He Attachments: YARN-1740.1.patch, YARN-1740.2.patch Steps to reproduce: 1) Run a sleep job 2) Run: yarn application -list command to find AM URL. root@host1:~# yarn application -list Total number of applications (application-types: [] and states: SUBMITTED, ACCEPTED, RUNNING):1 Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL application_1383251398986_0003 Sleep job MAPREDUCE hdfs default RUNNING UNDEFINED 5% http://host1:40653 3) Try to access http://host1:40653/ws/v1/mapreduce/info; url. This URL redirects to http://RM_host:RM_https_port/proxy/application_1383251398986_0003/ws/v1/mapreduce/info Here, Http protocol is used with HTTPS port for RM. The expected Url is https://RM_host:RM_https_port/proxy/application_1383251398986_0003/ws/v1/mapreduce/info -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1410: Attachment: YARN-1410.7.patch Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch, YARN-1410.6.patch, YARN-1410.7.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating appId 2) using that appId to submit an ApplicationSubmissionContext to the user. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side. The same may happen for other 2 step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912188#comment-13912188 ] Xuan Gong commented on YARN-1410: - Test case is passing locally. Added verifyConnections() and try again... Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch, YARN-1410.6.patch, YARN-1410.7.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating appId 2) using that appId to submit an ApplicationSubmissionContext to the user. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side. The same may happen for other 2 step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1729) ATSWebServices always passes primary and secondary filters as strings
[ https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-1729: - Attachment: YARN-1729.3.patch Updated patch for trunk. ATSWebServices always passes primary and secondary filters as strings - Key: YARN-1729 URL: https://issues.apache.org/jira/browse/YARN-1729 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1729.1.patch, YARN-1729.2.patch, YARN-1729.3.patch Primary filters and secondary filter values can be arbitrary json-compatible Object. The web services should determine if the filters specified as query parameters are objects or strings before passing them to the store. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
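A minimal sketch of the behavior the YARN-1729 description asks for, using Jackson purely for illustration: try to read the query-parameter value as JSON and fall back to the raw string when that fails. With this, a value like {"count":3} would reach the store as a Map while a plain token stays a String.
{code}
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;

// Illustrative only: interpret a filter value as a JSON object/number/boolean
// when possible, otherwise keep it as a plain string.
public final class FilterValueParser {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  public static Object parse(String raw) {
    try {
      return MAPPER.readValue(raw, Object.class); // e.g. {"k":1} -> Map, 42 -> Integer
    } catch (IOException e) {
      return raw; // not valid JSON, treat the filter value as a string
    }
  }
}
{code}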
[jira] [Updated] (YARN-1658) Webservice should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cindy Li updated YARN-1658: --- Attachment: YARN1658.patch Initial patch, based on YARN1525 patch. Webservice should redirect to active RM when HA is enabled. --- Key: YARN-1658 URL: https://issues.apache.org/jira/browse/YARN-1658 Project: Hadoop YARN Issue Type: Sub-task Reporter: Cindy Li Assignee: Cindy Li Labels: YARN Attachments: YARN1658.patch When HA is enabled, web service to standby RM should be redirected to the active RM. This is a related Jira to YARN-1525. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1740) Redirection from AM-URL is broken with HTTPS_ONLY policy
[ https://issues.apache.org/jira/browse/YARN-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912364#comment-13912364 ] Hadoop QA commented on YARN-1740: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12631076/YARN-1740.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3181//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3181//console This message is automatically generated. Redirection from AM-URL is broken with HTTPS_ONLY policy Key: YARN-1740 URL: https://issues.apache.org/jira/browse/YARN-1740 Project: Hadoop YARN Issue Type: Sub-task Reporter: Yesha Vora Assignee: Jian He Attachments: YARN-1740.1.patch, YARN-1740.2.patch Steps to reproduce: 1) Run a sleep job 2) Run: yarn application -list command to find AM URL. root@host1:~# yarn application -list Total number of applications (application-types: [] and states: SUBMITTED, ACCEPTED, RUNNING):1 Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL application_1383251398986_0003 Sleep job MAPREDUCE hdfs default RUNNING UNDEFINED 5% http://host1:40653 3) Try to access http://host1:40653/ws/v1/mapreduce/info; url. This URL redirects to http://RM_host:RM_https_port/proxy/application_1383251398986_0003/ws/v1/mapreduce/info Here, Http protocol is used with HTTPS port for RM. The expected Url is https://RM_host:RM_https_port/proxy/application_1383251398986_0003/ws/v1/mapreduce/info -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912382#comment-13912382 ] Hadoop QA commented on YARN-1410: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12631077/YARN-1410.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.api.impl.TestNMClient {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3182//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3182//console This message is automatically generated. Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch, YARN-1410.6.patch, YARN-1410.7.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating appId 2) using that appId to submit an ApplicationSubmissionContext to the user. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side. The same may happen for other 2 step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1700) AHS records non-launched containers
[ https://issues.apache.org/jira/browse/YARN-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912383#comment-13912383 ] Zhijie Shen commented on YARN-1700: --- Log url is nullable. In this scenario, the container is not launched. It is also possible that the container is completed, but the finish information is not written into the history store. The correct fix should be correcting AppAttemptBlock and ContainerBlock to handle the case that the log url is null. This is what YARN-1685 is supposed to do. On the other hand, even if a container is not launched, we still want to record it, though the information we currently collect cannot tell whether the container finished after running some executable or was never started. However, we're going to improve the exposed information to let users see this difference. Moreover, we are aiming to provide integrated access to the information for both running and finished containers, via both RPC and web interfaces. Once this is done, users will be able to monitor containers before launch, while running, after completion, etc. [~jira.shegalov], if you're fine with it, we can close the ticket as a duplicate of YARN-1685. Thanks! AHS records non-launched containers --- Key: YARN-1700 URL: https://issues.apache.org/jira/browse/YARN-1700 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1700.v01.patch, YARN-1700.v02.patch When testing AHS with a MR sleep job, AHS sometimes threw NPE out of AppAttemptBlock.render because logUrl in container report was null. I realized that this is because AHS may record containers that never launch. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1729) ATSWebServices always passes primary and secondary filters as strings
[ https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912388#comment-13912388 ] Hadoop QA commented on YARN-1729: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12631092/YARN-1729.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3183//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3183//console This message is automatically generated. ATSWebServices always passes primary and secondary filters as strings - Key: YARN-1729 URL: https://issues.apache.org/jira/browse/YARN-1729 Project: Hadoop YARN Issue Type: Sub-task Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1729.1.patch, YARN-1729.2.patch, YARN-1729.3.patch Primary filters and secondary filter values can be arbitrary json-compatible Object. The web services should determine if the filters specified as query parameters are objects or strings before passing them to the store. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1713) Implement getnewapplication and submitapp as part of RM web service
[ https://issues.apache.org/jira/browse/YARN-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912394#comment-13912394 ] Jian He commented on YARN-1713: --- Some comments on patch apache-yarn-1713.cumulative.patch: - Styling issue: please follow the convention of the 80-column limit. - appIdToRMApp is not actually adding an Id to an RMApp; a better name is more likely getRMAppFromRMContext()? - We have created a factory method in each user-facing record for instantiating the record, e.g. ApplicationSubmissionContext.newInstance; you can use that. {code} createAppSubmissionContext(AppSubmissionInfo newApp) {code} - createNewApplication should be a GET request as it only returns the applicationId etc., just like the one in ClientRMService.getNewApplication. - You can attach a name to the XmlRootElement, like @XmlRootElement(name = "appAttempt"), to specify the element name, so we can do @XmlRootElement(name = "newApplication"). - Note that RMWebServices.hasAccess() is checked against the VIEW_APP permission; in the case of submit/kill, we should check against MODIFY_APP. Implement getnewapplication and submitapp as part of RM web service --- Key: YARN-1713 URL: https://issues.apache.org/jira/browse/YARN-1713 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-1713.cumulative.patch, apache-yarn-1713.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
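To make the element-name review comment concrete, a hedged sketch of a JAXB DAO carrying the new application id (the class and its single field are invented for illustration; the real patch defines its own):
{code}
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

// Illustrative DAO only; serialized as <newApplication>...</newApplication>.
@XmlRootElement(name = "newApplication")
@XmlAccessorType(XmlAccessType.FIELD)
public class NewApplicationInfo {
  private String applicationId;

  public NewApplicationInfo() {} // JAXB needs a no-arg constructor

  public NewApplicationInfo(String applicationId) {
    this.applicationId = applicationId;
  }

  public String getApplicationId() {
    return applicationId;
  }
}
{code}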
[jira] [Updated] (YARN-1588) Rebind NM tokens for previous attempt's running containers to the new attempt
[ https://issues.apache.org/jira/browse/YARN-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1588: -- Attachment: YARN-1588.5.patch Refactored some logging in the new patch Rebind NM tokens for previous attempt's running containers to the new attempt - Key: YARN-1588 URL: https://issues.apache.org/jira/browse/YARN-1588 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-1588.1.patch, YARN-1588.1.patch, YARN-1588.2.patch, YARN-1588.3.patch, YARN-1588.4.patch, YARN-1588.5.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1429) YARN_CLASSPATH is referenced in yarn command comments but doesn't do anything
[ https://issues.apache.org/jira/browse/YARN-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho updated YARN-1429: - Attachment: YARN-1429.patch Attaching an updated version that changes the name to {{YARN_USER_CLASSPATH}} and also introduces a second variable, {{YARN_USER_CLASSPATH_FIRST}}, that enables the user to put the content at the beginning of the final classpath. I do feel that those names are quite descriptive, but please do not hesitate to let me know if you have better names in mind! YARN_CLASSPATH is referenced in yarn command comments but doesn't do anything - Key: YARN-1429 URL: https://issues.apache.org/jira/browse/YARN-1429 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Sandy Ryza Assignee: Jarek Jarcec Cecho Priority: Trivial Labels: newbie Attachments: YARN-1429.patch, YARN-1429.patch YARN_CLASSPATH is referenced in the comments in ./hadoop-yarn-project/hadoop-yarn/bin/yarn and ./hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd, but doesn't do anything. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1588) Rebind NM tokens for previous attempt's running containers to the new attempt
[ https://issues.apache.org/jira/browse/YARN-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912453#comment-13912453 ] Hadoop QA commented on YARN-1588: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12631127/YARN-1588.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.api.impl.TestNMClient {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3184//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3184//console This message is automatically generated. Rebind NM tokens for previous attempt's running containers to the new attempt - Key: YARN-1588 URL: https://issues.apache.org/jira/browse/YARN-1588 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-1588.1.patch, YARN-1588.1.patch, YARN-1588.2.patch, YARN-1588.3.patch, YARN-1588.4.patch, YARN-1588.5.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912455#comment-13912455 ] Hadoop QA commented on YARN-1506: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12629031/YARN-1506-v7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3185//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3185//console This message is automatically generated. Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, YARN-1506-v7.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1561) Fix a generic type warning in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912476#comment-13912476 ] Hadoop QA commented on YARN-1561: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12624155/yarn-1561.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3186//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3186//console This message is automatically generated. Fix a generic type warning in FairScheduler --- Key: YARN-1561 URL: https://issues.apache.org/jira/browse/YARN-1561 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Junping Du Assignee: Chen He Priority: Minor Labels: newbie Fix For: 2.4.0 Attachments: yarn-1561.patch The Comparator below should be specified with type: private Comparator nodeAvailableResourceComparator = new NodeAvailableResourceComparator(); -- This message was sent by Atlassian JIRA (v6.1.5#6160)
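For illustration, the general shape of the YARN-1561 fix in a self-contained example; the concrete type parameter in FairScheduler is whatever element type the scheduler actually sorts, which the patch specifies.
{code}
import java.util.Comparator;

// Self-contained illustration of the raw-type warning and its fix.
class NodeStats {
  final int availableMemoryMb;
  NodeStats(int availableMemoryMb) { this.availableMemoryMb = availableMemoryMb; }
}

class NodeAvailableResourceComparator implements Comparator<NodeStats> {
  @Override
  public int compare(NodeStats a, NodeStats b) {
    // Descending by available memory, so the emptiest node sorts first.
    return Integer.compare(b.availableMemoryMb, a.availableMemoryMb);
  }
}

class Example {
  // A raw "private Comparator c = new NodeAvailableResourceComparator();"
  // triggers the generic type warning; parameterizing it does not:
  private Comparator<NodeStats> nodeAvailableResourceComparator =
      new NodeAvailableResourceComparator();
}
{code}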
[jira] [Updated] (YARN-1429) YARN_CLASSPATH is referenced in yarn command comments but doesn't do anything
[ https://issues.apache.org/jira/browse/YARN-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho updated YARN-1429: - Attachment: YARN-1429.linux.patch I've noticed that my generated patch can't be easily applied with the {{patch}} utility. I'm having trouble with the CRLF line endings of the original file versus the LF line endings of the generated patch. {{git apply}} seems to be smart enough to get over that, but the unix {{patch}} fails on it. Is there anything like svn apply? YARN_CLASSPATH is referenced in yarn command comments but doesn't do anything - Key: YARN-1429 URL: https://issues.apache.org/jira/browse/YARN-1429 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Sandy Ryza Assignee: Jarek Jarcec Cecho Priority: Trivial Labels: newbie Attachments: YARN-1429.linux.patch, YARN-1429.patch, YARN-1429.patch YARN_CLASSPATH is referenced in the comments in ./hadoop-yarn-project/hadoop-yarn/bin/yarn and ./hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd, but doesn't do anything. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1363) Get / Cancel / Renew delegation token api should be non blocking
[ https://issues.apache.org/jira/browse/YARN-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1363: -- Hadoop Flags: Incompatible change Get / Cancel / Renew delegation token api should be non blocking Key: YARN-1363 URL: https://issues.apache.org/jira/browse/YARN-1363 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Zhijie Shen Attachments: YARN-1363.1.patch, YARN-1363.2.patch, YARN-1363.3.patch, YARN-1363.4.patch, YARN-1363.5.patch, YARN-1363.6.patch, YARN-1363.7.patch Today GetDelgationToken, CancelDelegationToken and RenewDelegationToken are all blocking apis. * As a part of these calls we try to update RMStateStore and that may slow it down. * Now as we have limited number of client request handlers we may fill up client handlers quickly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (YARN-1752) Unexpected Unregistered event at Attempt Launched state
[ https://issues.apache.org/jira/browse/YARN-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-1752: Assignee: Rohith Unexpected Unregistered event at Attempt Launched state --- Key: YARN-1752 URL: https://issues.apache.org/jira/browse/YARN-1752 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Rohith {code} 2014-02-21 14:56:03,453 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: UNREGISTERED at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:647) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:103) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:714) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:695) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1752) Unexpected Unregistered event at Attempt Launched state
[ https://issues.apache.org/jira/browse/YARN-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912501#comment-13912501 ] Rohith commented on YARN-1752: -- I reproduced this case using a debug point. This needs to be fixed on the MapReduce side; it is better handled in the MapReduce project. Unexpected Unregistered event at Attempt Launched state --- Key: YARN-1752 URL: https://issues.apache.org/jira/browse/YARN-1752 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Rohith {code} 2014-02-21 14:56:03,453 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: UNREGISTERED at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:647) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:103) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:714) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:695) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1588) Rebind NM tokens for previous attempt's running containers to the new attempt
[ https://issues.apache.org/jira/browse/YARN-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912516#comment-13912516 ] Jian He commented on YARN-1588: --- Can't reproduce test failure locally, it can be similar to YARN-1657 Rebind NM tokens for previous attempt's running containers to the new attempt - Key: YARN-1588 URL: https://issues.apache.org/jira/browse/YARN-1588 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-1588.1.patch, YARN-1588.1.patch, YARN-1588.2.patch, YARN-1588.3.patch, YARN-1588.4.patch, YARN-1588.5.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1700) AHS records non-launched containers
[ https://issues.apache.org/jira/browse/YARN-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912515#comment-13912515 ] Gera Shegalov commented on YARN-1700: - Log URL is physically nullable, but in the current source code, as it is, it never changes after the launch and is not null. If the intention is to record/display even containers that have not launched, I am fine treating this as a dup of YARN-1685. AHS records non-launched containers --- Key: YARN-1700 URL: https://issues.apache.org/jira/browse/YARN-1700 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1700.v01.patch, YARN-1700.v02.patch When testing AHS with a MR sleep job, AHS sometimes threw NPE out of AppAttemptBlock.render because logUrl in container report was null. I realized that this is because AHS may record containers that never launch. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1429) YARN_CLASSPATH is referenced in yarn command comments but doesn't do anything
[ https://issues.apache.org/jira/browse/YARN-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912519#comment-13912519 ] Hadoop QA commented on YARN-1429: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12631149/YARN-1429.linux.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3187//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3187//console This message is automatically generated. YARN_CLASSPATH is referenced in yarn command comments but doesn't do anything - Key: YARN-1429 URL: https://issues.apache.org/jira/browse/YARN-1429 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Sandy Ryza Assignee: Jarek Jarcec Cecho Priority: Trivial Labels: newbie Attachments: YARN-1429.linux.patch, YARN-1429.patch, YARN-1429.patch YARN_CLASSPATH is referenced in the comments in ./hadoop-yarn-project/hadoop-yarn/bin/yarn and ./hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd, but doesn't do anything. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (YARN-1700) AHS records non-launched containers
[ https://issues.apache.org/jira/browse/YARN-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-1700. --- Resolution: Duplicate AHS records non-launched containers --- Key: YARN-1700 URL: https://issues.apache.org/jira/browse/YARN-1700 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1700.v01.patch, YARN-1700.v02.patch When testing AHS with a MR sleep job, AHS sometimes threw NPE out of AppAttemptBlock.render because logUrl in container report was null. I realized that this is because AHS may record containers that never launch. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1700) AHS records non-launched containers
[ https://issues.apache.org/jira/browse/YARN-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912551#comment-13912551 ] Zhijie Shen commented on YARN-1700: --- bq. Log URL is physically nullable, but in the current source code as is it never changes after the launch and is not null. Good catch. This is another issue of the current code, which should be fixed in YARN-1685 as well. See my prior comment in YARN-1413: https://issues.apache.org/jira/browse/YARN-1413?focusedCommentId=13866844page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13866844 When the container is running, the log url should point to the NM web page, which serves the running container log; when it is finished, the log url should then be updated (See TODO in RMContainerImpl.java), and point to the AHS web page, which serves the aggregated log. AHS records non-launched containers --- Key: YARN-1700 URL: https://issues.apache.org/jira/browse/YARN-1700 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1700.v01.patch, YARN-1700.v02.patch When testing AHS with a MR sleep job, AHS sometimes threw NPE out of AppAttemptBlock.render because logUrl in container report was null. I realized that this is because AHS may record containers that never launch. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
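A toy sketch of the routing rule described in the comment above (the paths and addresses are made up; the real URLs come from the NM and AHS web apps): point at the NM while the container is running and at the AHS once it has finished.
{code}
// Illustration only: choose the log URL based on whether the container finished.
public final class LogUrl {
  public static String forContainer(boolean finished, String nmHttpAddress,
      String ahsHttpAddress, String containerId) {
    return finished
        ? "http://" + ahsHttpAddress + "/applicationhistory/logs/" + containerId
        : "http://" + nmHttpAddress + "/node/containerlogs/" + containerId;
  }
}
{code}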
[jira] [Updated] (YARN-1685) Log URL should be different when the container is running and finished, and null case needs to be handled
[ https://issues.apache.org/jira/browse/YARN-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1685: -- Summary: Log URL should be different when the container is running and finished, and null case needs to be handled (was: [YARN-321] Logs link can be null so avoid NPE) Log URL should be different when the container is running and finished, and null case needs to be handled - Key: YARN-1685 URL: https://issues.apache.org/jira/browse/YARN-1685 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Fix For: YARN-321 Attachments: YARN-1685-1.patch https://issues.apache.org/jira/browse/YARN-1413?focusedCommentId=13866416page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13866416 https://issues.apache.org/jira/browse/YARN-1413?focusedCommentId=13866844page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13866844 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1765) Write test cases to verify that killApplication API works in RM HA
Xuan Gong created YARN-1765: --- Summary: Write test cases to verify that killApplication API works in RM HA Key: YARN-1765 URL: https://issues.apache.org/jira/browse/YARN-1765 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1765) Write test cases to verify that killApplication API works in RM HA
[ https://issues.apache.org/jira/browse/YARN-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1765: Attachment: YARN-1765.1.patch Write test cases to verify that killApplication API works in RM HA -- Key: YARN-1765 URL: https://issues.apache.org/jira/browse/YARN-1765 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1765.1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1561) Fix a generic type warning in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912565#comment-13912565 ] Hudson commented on YARN-1561: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5226 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5226/]) YARN-1561. Fix a generic type warning in FairScheduler. (Chen He via junping_du) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1571924) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java Fix a generic type warning in FairScheduler --- Key: YARN-1561 URL: https://issues.apache.org/jira/browse/YARN-1561 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Junping Du Assignee: Chen He Priority: Minor Labels: newbie Fix For: 2.5.0 Attachments: yarn-1561.patch The Comparator below should be specified with type: private Comparator nodeAvailableResourceComparator = new NodeAvailableResourceComparator(); -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1765) Write test cases to verify that killApplication API works in RM HA
[ https://issues.apache.org/jira/browse/YARN-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912590#comment-13912590 ] Hadoop QA commented on YARN-1765: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12631158/YARN-1765.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3188//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3188//console This message is automatically generated. Write test cases to verify that killApplication API works in RM HA -- Key: YARN-1765 URL: https://issues.apache.org/jira/browse/YARN-1765 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1765.1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)