[jira] [Updated] (YARN-9195) RM Queue's pending container number might get decreased unexpectedly or even become negative once RM failover
[ https://issues.apache.org/jira/browse/YARN-9195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shengyang Sha updated YARN-9195: Description: Hi, all: We previously encountered a serious problem in the ResourceManager: the pending container number of one RM queue became negative after RM failover. Since RM queues are managed in a hierarchical structure, the root queue's pending containers eventually became negative as well, and scheduling for the whole cluster was affected. Both our RM server and the YARN client in our application are based on YARN 3.1, and we use the AMRMClientAsync#addSchedulingRequests() method in our application to request resources from the RM. After investigation, we found that the direct cause was that numAllocations of some AMs' requests became negative after RM failover. There are at least three necessary conditions: (1) The application uses schedulingRequests in the YARN client and sets numAllocations of a schedulingRequest to zero. In our batch job scenario, numAllocations of a schedulingRequest can reach zero because, theoretically, a full batch job can run with only one container. (2) The RM fails over. (3) Before the AM re-registers itself after the RM restarts, the RM has already recovered some of the application's previously assigned containers. Some more details about the implementation: (1) After the RM recovers, it sends all live containers to the AM when the AM re-registers, through RegisterApplicationMasterResponse#getContainersFromPreviousAttempts. (2) During registerApplicationMaster, AMRMClientImpl calls removeFromOutstandingSchedulingRequests for the ContainersFromPreviousAttempts without checking whether these containers were already assigned before, so its outstanding requests can be decreased unexpectedly even when they do not become negative. (3) There is no sanity check in the RM to validate requests from AMs.
To better illustrate this case, I've written test cases based on the latest Hadoop trunk, posted in the attachment. You may try testAMRMClientWithNegativePendingRequestsOnRMRestart and testAMRMClientOnUnexpectedlyDecreasedPendingRequestsOnRMRestart. To solve this issue, I propose filtering already-allocated containers before removeFromOutstandingSchedulingRequests in AMRMClientImpl during registerApplicationMaster; sanity checks in the RM are also needed to prevent things from getting worse. Comments and suggestions are welcome. > RM Queue's pending container number might get decreased
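The proposal above (filter already-accounted containers before decrementing outstanding requests, plus a sanity check so the count never goes below zero) can be sketched as follows. This is an illustrative model only, assuming container ids as plain Strings; the class and method names are hypothetical and not the actual AMRMClientImpl API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch, not actual Hadoop code: when the AM re-registers and
// receives getContainersFromPreviousAttempts, only containers the client has
// NOT already accounted for should decrement outstanding scheduling requests.
class OutstandingRequestSketch {
  private int numAllocations;

  OutstandingRequestSketch(int numAllocations) {
    this.numAllocations = numAllocations;
  }

  // Sanity-checked decrement: never let the pending count go negative.
  void remove(int count) {
    numAllocations = Math.max(0, numAllocations - count);
  }

  int getNumAllocations() {
    return numAllocations;
  }

  // Filter out containers that were already assigned (and accounted for)
  // before the RM failover, so they are not subtracted a second time.
  static List<String> filterAlreadyAccounted(List<String> fromPreviousAttempts,
                                             Set<String> accountedContainerIds) {
    List<String> fresh = new ArrayList<>();
    for (String id : fromPreviousAttempts) {
      if (!accountedContainerIds.contains(id)) {
        fresh.add(id); // only these should decrement outstanding requests
      }
    }
    return fresh;
  }
}
```

With condition (1) above (numAllocations already zero), an unfiltered decrement is exactly what drives the count negative; the clamp keeps the aggregate queue counters sane even if a buggy client slips through.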
[jira] [Created] (YARN-9195) RM Queue's pending container number might get decreased unexpectedly or even become negative once RM failover
Shengyang Sha created YARN-9195: --- Summary: RM Queue's pending container number might get decreased unexpectedly or even become negative once RM failover Key: YARN-9195 URL: https://issues.apache.org/jira/browse/YARN-9195 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.1.0 Reporter: Shengyang Sha Attachments: cases_to_recreate_negative_pending_requests_scenario.diff Hi, all: We previously encountered a serious problem in the ResourceManager: the pending container number of one RM queue became negative after RM failover. Since RM queues are managed in a hierarchical structure, the root queue's pending containers eventually became negative as well, and scheduling for the whole cluster was affected. Both our RM server and the YARN client in our application are based on YARN 3.1, and we use the AMRMClientAsync#addSchedulingRequests() method in our application to request resources from the RM. After investigation, we found that the direct cause was that numAllocations of some AMs' requests became negative after RM failover. There are at least three necessary conditions: (1) The application uses schedulingRequests in the YARN client and sets numAllocations of a schedulingRequest to zero. In our batch job scenario, numAllocations of a schedulingRequest can reach zero because, theoretically, a full batch job can run with only one container. (2) The RM fails over. (3) Before the AM re-registers itself after the RM restarts, the RM has already recovered some of the application's previously assigned containers. Some more details about the implementation: (1) After the RM recovers, it sends all live containers to the AM when the AM re-registers, through RegisterApplicationMasterResponse#getContainersFromPreviousAttempts. (2) During registerApplicationMaster, AMRMClientImpl calls removeFromOutstandingSchedulingRequests for the ContainersFromPreviousAttempts without checking whether these containers were already assigned before, so its outstanding requests can be decreased unexpectedly even when they do not become negative. (3) There is no sanity check in the RM to validate requests from AMs. To better illustrate this case, I've written test cases based on the latest Hadoop trunk, posted in the attachment. You may try testAMRMClientWithNegativePendingRequestsOnRMRestart and testAMRMClientOnUnexpectedlyDecreasedPendingRequestsOnRMRestart. To solve this issue, I propose filtering already-allocated containers before removeFromOutstandingSchedulingRequests in AMRMClientImpl during registerApplicationMaster; sanity checks in the RM are also needed to prevent things from getting worse. Comments and suggestions are welcome. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7976) [atsv2 read acls] REST API to list domain/domains
[ https://issues.apache.org/jira/browse/YARN-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi reassigned YARN-7976: --- Assignee: Abhishek Modi (was: Rohith Sharma K S) > [atsv2 read acls] REST API to list domain/domains > - > > Key: YARN-7976 > URL: https://issues.apache.org/jira/browse/YARN-7976 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: Abhishek Modi >Priority: Major > > Provide REST API to list domains and domain in TimelineReaderWebService. > /domains and /domain/\{domainId}
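The two proposed read paths can be modeled as below. This is an illustrative in-memory stand-in only; the class name and the String return type are hypothetical, and the real endpoints would live in TimelineReaderWebServices and return timeline Domain entities.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of the proposed ATSv2 read endpoints:
//   GET /domains           -> list all domains
//   GET /domain/{domainId} -> fetch one domain by id
class DomainReaderSketch {
  private final Map<String, String> domains = new LinkedHashMap<>();

  void putDomain(String domainId, String description) {
    domains.put(domainId, description);
  }

  // Backs GET /domains: enumerate known domain ids.
  Collection<String> listDomains() {
    return domains.keySet();
  }

  // Backs GET /domain/{domainId}: a single domain, or null when unknown.
  String getDomain(String domainId) {
    return domains.get(domainId);
  }
}
```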
[jira] [Commented] (YARN-7976) [atsv2 read acls] REST API to list domain/domains
[ https://issues.apache.org/jira/browse/YARN-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741829#comment-16741829 ] Abhishek Modi commented on YARN-7976: - [~rohithsharma] I have assigned this to myself. Please let me know if you are already working on it. Thanks. > [atsv2 read acls] REST API to list domain/domains > - > > Key: YARN-7976 > URL: https://issues.apache.org/jira/browse/YARN-7976 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: Abhishek Modi >Priority: Major > > Provide REST API to list domains and domain in TimelineReaderWebService. > /domains and /domain/\{domainId}
[jira] [Commented] (YARN-9150) Making TimelineSchemaCreator support different backends for Timeline Schema Creation in ATSv2
[ https://issues.apache.org/jira/browse/YARN-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741792#comment-16741792 ] Sushil Ks commented on YARN-9150: - Hi [~vrushalic], I resubmitted the same patch and everything looks fine from the Jenkins side now. > Making TimelineSchemaCreator support different backends for Timeline Schema > Creation in ATSv2 > - > > Key: YARN-9150 > URL: https://issues.apache.org/jira/browse/YARN-9150 > Project: Hadoop YARN > Issue Type: Improvement > Components: ATSv2 >Reporter: Sushil Ks >Assignee: Sushil Ks >Priority: Major > Attachments: YARN-9150.001.patch, YARN-9150.002.patch > > > h3. Currently TimelineSchemaCreator has a concrete implementation for creating > timeline schemas only for HBase; this JIRA adds support for the multiple > back-ends that ATSv2 can work with. > *Usage:* > Add the following property in *yarn-site.xml* > {code:xml} > <property> > <name>yarn.timeline-service.schema-creator.class</name> > <value>YOUR_TIMELINE_SCHEMA_CREATOR_CLASS</value> > </property> > {code} > The command used to run TimelineSchemaCreator need not be changed, > i.e. the existing command below works irrespective of the backend > configured: > {code:java} > bin/hadoop > org.apache.hadoop.yarn.server.timelineservice.storage.TimelineSchemaCreator > -create > {code}
[jira] [Commented] (YARN-9150) Making TimelineSchemaCreator support different backends for Timeline Schema Creation in ATSv2
[ https://issues.apache.org/jira/browse/YARN-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741790#comment-16741790 ] Hadoop QA commented on YARN-9150: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 26s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 3s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 44s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 50s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 18s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 47s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 15s{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 34s{color} | {color:green} hadoop-yarn-server-timelineservice-hbase-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 50s{color} | {color:green} hadoop-yarn-server-timelineservice-hbase-tests in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}106m 26s{color} | {color:black} {color} | \\
[jira] [Commented] (YARN-5336) Limit the flow name size & consider cleanup for hex chars
[ https://issues.apache.org/jira/browse/YARN-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741786#comment-16741786 ] Vrushali C commented on YARN-5336: -- Thanks for the patch Sushil! I think I might file another jira to track limiting the size of the key/value for ALL key-value pairs being stored. The patch looks good overall, just a few minor comments: - Rename FLOW_NAME_SIZE to FLOW_NAME_MAX_SIZE. - Add a FLOW_NAME_DEFAULT_MAX_SIZE set to 0, and document that 0 indicates no restriction on the size. - As Abhishek suggested, also add these to yarn-default.xml. - Use Apache Commons Lang's StringUtils.replace instead of String.replaceAll at L209 in TimelineUtils.java. > Limit the flow name size & consider cleanup for hex chars > - > > Key: YARN-5336 > URL: https://issues.apache.org/jira/browse/YARN-5336 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Sushil Ks >Priority: Major > Labels: YARN-5355 > Attachments: YARN-5336.001.patch > > > As recommended by [~jrottinghuis] , need to add in some limit (default and > configurable) for accepting key values to be written to the backend.
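The suggestion to prefer StringUtils.replace over String.replaceAll is worth a concrete illustration: replaceAll treats its first argument as a regular expression (compiled on every call), so regex metacharacters in the target can silently change the result. The JDK's own String.replace does literal matching, like Commons Lang's StringUtils.replace, so the sketch below uses only the JDK to show the difference; the class name is illustrative.

```java
// Demonstrates the regex-vs-literal pitfall behind the review comment:
// "." in a replaceAll pattern is a regex wildcard that matches every character,
// while String.replace (and Commons Lang StringUtils.replace) matches literally.
class ReplaceDemo {
  static String viaRegex(String s) {
    return s.replaceAll(".", "x"); // regex: "." matches ANY character
  }

  static String viaLiteral(String s) {
    return s.replace(".", "x"); // literal: only actual dots are replaced
  }
}
```

For example, `viaRegex("a.b")` rewrites every character, while `viaLiteral("a.b")` touches only the dot; for a sanitizer running on every flow name write, the literal variant is both safer and cheaper.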
[jira] [Updated] (YARN-9150) Making TimelineSchemaCreator support different backends for Timeline Schema Creation in ATSv2
[ https://issues.apache.org/jira/browse/YARN-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushil Ks updated YARN-9150: Description: h3. Currently TimelineSchemaCreator has a concrete implementation for creating timeline schemas only for HBase; this JIRA adds support for the multiple back-ends that ATSv2 can work with. *Usage:* Add the following property in *yarn-site.xml* {code:xml} <property> <name>yarn.timeline-service.schema-creator.class</name> <value>YOUR_TIMELINE_SCHEMA_CREATOR_CLASS</value> </property> {code} The command used to run TimelineSchemaCreator need not be changed, i.e. the existing command below works irrespective of the backend configured: {code:java} bin/hadoop org.apache.hadoop.yarn.server.timelineservice.storage.TimelineSchemaCreator -create {code} > Making TimelineSchemaCreator support different backends for Timeline Schema > Creation in ATSv2
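The pluggable-backend mechanism described above amounts to instantiating the class named by yarn.timeline-service.schema-creator.class reflectively. The sketch below shows that pattern under stated assumptions: the interface name, method name, and stub class are all illustrative, not the actual ATSv2 API.

```java
// Illustrative sketch of a pluggable schema creator: the configured class name
// is loaded reflectively and handed the command-line args, so new back-ends
// can be added without changing the TimelineSchemaCreator entry point.
interface TimelineSchemaCreatorSketch {
  void createTimelineSchemas(String[] args) throws Exception;
}

// Stand-in for an HBase-backed implementation.
class HBaseCreatorStub implements TimelineSchemaCreatorSketch {
  boolean created = false;

  @Override
  public void createTimelineSchemas(String[] args) {
    created = true; // a real backend would create its schema/tables here
  }
}

class SchemaCreatorLoader {
  // Load whatever class the yarn-site.xml property names; it must have a
  // no-arg constructor and implement the creator interface.
  static TimelineSchemaCreatorSketch load(String className) throws Exception {
    return (TimelineSchemaCreatorSketch)
        Class.forName(className).getDeclaredConstructor().newInstance();
  }
}
```

This mirrors how the unchanged `bin/hadoop ... TimelineSchemaCreator -create` command can keep working: the entry point stays fixed and only the reflectively loaded backend varies.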
[jira] [Updated] (YARN-9150) Making TimelineSchemaCreator support different backends for Timeline Schema Creation in ATSv2
[ https://issues.apache.org/jira/browse/YARN-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushil Ks updated YARN-9150: Attachment: (was: YARN-9150.002.patch) > Making TimelineSchemaCreator support different backends for Timeline Schema > Creation in ATSv2
[jira] [Updated] (YARN-9150) Making TimelineSchemaCreator support different backends for Timeline Schema Creation in ATSv2
[ https://issues.apache.org/jira/browse/YARN-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushil Ks updated YARN-9150: Attachment: YARN-9150.002.patch > Making TimelineSchemaCreator support different backends for Timeline Schema > Creation in ATSv2
[jira] [Commented] (YARN-9190) [Submarine] Submarine job will fail to run as a first job on a new created Hadoop 3.2.0 RC1 cluster
[ https://issues.apache.org/jira/browse/YARN-9190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741756#comment-16741756 ] Billie Rinaldi commented on YARN-9190: -- Hi [~tangzhankun], the yarn script is only setting service.libdir for the app|application|applicationattempt|container commands (and also for the resourcemanager daemon). It isn't setting the system property for the yarn jar command, so that's why it's not working. > [Submarine] Submarine job will fail to run as a first job on a new created > Hadoop 3.2.0 RC1 cluster > --- > > Key: YARN-9190 > URL: https://issues.apache.org/jira/browse/YARN-9190 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhankun Tang >Assignee: Sunil Govindan >Priority: Minor > > This issue was found when verifying submarine in Hadoop 3.2.0 RC1 planning. > The reproduce steps are: > # Init a new HDFS and YARN (LinuxContainerExecutor and Docker enabled) > # Before run any other yarn service job, use yarn user to submit a submarine > job > The job will fail with below error: > > {code:java} > LogType:serviceam-err.txt > LogLastModifiedTime:Thu Jan 10 21:15:23 +0800 2019 > LogLength:86 > LogContents: > Error: Could not find or load main class > org.apache.hadoop.yarn.service.ServiceMaster > End of LogType:serviceam-err.txt > {code} > This seems because the dependencies are not ready as the service client > reported: > {code:java} > 2019-01-10 21:50:47,380 WARN client.ServiceClient: Property > yarn.service.framework.path has a value > /yarn-services/3.2.0/service-dep.tar.gz, but is not a valid file > 2019-01-10 21:50:47,381 INFO client.ServiceClient: Uploading all dependency > jars to HDFS. For faster submission of apps, set config property > yarn.service.framework.path to the dependency tarball location. 
Dependency > tarball can be uploaded to any HDFS path directly or by using command: yarn > app -enableFastLaunch []{code} > > When this error happens, I found that there is no “/yarn-services” directory > created in HDFS. > But after I ran “yarn app -launch my-sleeper sleeper”, the “/yarn-services” > directory was created in HDFS and the submarine job could then run successfully. > {code:java} > yarn@master0-VirtualBox:~/apache-hadoop-install-dir/hadoop-dev-workspace$ > hdfs dfs -ls /yarn-services/3.2.0/* > -rwxr-xr-x 1 yarn supergroup 93596476 2019-01-11 08:23 > /yarn-services/3.2.0/service-dep.tar.gz{code} > This seems to be an issue with YARN service in 3.2.0 RC1, so I filed this JIRA to track > it. > > I verified that the trunk branch doesn't have this issue.
[jira] [Commented] (YARN-9190) [Submarine] Submarine job will fail to run as a first job on a new created Hadoop 3.2.0 RC1 cluster
[ https://issues.apache.org/jira/browse/YARN-9190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741752#comment-16741752 ] Zhankun Tang commented on YARN-9190: [~billie.rinaldi] , thanks for the reply! One thing I forgot to mention is that I use the yarn script to run the submarine job for both the 3.2.0 RC1 and trunk (3.3) builds. {code:java} yarn jar $HADOOP_COMMON_HOME/share/hadoop/yarn/hadoop-yarn-submarine-${VERSION}.jar job run ...{code} It's not clear to me why, given that both branches' "yarn" script already sets "service.libdir", the same submarine run script above fails in 3.2 RC1. > [Submarine] Submarine job will fail to run as a first job on a new created > Hadoop 3.2.0 RC1 cluster
[jira] [Commented] (YARN-9194) Invalid event: REGISTERED at FAILED
[ https://issues.apache.org/jira/browse/YARN-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741558#comment-16741558 ] Hadoop QA commented on YARN-9194:
-1 overall
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| trunk Compile Tests ||
| +1 | mvninstall | 20m 38s | trunk passed |
| +1 | compile | 0m 45s | trunk passed |
| +1 | checkstyle | 0m 40s | trunk passed |
| +1 | mvnsite | 0m 47s | trunk passed |
| +1 | shadedclient | 13m 48s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 15s | trunk passed |
| +1 | javadoc | 0m 29s | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 47s | the patch passed |
| +1 | compile | 0m 40s | the patch passed |
| +1 | javac | 0m 40s | the patch passed |
| +1 | checkstyle | 0m 36s | the patch passed |
| +1 | mvnsite | 0m 43s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 43s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 21s | the patch passed |
| +1 | javadoc | 0m 27s | the patch passed |
|| Other Tests ||
| -1 | unit | 92m 9s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 149m 8s | |
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9194 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12954756/YARN-9194_3.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 8c3f8e2bb7aa 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 3bb745d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/23069/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23069/testReport/ |
| Max. process+thread count | 853 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output |
[jira] [Commented] (YARN-9194) Invalid event: REGISTERED at FAILED
[ https://issues.apache.org/jira/browse/YARN-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741520#comment-16741520 ] Hadoop QA commented on YARN-9194:
-1 overall
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| trunk Compile Tests ||
| +1 | mvninstall | 20m 32s | trunk passed |
| +1 | compile | 0m 44s | trunk passed |
| +1 | checkstyle | 0m 40s | trunk passed |
| +1 | mvnsite | 0m 47s | trunk passed |
| +1 | shadedclient | 13m 39s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 15s | trunk passed |
| +1 | javadoc | 0m 29s | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 46s | the patch passed |
| +1 | compile | 0m 42s | the patch passed |
| +1 | javac | 0m 42s | the patch passed |
| +1 | checkstyle | 0m 36s | the patch passed |
| +1 | mvnsite | 0m 43s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedclient | 13m 31s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 20s | the patch passed |
| +1 | javadoc | 0m 26s | the patch passed |
|| Other Tests ||
| -1 | unit | 92m 15s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 148m 48s | |
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9194 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12954754/YARN-9194_2.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux af87ad08fad7 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 3bb745d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/23068/artifact/out/whitespace-eol.txt |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/23068/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23068/testReport/ |
| Max. process+thread count | 866 (vs. ulimit of 1) |
| modules | C:
[jira] [Updated] (YARN-9194) Invalid event: REGISTERED at FAILED
[ https://issues.apache.org/jira/browse/YARN-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-9194:
Attachment: YARN-9194_3.patch
> Invalid event: REGISTERED at FAILED
> ---
>
> Key: YARN-9194
> URL: https://issues.apache.org/jira/browse/YARN-9194
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: lujie
> Assignee: lujie
> Priority: Major
> Attachments: YARN-9194_1.patch, YARN-9194_2.patch, YARN-9194_3.patch, hadoop-hires-resourcemanager-hadoop11.log
>
> When the attempt has already failed and a REGISTERED event then arrives, an InvalidStateTransitionException is thrown:
> {code:java}
> 2019-01-13 00:41:57,127 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: App attempt: appattempt_1547311267249_0001_02 can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: REGISTERED at FAILED
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:913)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:121)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:1073)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:1054)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> {code}
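The exception in the stack trace above is thrown because the attempt's state machine has no registered transition for the (FAILED, REGISTERED) pair. The following is a minimal self-contained sketch of that failure mode and of one common way such bugs are fixed (registering an explicit self-loop so the late event is absorbed). It deliberately does not use Hadoop's real StateMachineFactory API, and the class, state, and event names here are simplified illustrations, not the actual RMAppAttemptImpl code or the contents of the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a YARN-style attempt state machine: unknown
// (state, event) pairs throw, mirroring InvalidStateTransitionException.
public class AttemptStateMachine {
    enum State { LAUNCHED, RUNNING, FAILED }
    enum Event { REGISTERED, FAIL }

    private final Map<State, Map<Event, State>> transitions = new HashMap<>();
    private State current;

    AttemptStateMachine(State initial) {
        this.current = initial;
    }

    void addTransition(State pre, Event on, State post) {
        transitions.computeIfAbsent(pre, k -> new HashMap<>()).put(on, post);
    }

    State handle(Event event) {
        Map<Event, State> byEvent = transitions.get(current);
        if (byEvent == null || !byEvent.containsKey(event)) {
            // Corresponds to "Invalid event: REGISTERED at FAILED" above.
            throw new IllegalStateException(
                "Invalid event: " + event + " at " + current);
        }
        current = byEvent.get(event);
        return current;
    }

    public static void main(String[] args) {
        AttemptStateMachine sm = new AttemptStateMachine(State.LAUNCHED);
        sm.addTransition(State.LAUNCHED, Event.REGISTERED, State.RUNNING);
        sm.addTransition(State.LAUNCHED, Event.FAIL, State.FAILED);
        // Illustrative fix pattern: a self-loop so a REGISTERED event that
        // arrives after the attempt has already failed is ignored rather
        // than throwing. (Whether YARN-9194's patch does exactly this is
        // an assumption made for this sketch.)
        sm.addTransition(State.FAILED, Event.REGISTERED, State.FAILED);

        sm.handle(Event.FAIL);           // the attempt fails first
        State s = sm.handle(Event.REGISTERED); // late REGISTERED: absorbed
        System.out.println(s);
    }
}
```

Without the self-loop registered for (FAILED, REGISTERED), the second handle() call throws, which is exactly the shape of the race reported in this issue: the event ordering, not the individual transitions, is what the state table must tolerate.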