[jira] [Commented] (YARN-2394) FairScheduler: Configure fairSharePreemptionThreshold per queue
[ https://issues.apache.org/jira/browse/YARN-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121255#comment-14121255 ] Hudson commented on YARN-2394: -- FAILURE: Integrated in Hadoop-Yarn-trunk #670 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/670/]) YARN-2394. FairScheduler: Configure fairSharePreemptionThreshold per queue. (Wei Yan via kasha) (kasha: rev 1dcaba9a7aa27f7ca4ba693e3abb56ab3c59c8a7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java * hadoop-yarn-project/CHANGES.txt FairScheduler: Configure fairSharePreemptionThreshold per queue --- Key: YARN-2394 URL: https://issues.apache.org/jira/browse/YARN-2394 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Fix For: 2.6.0 Attachments: YARN-2394-1.patch, YARN-2394-2.patch, YARN-2394-3.patch, YARN-2394-4.patch, YARN-2394-5.patch, YARN-2394-6.patch, YARN-2394-7.patch Preemption based on fair share starvation happens when usage of a queue is less than 50% of its fair share. This 50% is hardcoded. We'd like to make this configurable on a per queue basis, so that we can choose the threshold at which we want to preempt. Calling this config fairSharePreemptionThreshold. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
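Editor's note: a minimal sketch of how the new per-queue threshold is expected to appear in the FairScheduler allocation file. Element placement, the queue name, and the cluster-wide default element are illustrative assumptions; see FairScheduler.apt.vm in the commit above for the authoritative syntax.
{code}
<?xml version="1.0"?>
<!-- fair-scheduler.xml (allocation file) - illustrative sketch only -->
<allocations>
  <queue name="analytics">
    <!-- preempt on behalf of this queue once its usage drops below 80% of its fair share -->
    <fairSharePreemptionThreshold>0.8</fairSharePreemptionThreshold>
  </queue>
  <!-- cluster-wide default (element name assumed), overridden per queue above -->
  <defaultFairSharePreemptionThreshold>0.5</defaultFairSharePreemptionThreshold>
</allocations>
{code}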
[jira] [Commented] (YARN-2394) FairScheduler: Configure fairSharePreemptionThreshold per queue
[ https://issues.apache.org/jira/browse/YARN-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121357#comment-14121357 ] Hudson commented on YARN-2394: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1861 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1861/]) YARN-2394. FairScheduler: Configure fairSharePreemptionThreshold per queue. (Wei Yan via kasha) (kasha: rev 1dcaba9a7aa27f7ca4ba693e3abb56ab3c59c8a7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java FairScheduler: Configure fairSharePreemptionThreshold per queue --- Key: YARN-2394 URL: https://issues.apache.org/jira/browse/YARN-2394 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Fix For: 2.6.0 Attachments: YARN-2394-1.patch, YARN-2394-2.patch, YARN-2394-3.patch, YARN-2394-4.patch, YARN-2394-5.patch, YARN-2394-6.patch, YARN-2394-7.patch Preemption based on fair share starvation happens when usage of a queue is less than 50% of its fair share. This 50% is hardcoded. We'd like to make this configurable on a per queue basis, so that we can choose the threshold at which we want to preempt. Calling this config fairSharePreemptionThreshold. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-415) Capture aggregate memory allocation at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-415: Attachment: YARN-415.201409040036.txt [~kkambatl], thank you for taking the time to review this patch. bq. The patch doesn't apply anymore, Upmerged patch to latest branch-2 and trunk. {quote} 1. ResourceManagerRest.apt.vm documents the memory and vcores as utilized. We should update this to allocated. {quote} Changed the text. {quote} 2. Methods added to ApplicationAttemptStateData should probably be explicitly marked Public-Unstable. {quote} Done {quote} 3. Annotate {{AggregateAppResourceUsage}} Private {quote} Annotated {quote} By the way, there was an offline discussion (documented on YARN-1530) about storing similar app-related metrics in the ATS. It would be nice for parties involved here to think about it and follow up on another JIRA. {quote} I am looking into this now. In the meantime, can you please let me know if this current patch resolves your concerns? Thank you, -Eric Payne Capture aggregate memory allocation at the app-level for chargeback --- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 2.5.0 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.201408062232.txt, YARN-415.201408080204.txt, YARN-415.201408092006.txt, YARN-415.201408132109.txt, YARN-415.201408150030.txt, YARN-415.201408181938.txt, YARN-415.201408181938.txt, YARN-415.201408212033.txt, YARN-415.201409040036.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
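Editor's note: a minimal sketch of the MB-seconds aggregation described in the summary above. The AllocatedContainer type and its fields are hypothetical stand-ins used only to illustrate the arithmetic; they are not the RM's actual classes.
{code}
import java.util.List;

// Hypothetical stand-in for per-container allocation data; not a YARN class.
class AllocatedContainer {
    long reservedMb;        // memory reserved for the container, in MB
    long lifetimeSeconds;   // how long the container was allocated, in seconds

    AllocatedContainer(long reservedMb, long lifetimeSeconds) {
        this.reservedMb = reservedMb;
        this.lifetimeSeconds = lifetimeSeconds;
    }
}

class ChargebackSketch {
    // memorySeconds = sum over containers of (reserved MB * lifetime in seconds)
    static long memoryMbSeconds(List<AllocatedContainer> containers) {
        long total = 0;
        for (AllocatedContainer c : containers) {
            total += c.reservedMb * c.lifetimeSeconds;
        }
        return total;
    }

    public static void main(String[] args) {
        // e.g. two 2048 MB containers that each ran for 600 s -> 2,457,600 MB-seconds
        System.out.println(memoryMbSeconds(List.of(
                new AllocatedContainer(2048, 600),
                new AllocatedContainer(2048, 600))));
    }
}
{code}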
[jira] [Updated] (YARN-2509) Enable Cross Origin Filter for timeline server only and not all Yarn servers
[ https://issues.apache.org/jira/browse/YARN-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-2509: Attachment: YARN-2509.patch Updating the patch. [~jeagles], with the new {{modifiedInitializers}} flag, this new patch will work fine. Enable Cross Origin Filter for timeline server only and not all Yarn servers Key: YARN-2509 URL: https://issues.apache.org/jira/browse/YARN-2509 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2509.patch, YARN-2509.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2511) Allow All Origins by default when Cross Origin Filter is enabled
[ https://issues.apache.org/jira/browse/YARN-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121493#comment-14121493 ] Jonathan Eagles commented on YARN-2511: --- [~zjshen], Can you give a review? This just makes the default cross-origin pattern allow all origins (same as jetty 7 Cross origin filter) making us more compatible and giving the users a better default option. Allow All Origins by default when Cross Origin Filter is enabled Key: YARN-2511 URL: https://issues.apache.org/jira/browse/YARN-2511 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-2511-v1.patch This is the default for jetty 7 cross origin filter -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2509) Enable Cross Origin Filter for timeline server only and not all Yarn servers
[ https://issues.apache.org/jira/browse/YARN-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121512#comment-14121512 ] Hadoop QA commented on YARN-2509: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12666497/YARN-2509.patch against trunk revision 8f1a668. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4824//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4824//console This message is automatically generated. Enable Cross Origin Filter for timeline server only and not all Yarn servers Key: YARN-2509 URL: https://issues.apache.org/jira/browse/YARN-2509 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2509.patch, YARN-2509.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-415) Capture aggregate memory allocation at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121514#comment-14121514 ] Hadoop QA commented on YARN-415: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12666493/YARN-415.201409040036.txt against trunk revision 8f1a668. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4823//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4823//console This message is automatically generated. Capture aggregate memory allocation at the app-level for chargeback --- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 2.5.0 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.201408062232.txt, YARN-415.201408080204.txt, YARN-415.201408092006.txt, YARN-415.201408132109.txt, YARN-415.201408150030.txt, YARN-415.201408181938.txt, YARN-415.201408181938.txt, YARN-415.201408212033.txt, YARN-415.201409040036.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. 
We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2394) FairScheduler: Configure fairSharePreemptionThreshold per queue
[ https://issues.apache.org/jira/browse/YARN-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121541#comment-14121541 ] Hudson commented on YARN-2394: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1886 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1886/]) YARN-2394. FairScheduler: Configure fairSharePreemptionThreshold per queue. (Wei Yan via kasha) (kasha: rev 1dcaba9a7aa27f7ca4ba693e3abb56ab3c59c8a7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSLeafQueue.java FairScheduler: Configure fairSharePreemptionThreshold per queue --- Key: YARN-2394 URL: https://issues.apache.org/jira/browse/YARN-2394 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Fix For: 2.6.0 Attachments: YARN-2394-1.patch, YARN-2394-2.patch, YARN-2394-3.patch, YARN-2394-4.patch, YARN-2394-5.patch, YARN-2394-6.patch, YARN-2394-7.patch Preemption based on fair share starvation happens when usage of a queue is less than 50% of its fair share. This 50% is hardcoded. We'd like to make this configurable on a per queue basis, so that we can choose the threshold at which we want to preempt. Calling this config fairSharePreemptionThreshold. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2509) Enable Cross Origin Filter for timeline server only and not all Yarn servers
[ https://issues.apache.org/jira/browse/YARN-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121544#comment-14121544 ] Jonathan Eagles commented on YARN-2509: --- +1. Thanks, Mit. Committing this to trunk and branch-2 Enable Cross Origin Filter for timeline server only and not all Yarn servers Key: YARN-2509 URL: https://issues.apache.org/jira/browse/YARN-2509 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2509.patch, YARN-2509.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2512) Allow for origin pattern matching in cross origin filter
Jonathan Eagles created YARN-2512: - Summary: Allow for origin pattern matching in cross origin filter Key: YARN-2512 URL: https://issues.apache.org/jira/browse/YARN-2512 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2509) Enable Cross Origin Filter for timeline server only and not all Yarn servers
[ https://issues.apache.org/jira/browse/YARN-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-2509: -- Fix Version/s: 2.6.0 Enable Cross Origin Filter for timeline server only and not all Yarn servers Key: YARN-2509 URL: https://issues.apache.org/jira/browse/YARN-2509 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Fix For: 2.6.0 Attachments: YARN-2509.patch, YARN-2509.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2513) Host framework UIs in YARN for use with the ATS
Jonathan Eagles created YARN-2513: - Summary: Host framework UIs in YARN for use with the ATS Key: YARN-2513 URL: https://issues.apache.org/jira/browse/YARN-2513 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Allow for pluggable UIs as described by TEZ-8. YARN can provide the infrastructure to host JavaScript and possibly Java UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2513) Host framework UIs in YARN for use with the ATS
[ https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles reassigned YARN-2513: - Assignee: Jonathan Eagles Host framework UIs in YARN for use with the ATS --- Key: YARN-2513 URL: https://issues.apache.org/jira/browse/YARN-2513 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Allow for pluggable UIs as described by TEZ-8. YARN can provide the infrastructure to host JavaScript and possibly Java UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2509) Enable Cross Origin Filter for timeline server only and not all Yarn servers
[ https://issues.apache.org/jira/browse/YARN-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121600#comment-14121600 ] Zhijie Shen commented on YARN-2509: --- What if CrossOriginFilterInitializer is configured in core-site.xml, but http-cross-origin.enabled = false? Enable Cross Origin Filter for timeline server only and not all Yarn servers Key: YARN-2509 URL: https://issues.apache.org/jira/browse/YARN-2509 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Fix For: 2.6.0 Attachments: YARN-2509.patch, YARN-2509.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
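Editor's note: the configuration combination being asked about looks roughly like the core-site.xml fragment below. The fully qualified property key for the enabled flag and the initializer's package are assumptions inferred from the "http-cross-origin.enabled" shorthand in the comment; the actual names are defined by the patch.
{code}
<!-- Illustrative core-site.xml fragment; property and class names partly assumed. -->
<property>
  <name>hadoop.http.filter.initializers</name>
  <value>org.apache.hadoop.yarn.server.timeline.webapp.CrossOriginFilterInitializer</value>
</property>
<property>
  <!-- shorthand used in the comment above; full key assumed -->
  <name>yarn.timeline-service.http-cross-origin.enabled</name>
  <value>false</value>
</property>
{code}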
[jira] [Commented] (YARN-2056) Disable preemption at Queue level
[ https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121687#comment-14121687 ] Eric Payne commented on YARN-2056: -- [~leftnoteasy], have you had a chance to look at the hierarchical queue test that I added? I am grateful for your help. Thanks Eric Payne Disable preemption at Queue level - Key: YARN-2056 URL: https://issues.apache.org/jira/browse/YARN-2056 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Mayank Bansal Assignee: Eric Payne Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, YARN-2056.201408310117.txt, YARN-2056.201409022208.txt We need to be able to disable preemption at individual queue level -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2514) The elevated WSCE LRPC should grant access to the job to the namenode
Remus Rusanu created YARN-2514: -- Summary: The elevated WSCE LRPC should grant access to the job to the namenode Key: YARN-2514 URL: https://issues.apache.org/jira/browse/YARN-2514 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu The job created by winutils task createAsUser must be accessible/controllable/killable by the namenode, or winutils task list/kill will fail later. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1708) Add a public API to reserve resources (part of YARN-1051)
[ https://issues.apache.org/jira/browse/YARN-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121737#comment-14121737 ] Vinod Kumar Vavilapalli commented on YARN-1708: --- Tx for the updated patch, [~subru]! This looks so much better! A few minor comments - All the newInstance methods and setters in the response objects should be marked as private, for e.g in ReservationSubmissionResponse. Similarly in other objects too. We don't expect users to call them because responses are generated only by the platform. - ReservationId: It's likely that IDEs generate a better hashCode instead of us doing the long-to-int conversions? - ReservationRequests.{set|get}Type - {set|get}Interpretor? Similarly ReservationRequestsProto.type. - Rename ReservationRequest.leaseDuration to be simply duration inline with ReservationRequestProto.duration Add a public API to reserve resources (part of YARN-1051) - Key: YARN-1708 URL: https://issues.apache.org/jira/browse/YARN-1708 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan Attachments: YARN-1708.patch, YARN-1708.patch This JIRA tracks the definition of a new public API for YARN, which allows users to reserve resources (think of time-bounded queues). This is part of the admission control enhancement proposed in YARN-1051. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
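Editor's note: to make the hashCode suggestion above concrete, an IDE-generated implementation for an id made of two long fields typically looks like the sketch below. The field names are assumptions about ReservationId, used only for illustration.
{code}
// Illustrative only: what an IDE-generated hashCode for a (clusterTimestamp, id) pair
// typically looks like. Field names are assumed, not the actual ReservationId API.
class ReservationIdSketch {
    private final long clusterTimestamp;
    private final long id;

    ReservationIdSketch(long clusterTimestamp, long id) {
        this.clusterTimestamp = clusterTimestamp;
        this.id = id;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + (int) (clusterTimestamp ^ (clusterTimestamp >>> 32));
        result = prime * result + (int) (id ^ (id >>> 32));
        return result;
    }
}
{code}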
[jira] [Commented] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121778#comment-14121778 ] Zhijie Shen commented on YARN-2468: --- I'm afraid it may not be fair to compare log file creation between a single long running service and a short-term application. I'm thinking about the file problem in a different direction. Let's see how many log files will be created for a YARN cluster. For example, a long running service takes 10% of the cluster's resources, runs for 10 days, and produces 1 log file per day. On the other hand, a normal application also takes 10% of the cluster's resources, runs for 1 day, and produces 1 log file. Suppose the application is started every day. Over 10 days, the number of log files produced by the long running service and by the 10 iterations of the application is 10 in both cases. So from the point of view of the cluster, the number of logs is proportional to the resource usage rather than to the number of applications. Similar resource usage may result in a similar number of log files, so the situation may not become worse if we take the whole cluster into account. However, I agree we lose the opportunity to make a long running service use a single log file, reducing the total log file number. To completely resolve the too-many-files problem, we may think of the timeline server, which has a store layer to deal with the real I/O on your behalf. Another optimization may be log retention; I'm not sure whether that feature already exists or has been proposed together in this solution. Log handling for LRS Key: YARN-2468 URL: https://issues.apache.org/jira/browse/YARN-2468 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2468.1.patch Currently, when an application is finished, the NM will start to do the log aggregation. But for long running service applications, this is not ideal. The problems we have are: 1) LRS applications are expected to run for a long time (weeks, months). 2) Currently, all the container logs (from one NM) will be written into a single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
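Editor's note: a back-of-the-envelope restatement of the arithmetic in the comment above, with purely illustrative numbers.
{code}
// Illustrative arithmetic only: log-file count tracks resource-usage-time, not app count.
public class LogCountSketch {
    public static void main(String[] args) {
        int lrsLogFiles   = 10 /* days running */ * 1 /* file rolled per day */;
        int batchLogFiles = 10 /* daily runs   */ * 1 /* file per run        */;
        System.out.println(lrsLogFiles == batchLogFiles); // true: 10 files either way
    }
}
{code}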
[jira] [Updated] (YARN-2448) RM should expose the name of the ResourceCalculator being used when AMs register
[ https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2448: Attachment: apache-yarn-2448.2.patch Uploaded a new patch to address the concerns raised by Karthik, Sandy and Vinod. Instead of exposing the resource calculator, the patch now exposes the resources used by the scheduler. RM should expose the name of the ResourceCalculator being used when AMs register Key: YARN-2448 URL: https://issues.apache.org/jira/browse/YARN-2448 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch, apache-yarn-2448.2.patch The RM should expose the name of the ResourceCalculator being used when AMs register, as part of the RegisterApplicationMasterResponse. This will allow applications to make better decisions when scheduling. MapReduce, for example, only looks at memory when deciding its scheduling, even though the RM could potentially be using the DominantResourceCalculator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2448) RM should expose the resource types considered during scheduling when AMs register
[ https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2448: Summary: RM should expose the resource types considered during scheduling when AMs register (was: RM should expose the name of the ResourceCalculator being used when AMs register) RM should expose the resource types considered during scheduling when AMs register -- Key: YARN-2448 URL: https://issues.apache.org/jira/browse/YARN-2448 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch, apache-yarn-2448.2.patch The RM should expose the name of the ResourceCalculator being used when AMs register, as part of the RegisterApplicationMasterResponse. This will allow applications to make better decisions when scheduling. MapReduce for example, only looks at memory when deciding it's scheduling, even though the RM could potentially be using the DominantResourceCalculator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2448) RM should expose the resource types considered during scheduling when AMs register
[ https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121786#comment-14121786 ] Varun Vasudev commented on YARN-2448: - Updated title to reflect what the latest patch is fixing. RM should expose the resource types considered during scheduling when AMs register -- Key: YARN-2448 URL: https://issues.apache.org/jira/browse/YARN-2448 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch, apache-yarn-2448.2.patch The RM should expose the name of the ResourceCalculator being used when AMs register, as part of the RegisterApplicationMasterResponse. This will allow applications to make better decisions when scheduling. MapReduce for example, only looks at memory when deciding it's scheduling, even though the RM could potentially be using the DominantResourceCalculator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2431) NM restart: cgroup is not removed for reacquired containers
[ https://issues.apache.org/jira/browse/YARN-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121796#comment-14121796 ] Thomas Graves commented on YARN-2431: - +1. Thanks Jason! Feel free to check it in. NM restart: cgroup is not removed for reacquired containers --- Key: YARN-2431 URL: https://issues.apache.org/jira/browse/YARN-2431 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-2431.patch The cgroup for a reacquired container is not being removed when the container exits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2391) Windows Secure Container Executor helper service should assign launched process to the NM job
[ https://issues.apache.org/jira/browse/YARN-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2391: --- Issue Type: Sub-task (was: Improvement) Parent: YARN-2198 Windows Secure Container Executor helper service should assign launched process to the NM job - Key: YARN-2391 URL: https://issues.apache.org/jira/browse/YARN-2391 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Critical Labels: security, windows The YARN-2198 NM helper service needs to make sure the launched process is added to the NM job ('job' as in Windows NT job objects, not Hadoop jobs). This ensures that NM termination ensures launched process termination. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2448) RM should expose the resource types considered during scheduling when AMs register
[ https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121927#comment-14121927 ] Hadoop QA commented on YARN-2448: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12666543/apache-yarn-2448.2.patch against trunk revision b44b2ee. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4826//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4826//console This message is automatically generated. RM should expose the resource types considered during scheduling when AMs register -- Key: YARN-2448 URL: https://issues.apache.org/jira/browse/YARN-2448 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch, apache-yarn-2448.2.patch The RM should expose the name of the ResourceCalculator being used when AMs register, as part of the RegisterApplicationMasterResponse. This will allow applications to make better decisions when scheduling. MapReduce for example, only looks at memory when deciding it's scheduling, even though the RM could potentially be using the DominantResourceCalculator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2431) NM restart: cgroup is not removed for reacquired containers
[ https://issues.apache.org/jira/browse/YARN-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121980#comment-14121980 ] Jason Lowe commented on YARN-2431: -- Thanks for the reviews, Nathan and Tom! Committing this. NM restart: cgroup is not removed for reacquired containers --- Key: YARN-2431 URL: https://issues.apache.org/jira/browse/YARN-2431 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-2431.patch The cgroup for a reacquired container is not being removed when the container exits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1712) Admission Control: plan follower
[ https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1712: --- Attachment: YARN-1712.2.patch [~leftnoteasy], good to hear that you got the full context. Thanks for reviewing the patch. I am uploading a new patch that has the following changes: * Fix the log message. * Replace stale references to sessions with reservations, good catch. The currentReservations might contain new reservations which just start now and so were not active before. These will not yet have corresponding reservation queues in the CapacityScheduler, as we create them after sorting. This is done to ensure what you highlighted earlier: we never exceed total capacity. Admission Control: plan follower Key: YARN-1712 URL: https://issues.apache.org/jira/browse/YARN-1712 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations, scheduler Attachments: YARN-1712.1.patch, YARN-1712.2.patch, YARN-1712.patch This JIRA tracks a thread that continuously propagates the current state of an inventory subsystem to the scheduler. As the inventory subsystem stores the plan of how the resources should be subdivided, the work we propose in this JIRA realizes such a plan by dynamically instructing the CapacityScheduler to add/remove/resize queues to follow the plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1712) Admission Control: plan follower
[ https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122007#comment-14122007 ] Hadoop QA commented on YARN-1712: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12666581/YARN-1712.2.patch against trunk revision b44b2ee. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4827//console This message is automatically generated. Admission Control: plan follower Key: YARN-1712 URL: https://issues.apache.org/jira/browse/YARN-1712 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations, scheduler Attachments: YARN-1712.1.patch, YARN-1712.2.patch, YARN-1712.patch This JIRA tracks a thread that continuously propagates the current state of an inventory subsystem to the scheduler. As the inventory subsystem store the plan of how the resources should be subdivided, the work we propose in this JIRA realizes such plan by dynamically instructing the CapacityScheduler to add/remove/resize queues to follow the plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-611) Add an AM retry count reset window to YARN RM
[ https://issues.apache.org/jira/browse/YARN-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122011#comment-14122011 ] Zhijie Shen commented on YARN-611: -- Almost good, just one minor thing: 1. You may want to mark this method \@Stable because the setter/getter is marked \@Stable. {code} + @Unstable + public static ApplicationSubmissionContext newInstance( {code} Add an AM retry count reset window to YARN RM - Key: YARN-611 URL: https://issues.apache.org/jira/browse/YARN-611 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Chris Riccomini Assignee: Xuan Gong Attachments: YARN-611.1.patch, YARN-611.2.patch, YARN-611.3.patch, YARN-611.4.patch, YARN-611.4.rebase.patch, YARN-611.5.patch, YARN-611.6.patch, YARN-611.7.patch YARN currently has the following config: yarn.resourcemanager.am.max-retries This config defaults to 2, and defines how many times to retry a failed AM before failing the whole YARN job. YARN counts an AM as failed if the node that it was running on dies (the NM will timeout, which counts as a failure for the AM), or if the AM dies. This configuration is insufficient for long running (or infinitely running) YARN jobs, since the machine (or NM) that the AM is running on will eventually need to be restarted (or the machine/NM will fail). In such an event, the AM has not done anything wrong, but this is counted as a failure by the RM. Since the retry count for the AM is never reset, eventually, at some point, the number of machine/NM failures will result in the AM failure count going above the configured value for yarn.resourcemanager.am.max-retries. Once this happens, the RM will mark the job as failed, and shut it down. This behavior is not ideal. I propose that we add a second configuration: yarn.resourcemanager.am.retry-count-window-ms This configuration would define a window of time that would define when an AM is well behaved, and it's safe to reset its failure count back to zero. Every time an AM fails the RmAppImpl would check the last time that the AM failed. If the last failure was less than retry-count-window-ms ago, and the new failure count is max-retries, then the job should fail. If the AM has never failed, the retry count is max-retries, or if the last failure was OUTSIDE the retry-count-window-ms, then the job should be restarted. Additionally, if the last failure was outside the retry-count-window-ms, then the failure count should be set back to 0. This would give developers a way to have well-behaved AMs run forever, while still failing mis-behaving AMs after a short period of time. I think the work to be done here is to change the RmAppImpl to actually look at app.attempts, and see if there have been more than max-retries failures in the last retry-count-window-ms milliseconds. If there have, then the job should fail, if not, then the job should go forward. Additionally, we might also need to add an endTime in either RMAppAttemptImpl or RMAppFailedAttemptEvent, so that the RmAppImpl can check the time of the failure. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
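Editor's note: a minimal sketch of the retry-count reset window proposed in the description above. Method and parameter names are hypothetical, not the actual RMAppImpl API, and it assumes the intended rule is "fail the application once at least max-retries failures fall inside the window."
{code}
import java.util.List;

// Hypothetical sketch of the proposed retry-count reset window; not actual RM code.
class AmRetryWindowSketch {
    /**
     * @param attemptFailureTimes end times (ms) of previous failed AM attempts
     * @param maxRetries          yarn.resourcemanager.am.max-retries
     * @param windowMs            proposed yarn.resourcemanager.am.retry-count-window-ms
     * @param nowMs               time of the new failure
     * @return true if the application should be failed rather than restarted
     */
    static boolean shouldFailApp(List<Long> attemptFailureTimes,
                                 int maxRetries, long windowMs, long nowMs) {
        int failuresInWindow = 1; // count the failure that just happened
        for (long failureTime : attemptFailureTimes) {
            if (nowMs - failureTime < windowMs) {
                failuresInWindow++;
            }
        }
        return failuresInWindow >= maxRetries;
    }
}
{code}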
[jira] [Commented] (YARN-2511) Allow All Origins by default when Cross Origin Filter is enabled
[ https://issues.apache.org/jira/browse/YARN-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122036#comment-14122036 ] Zhijie Shen commented on YARN-2511: --- +1, will commit the patch. Allow All Origins by default when Cross Origin Filter is enabled Key: YARN-2511 URL: https://issues.apache.org/jira/browse/YARN-2511 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-2511-v1.patch This is the default for jetty 7 cross origin filter -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-611) Add an AM retry count reset window to YARN RM
[ https://issues.apache.org/jira/browse/YARN-611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-611: --- Attachment: YARN-611.8.patch Add an AM retry count reset window to YARN RM - Key: YARN-611 URL: https://issues.apache.org/jira/browse/YARN-611 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Chris Riccomini Assignee: Xuan Gong Attachments: YARN-611.1.patch, YARN-611.2.patch, YARN-611.3.patch, YARN-611.4.patch, YARN-611.4.rebase.patch, YARN-611.5.patch, YARN-611.6.patch, YARN-611.7.patch, YARN-611.8.patch YARN currently has the following config: yarn.resourcemanager.am.max-retries This config defaults to 2, and defines how many times to retry a failed AM before failing the whole YARN job. YARN counts an AM as failed if the node that it was running on dies (the NM will timeout, which counts as a failure for the AM), or if the AM dies. This configuration is insufficient for long running (or infinitely running) YARN jobs, since the machine (or NM) that the AM is running on will eventually need to be restarted (or the machine/NM will fail). In such an event, the AM has not done anything wrong, but this is counted as a failure by the RM. Since the retry count for the AM is never reset, eventually, at some point, the number of machine/NM failures will result in the AM failure count going above the configured value for yarn.resourcemanager.am.max-retries. Once this happens, the RM will mark the job as failed, and shut it down. This behavior is not ideal. I propose that we add a second configuration: yarn.resourcemanager.am.retry-count-window-ms This configuration would define a window of time that would define when an AM is well behaved, and it's safe to reset its failure count back to zero. Every time an AM fails the RmAppImpl would check the last time that the AM failed. If the last failure was less than retry-count-window-ms ago, and the new failure count is max-retries, then the job should fail. If the AM has never failed, the retry count is max-retries, or if the last failure was OUTSIDE the retry-count-window-ms, then the job should be restarted. Additionally, if the last failure was outside the retry-count-window-ms, then the failure count should be set back to 0. This would give developers a way to have well-behaved AMs run forever, while still failing mis-behaving AMs after a short period of time. I think the work to be done here is to change the RmAppImpl to actually look at app.attempts, and see if there have been more than max-retries failures in the last retry-count-window-ms milliseconds. If there have, then the job should fail, if not, then the job should go forward. Additionally, we might also need to add an endTime in either RMAppAttemptImpl or RMAppFailedAttemptEvent, so that the RmAppImpl can check the time of the failure. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-611) Add an AM retry count reset window to YARN RM
[ https://issues.apache.org/jira/browse/YARN-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122047#comment-14122047 ] Xuan Gong commented on YARN-611: Thanks for the review. Uploaded a new patch to address the latest comment. [~vinodkv] Do you have any other comments ? Add an AM retry count reset window to YARN RM - Key: YARN-611 URL: https://issues.apache.org/jira/browse/YARN-611 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Chris Riccomini Assignee: Xuan Gong Attachments: YARN-611.1.patch, YARN-611.2.patch, YARN-611.3.patch, YARN-611.4.patch, YARN-611.4.rebase.patch, YARN-611.5.patch, YARN-611.6.patch, YARN-611.7.patch, YARN-611.8.patch YARN currently has the following config: yarn.resourcemanager.am.max-retries This config defaults to 2, and defines how many times to retry a failed AM before failing the whole YARN job. YARN counts an AM as failed if the node that it was running on dies (the NM will timeout, which counts as a failure for the AM), or if the AM dies. This configuration is insufficient for long running (or infinitely running) YARN jobs, since the machine (or NM) that the AM is running on will eventually need to be restarted (or the machine/NM will fail). In such an event, the AM has not done anything wrong, but this is counted as a failure by the RM. Since the retry count for the AM is never reset, eventually, at some point, the number of machine/NM failures will result in the AM failure count going above the configured value for yarn.resourcemanager.am.max-retries. Once this happens, the RM will mark the job as failed, and shut it down. This behavior is not ideal. I propose that we add a second configuration: yarn.resourcemanager.am.retry-count-window-ms This configuration would define a window of time that would define when an AM is well behaved, and it's safe to reset its failure count back to zero. Every time an AM fails the RmAppImpl would check the last time that the AM failed. If the last failure was less than retry-count-window-ms ago, and the new failure count is max-retries, then the job should fail. If the AM has never failed, the retry count is max-retries, or if the last failure was OUTSIDE the retry-count-window-ms, then the job should be restarted. Additionally, if the last failure was outside the retry-count-window-ms, then the failure count should be set back to 0. This would give developers a way to have well-behaved AMs run forever, while still failing mis-behaving AMs after a short period of time. I think the work to be done here is to change the RmAppImpl to actually look at app.attempts, and see if there have been more than max-retries failures in the last retry-count-window-ms milliseconds. If there have, then the job should fail, if not, then the job should go forward. Additionally, we might also need to add an endTime in either RMAppAttemptImpl or RMAppFailedAttemptEvent, so that the RmAppImpl can check the time of the failure. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122073#comment-14122073 ] Tsuyoshi OZAWA commented on YARN-1514: -- Confirmed that the v5 patch can be applied to trunk. Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.6.0 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch, YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch ZKRMStateStore is very sensitive to ZNode-related operations, as discussed in YARN-1307, YARN-1378 and so on. In particular, ZKRMStateStore#loadState is called when an RM-HA cluster fails over; therefore, its execution time impacts the failover time of RM-HA. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122078#comment-14122078 ] Jian He commented on YARN-1198: --- Craig, thanks for working on the issue. Took a look at the patch. Does it make sense to decouple headRoom calculation from user limit calculation? specifically, we may calculate the headRoom when the AM actually calls getHeadRoom. This should make sure that the headRoom is always up-to-date when AM gets the headRoom. Also, we may not need to loop all the users in assignContainers if doing this. Capacity Scheduler headroom calculation does not work as expected - Key: YARN-1198 URL: https://issues.apache.org/jira/browse/YARN-1198 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Craig Welch Attachments: YARN-1198.1.patch, YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch Today headroom calculation (for the app) takes place only when * New node is added/removed from the cluster * New container is getting assigned to the application. However there are potentially lot of situations which are not considered for this calculation * If a container finishes then headroom for that application will change and should be notified to the AM accordingly. * If a single user has submitted multiple applications (app1 and app2) to the same queue then ** If app1's container finishes then not only app1's but also app2's AM should be notified about the change in headroom. ** Similarly if a container is assigned to any applications app1/app2 then both AM should be notified about their headroom. ** To simplify the whole communication process it is ideal to keep headroom per User per LeafQueue so that everyone gets the same picture (apps belonging to same user and submitted in same queue). * If a new user submits an application to the queue then all applications submitted by all users in that queue should be notified of the headroom change. * Also today headroom is an absolute number ( I think it should be normalized but then this is going to be not backward compatible..) * Also when admin user refreshes queue headroom has to be updated. These all are the potential bugs in headroom calculations -- This message was sent by Atlassian JIRA (v6.3.4#6332)
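Editor's note: a simplified sketch of the lazy approach suggested above, i.e. computing headroom when the AM calls for it rather than caching it at allocation time. It uses plain numbers instead of the scheduler's Resource/ResourceCalculator types and an assumed min(user-limit remainder, queue remainder) formula; it is purely illustrative, not the actual CapacityScheduler code.
{code}
// Illustrative only: recompute headroom from current queue/user state on demand.
class HeadroomSketch {
    long userLimitMb;     // current user limit for this user in the leaf queue
    long userConsumedMb;  // resources the user currently holds
    long queueMaxMb;      // queue's maximum capacity
    long queueUsedMb;     // resources the whole queue currently holds

    long getHeadroomMb() {
        long byUserLimit = userLimitMb - userConsumedMb;
        long byQueueCap  = queueMaxMb - queueUsedMb;
        return Math.max(0, Math.min(byUserLimit, byQueueCap));
    }
}
{code}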
[jira] [Updated] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1707: --- Attachment: YARN-1707.9.patch Uploading a new patch with a minor change. Renamed ReservationQueue#changeCapacity to ReservationQueue#setEntitlement for consistency. Making the CapacityScheduler more dynamic - Key: YARN-1707 URL: https://issues.apache.org/jira/browse/YARN-1707 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.6.patch, YARN-1707.7.patch, YARN-1707.8.patch, YARN-1707.9.patch, YARN-1707.patch The CapacityScheduler is rather static at the moment, and refreshqueue provides a rather heavy-handed way to reconfigure it. Moving towards long-running services (tracked in YARN-896) and to enable more advanced admission control and resource parcelling, we need to make the CapacityScheduler more dynamic. This is instrumental to the umbrella jira YARN-1051. Concretely this requires the following changes: * create queues dynamically * destroy queues dynamically * dynamically change queue parameters (e.g., capacity) * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% instead of == 100% We limit this to LeafQueues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1708) Add a public API to reserve resources (part of YARN-1051)
[ https://issues.apache.org/jira/browse/YARN-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1708: --- Attachment: YARN-1708.patch Thanks [~vinodkv] for reviewing the patch. I am uploading a new patch that has the following fixes based on your comments: * All the newInstance methods and setters in the Reservation*Response objects should be marked as private. * Replaced hashCode with IDE generated one in ReservationId * Renamed ReservationRequests.{set|get}Type - {set|get}Interpretor, also in ReservationRequestsProto.type. * Renamed ReservationRequest.leaseDuration to be simply duration to make it consistent with ReservationRequestProto.duration Add a public API to reserve resources (part of YARN-1051) - Key: YARN-1708 URL: https://issues.apache.org/jira/browse/YARN-1708 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan Attachments: YARN-1708.patch, YARN-1708.patch, YARN-1708.patch This JIRA tracks the definition of a new public API for YARN, which allows users to reserve resources (think of time-bounded queues). This is part of the admission control enhancement proposed in YARN-1051. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
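For readers following the API shape, here is a tiny stand-in sketch of the renames described above ({set|get}Type becoming an "interpreter" accessor, and leaseDuration becoming duration). These are hypothetical placeholder classes and enum values used only for illustration; they are not the real org.apache.hadoop.yarn.api.records Reservation* classes or their exact signatures.
{code:java}
public class ReservationApiNamingExample {

  // Placeholder for the request "interpreter" (formerly "type"); the constant
  // names are illustrative, not necessarily the real enum values.
  enum Interpreter { R_ANY, R_ALL, R_ORDER, R_ORDER_NO_GAP }

  static class Request {
    private long duration;                     // formerly "leaseDuration"
    long getDuration() { return duration; }
    void setDuration(long d) { this.duration = d; }
  }

  static class Requests {
    private Interpreter interpreter;           // formerly "type"
    Interpreter getInterpreter() { return interpreter; }
    void setInterpreter(Interpreter i) { this.interpreter = i; }
  }

  public static void main(String[] args) {
    Request r = new Request();
    r.setDuration(3_600_000L);                 // e.g. reserve resources for one hour
    Requests rs = new Requests();
    rs.setInterpreter(Interpreter.R_ALL);
    System.out.println(r.getDuration() + " " + rs.getInterpreter());
  }
}
{code}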
[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122127#comment-14122127 ] Hadoop QA commented on YARN-1707: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12666598/YARN-1707.9.patch against trunk revision 51a4faf. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4829//console This message is automatically generated. Making the CapacityScheduler more dynamic - Key: YARN-1707 URL: https://issues.apache.org/jira/browse/YARN-1707 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.6.patch, YARN-1707.7.patch, YARN-1707.8.patch, YARN-1707.9.patch, YARN-1707.patch The CapacityScheduler is rather static at the moment, and refreshqueue provides a rather heavy-handed way to reconfigure it. Moving towards long-running services (tracked in YARN-896) and to enable more advanced admission control and resource parcelling, we need to make the CapacityScheduler more dynamic. This is instrumental to the umbrella JIRA YARN-1051. Concretely this requires the following changes: * create queues dynamically * destroy queues dynamically * dynamically change queue parameters (e.g., capacity) * modify the refreshqueue validation to enforce sum(child.getCapacity()) <= 100% instead of == 100% We limit this to LeafQueues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1708) Add a public API to reserve resources (part of YARN-1051)
[ https://issues.apache.org/jira/browse/YARN-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122128#comment-14122128 ] Hadoop QA commented on YARN-1708: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/1201/YARN-1708.patch against trunk revision 51a4faf. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4830//console This message is automatically generated. Add a public API to reserve resources (part of YARN-1051) - Key: YARN-1708 URL: https://issues.apache.org/jira/browse/YARN-1708 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan Attachments: YARN-1708.patch, YARN-1708.patch, YARN-1708.patch This JIRA tracks the definition of a new public API for YARN, which allows users to reserve resources (think of time-bounded queues). This is part of the admission control enhancement proposed in YARN-1051. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-611) Add an AM retry count reset window to YARN RM
[ https://issues.apache.org/jira/browse/YARN-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122140#comment-14122140 ] Hadoop QA commented on YARN-611: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12666595/YARN-611.8.patch against trunk revision 3fa5f72. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4828//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4828//console This message is automatically generated. Add an AM retry count reset window to YARN RM - Key: YARN-611 URL: https://issues.apache.org/jira/browse/YARN-611 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Chris Riccomini Assignee: Xuan Gong Attachments: YARN-611.1.patch, YARN-611.2.patch, YARN-611.3.patch, YARN-611.4.patch, YARN-611.4.rebase.patch, YARN-611.5.patch, YARN-611.6.patch, YARN-611.7.patch, YARN-611.8.patch YARN currently has the following config: yarn.resourcemanager.am.max-retries This config defaults to 2, and defines how many times to retry a failed AM before failing the whole YARN job. YARN counts an AM as failed if the node that it was running on dies (the NM will timeout, which counts as a failure for the AM), or if the AM dies. This configuration is insufficient for long running (or infinitely running) YARN jobs, since the machine (or NM) that the AM is running on will eventually need to be restarted (or the machine/NM will fail). In such an event, the AM has not done anything wrong, but this is counted as a failure by the RM. Since the retry count for the AM is never reset, eventually, at some point, the number of machine/NM failures will result in the AM failure count going above the configured value for yarn.resourcemanager.am.max-retries. Once this happens, the RM will mark the job as failed, and shut it down. This behavior is not ideal. I propose that we add a second configuration: yarn.resourcemanager.am.retry-count-window-ms This configuration would define a window of time that would define when an AM is well behaved, and it's safe to reset its failure count back to zero. Every time an AM fails the RmAppImpl would check the last time that the AM failed. If the last failure was less than retry-count-window-ms ago, and the new failure count is max-retries, then the job should fail. 
If the AM has never failed, the retry count is below max-retries, or the last failure was OUTSIDE the retry-count-window-ms, then the job should be restarted. Additionally, if the last failure was outside the retry-count-window-ms, the failure count should be reset to 0. This would give developers a way to have well-behaved AMs run forever, while still failing misbehaving AMs after a short period of time. I think the work to be done here is to change RmAppImpl to actually look at app.attempts and see whether there have been more than max-retries failures in the last retry-count-window-ms milliseconds. If there have, the job should fail; if not, the job should go forward. Additionally, we might also need to add an endTime to either RMAppAttemptImpl or RMAppFailedAttemptEvent, so that RmAppImpl can check the time of the failure. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
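Here is a small, self-contained sketch of the windowed failure-count idea described above: failures older than the window are forgotten, so only recent failures count toward max-retries. This is a sliding-window simplification with hypothetical names, intended only to illustrate the proposal; it is not the RmAppImpl change itself.
{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

public class AmRetryWindowExample {
  private final int maxRetries;
  private final long windowMs;
  private final Deque<Long> failureTimes = new ArrayDeque<>();

  AmRetryWindowExample(int maxRetries, long windowMs) {
    this.maxRetries = maxRetries;
    this.windowMs = windowMs;
  }

  // Record an AM failure and decide whether the whole app should be failed:
  // only failures inside the trailing window count toward max-retries.
  boolean shouldFailApp(long failureTimeMs) {
    failureTimes.addLast(failureTimeMs);
    while (!failureTimes.isEmpty()
        && failureTimeMs - failureTimes.peekFirst() > windowMs) {
      failureTimes.removeFirst();   // failures outside the window are forgotten
    }
    return failureTimes.size() > maxRetries;
  }

  public static void main(String[] args) {
    AmRetryWindowExample app = new AmRetryWindowExample(2, 10_000L);
    System.out.println(app.shouldFailApp(0L));      // false: first failure
    System.out.println(app.shouldFailApp(1_000L));  // false: two failures, at the limit
    System.out.println(app.shouldFailApp(2_000L));  // true: third failure inside the window
    System.out.println(app.shouldFailApp(60_000L)); // false: old failures have aged out
  }
}
{code}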
[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122185#comment-14122185 ] Jian He commented on YARN-1707: --- +1 for the latest patch, thanks [~subru] and [~curino]! Making the CapacityScheduler more dynamic - Key: YARN-1707 URL: https://issues.apache.org/jira/browse/YARN-1707 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.6.patch, YARN-1707.7.patch, YARN-1707.8.patch, YARN-1707.9.patch, YARN-1707.patch The CapacityScheduler is rather static at the moment, and refreshqueue provides a rather heavy-handed way to reconfigure it. Moving towards long-running services (tracked in YARN-896) and to enable more advanced admission control and resource parcelling, we need to make the CapacityScheduler more dynamic. This is instrumental to the umbrella JIRA YARN-1051. Concretely this requires the following changes: * create queues dynamically * destroy queues dynamically * dynamically change queue parameters (e.g., capacity) * modify the refreshqueue validation to enforce sum(child.getCapacity()) <= 100% instead of == 100% We limit this to LeafQueues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122188#comment-14122188 ] Craig Welch commented on YARN-1198: --- [~jianhe], have a look at patch 7; it takes that sort of approach. Capacity Scheduler headroom calculation does not work as expected - Key: YARN-1198 URL: https://issues.apache.org/jira/browse/YARN-1198 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Craig Welch Attachments: YARN-1198.1.patch, YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch Today the headroom calculation (for an app) takes place only when: * a new node is added to or removed from the cluster * a new container is assigned to the application. However, there are potentially many situations that this calculation does not consider: * If a container finishes, the headroom for that application changes and the AM should be notified accordingly. * If a single user has submitted multiple applications (app1 and app2) to the same queue, then: ** If one of app1's containers finishes, not only app1's but also app2's AM should be notified about the change in headroom. ** Similarly, if a container is assigned to either application, both AMs should be notified about their new headroom. ** To simplify the whole communication process, it is ideal to keep headroom per user per LeafQueue so that everyone gets the same picture (apps belonging to the same user and submitted to the same queue). * If a new user submits an application to the queue, all applications submitted by all users in that queue should be notified of the headroom change. * Also, today headroom is an absolute number (I think it should be normalized, but that would not be backward compatible). * Also, when an admin refreshes the queues, the headroom has to be updated. These are all potential bugs in the headroom calculation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122234#comment-14122234 ] Subramaniam Krishnan commented on YARN-1707: Thanks [~jianhe] and [~leftnoteasy] for taking the time to do a thorough review. I am also proxying for [~curino], as he did most of the work on the patch. As discussed, we will commit this to the YARN-1051 branch once we have +1s on a few other sub-JIRAs. Making the CapacityScheduler more dynamic - Key: YARN-1707 URL: https://issues.apache.org/jira/browse/YARN-1707 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.6.patch, YARN-1707.7.patch, YARN-1707.8.patch, YARN-1707.9.patch, YARN-1707.patch The CapacityScheduler is rather static at the moment, and refreshqueue provides a rather heavy-handed way to reconfigure it. Moving towards long-running services (tracked in YARN-896) and to enable more advanced admission control and resource parcelling, we need to make the CapacityScheduler more dynamic. This is instrumental to the umbrella JIRA YARN-1051. Concretely this requires the following changes: * create queues dynamically * destroy queues dynamically * dynamically change queue parameters (e.g., capacity) * modify the refreshqueue validation to enforce sum(child.getCapacity()) <= 100% instead of == 100% We limit this to LeafQueues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-1492: --- Attachment: YARN-1492-all-trunk-v5.patch Attached v5 to address final license and findbug issues. truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Attachments: YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2515) Update ConverterUtils#toContainerId to parse epoch
Tsuyoshi OZAWA created YARN-2515: Summary: Update ConverterUtils#toContainerId to parse epoch Key: YARN-2515 URL: https://issues.apache.org/jira/browse/YARN-2515 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA ContainerId#toString was updated in YARN-2182. We should also update ConverterUtils#toContainerId to parse the epoch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2515) Update ConverterUtils#toContainerId to parse epoch
[ https://issues.apache.org/jira/browse/YARN-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2515: - Attachment: YARN-2515.1.patch Updated to parse the epoch if it exists. Update ConverterUtils#toContainerId to parse epoch -- Key: YARN-2515 URL: https://issues.apache.org/jira/browse/YARN-2515 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2515.1.patch ContainerId#toString was updated in YARN-2182. We should also update ConverterUtils#toContainerId to parse the epoch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
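For context, here is a minimal, self-contained parsing sketch. It assumes the post-YARN-2182 string format is container_[e<epoch>_]<clusterTimestamp>_<appId>_<attemptId>_<containerId>, i.e. the epoch field appears (prefixed with "e") only when it is non-zero; that format and the example IDs are assumptions for illustration, and this is not the actual ConverterUtils implementation.
{code:java}
public class ContainerIdParseExample {

  // Extracts the epoch from a ContainerId string, treating the "e<digits>"
  // field as optional and defaulting to 0 when it is absent.
  static long parseEpoch(String containerIdStr) {
    String[] parts = containerIdStr.split("_");
    if (parts.length < 2 || !"container".equals(parts[0])) {
      throw new IllegalArgumentException("Invalid ContainerId: " + containerIdStr);
    }
    if (parts[1].startsWith("e")) {
      return Long.parseLong(parts[1].substring(1));
    }
    return 0L;
  }

  public static void main(String[] args) {
    System.out.println(parseEpoch("container_1410901177871_0001_01_000005"));     // 0
    System.out.println(parseEpoch("container_e17_1410901177871_0001_01_000005")); // 17
  }
}
{code}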
[jira] [Commented] (YARN-2515) Update ConverterUtils#toContainerId to parse epoch
[ https://issues.apache.org/jira/browse/YARN-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122365#comment-14122365 ] Hadoop QA commented on YARN-2515: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/1251/YARN-2515.1.patch against trunk revision 6104520. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4833//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4833//console This message is automatically generated. Update ConverterUtils#toContainerId to parse epoch -- Key: YARN-2515 URL: https://issues.apache.org/jira/browse/YARN-2515 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2515.1.patch ContainerId#toString was updated in YARN-2182. We should also update ConverterUtils#toContainerId to parse the epoch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122373#comment-14122373 ] Hadoop QA commented on YARN-1492: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/1231/YARN-1492-all-trunk-v5.patch against trunk revision f7df24b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 16 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4832//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4832//console This message is automatically generated. truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Attachments: YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122375#comment-14122375 ] Hadoop QA commented on YARN-1492: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/1231/YARN-1492-all-trunk-v5.patch against trunk revision f7df24b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 16 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4831//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4831//console This message is automatically generated. truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Attachments: YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2509) Enable Cross Origin Filter for timeline server only and not all Yarn servers
[ https://issues.apache.org/jira/browse/YARN-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122379#comment-14122379 ] Jonathan Eagles commented on YARN-2509: --- [~zjshen], I think this will come down to documenting this feature properly as part of YARN-2507. The current behavior is that CORS support will be added if the user does that. Let me know if you think it can be improved. Enable Cross Origin Filter for timeline server only and not all Yarn servers Key: YARN-2509 URL: https://issues.apache.org/jira/browse/YARN-2509 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Fix For: 2.6.0 Attachments: YARN-2509.patch, YARN-2509.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2460) Remove obsolete entries from yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-2460: - Attachment: YARN-2460-01.patch Contains the following modifications: 1) Removing the following properties: yarn.ipc.serializer.type yarn.ipc.exception.factory.class yarn.resourcemanager.amliveliness-monitor.interval-ms yarn.resourcemanager.nm.liveness-monitor.interval-ms yarn.nodemanager.resourcemanager.connect.wait.secs yarn.nodemanager.resourcemanager.connect.retry_interval.secs 2) Renamed yarn.resourcemanager.application-tokens.master-key-rolling-interval-secs to yarn.resourcemanager.am-rm-tokens.master-key-rolling-interval-secs Remove obsolete entries from yarn-default.xml - Key: YARN-2460 URL: https://issues.apache.org/jira/browse/YARN-2460 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: newbie Attachments: YARN-2460-01.patch The following properties are defined in yarn-default.xml, but do not exist in YarnConfiguration. mapreduce.job.hdfs-servers mapreduce.job.jar yarn.ipc.exception.factory.class yarn.ipc.serializer.type yarn.nodemanager.aux-services.mapreduce_shuffle.class yarn.nodemanager.hostname yarn.nodemanager.resourcemanager.connect.retry_interval.secs yarn.nodemanager.resourcemanager.connect.wait.secs yarn.resourcemanager.amliveliness-monitor.interval-ms yarn.resourcemanager.application-tokens.master-key-rolling-interval-secs yarn.resourcemanager.container.liveness-monitor.interval-ms yarn.resourcemanager.nm.liveness-monitor.interval-ms yarn.timeline-service.hostname yarn.timeline-service.http-authentication.simple.anonymous.allowed yarn.timeline-service.http-authentication.type Presumably, the mapreduce.* properties are okay. Similarly, the yarn.timeline-service.* properties are for the future TimelineService. However, the rest are likely fully deprecated. Submitting bug for comment/feedback about which other properties should be kept in yarn-default.xml. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2460) Remove obsolete entries from yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122458#comment-14122458 ] Hadoop QA commented on YARN-2460: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/1265/YARN-2460-01.patch against trunk revision 6104520. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4834//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4834//console This message is automatically generated. Remove obsolete entries from yarn-default.xml - Key: YARN-2460 URL: https://issues.apache.org/jira/browse/YARN-2460 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: newbie Attachments: YARN-2460-01.patch The following properties are defined in yarn-default.xml, but do not exist in YarnConfiguration. mapreduce.job.hdfs-servers mapreduce.job.jar yarn.ipc.exception.factory.class yarn.ipc.serializer.type yarn.nodemanager.aux-services.mapreduce_shuffle.class yarn.nodemanager.hostname yarn.nodemanager.resourcemanager.connect.retry_interval.secs yarn.nodemanager.resourcemanager.connect.wait.secs yarn.resourcemanager.amliveliness-monitor.interval-ms yarn.resourcemanager.application-tokens.master-key-rolling-interval-secs yarn.resourcemanager.container.liveness-monitor.interval-ms yarn.resourcemanager.nm.liveness-monitor.interval-ms yarn.timeline-service.hostname yarn.timeline-service.http-authentication.simple.anonymous.allowed yarn.timeline-service.http-authentication.type Presumably, the mapreduce.* properties are okay. Similarly, the yarn.timeline-service.* properties are for the future TimelineService. However, the rest are likely fully deprecated. Submitting bug for comment/feedback about which other properties should be kept in yarn-default.xml. -- This message was sent by Atlassian JIRA (v6.3.4#6332)