[jira] [Assigned] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe reassigned YARN-2246:
--------------------------------

Assignee: Devaraj K (was: Jason Lowe)

bq. Do you suggest that we should use the original tracking url directly instead of proxy url on the web UI?

Not sure if it's always OK to use the raw history tracking URL, as there might be some setups where the client can reach the RM but can't reach the history tracking URL directly. However, I think it's OK to always have the RM advertise the original proxy tracking URL (i.e.: http://rmaddr/proxy/appid) to clients through its UI. The AM can redefine what that proxy redirects to, but the RM should never tack on paths to that proxy URI when advertising it. In other words, clients visiting http://rmaddr/proxy/appid will always reach the UI (either AM or history), and if the AM and history have properly mirrored UIs then it should be seamless to transition between the two when the AM unregisters and redefines the tracking URL to point to the history server.

bq. seamless view between the AM UI and history UI is not possible nowadays.

Correct, but that's MapReduce's fault and not YARN's. If the RM handles the proxy properly then it should be possible for an app framework to implement a properly mirrored UI between the AM and the history server.

bq. In general, seamless view will still be difficult with the aforementioned solution between two tracking URLs. For example, tracking URL is http://t1:p1/a/b first, and I'm visiting the path at http://t1:p1/a/b/x/y/z. When the tracking URL becomes http://t2:p2/c/d/e, I refresh the page and am redirected to http://t2:p2/c/d/e/a/b/x/y/z. Without mapping between original tracking url and proxy url, we don't know /a/b is part of tracking url base, and it shouldn't be carried on.

Not sure I'm following the example because there are no proxy URLs in it. The client should always be using the proxy URL for this discussion. If I follow the example correctly, the original tracking URL is http://t1:p1/a/b, and the proxy URL is rooted there (i.e.: proxy/appid -> t1:p1/a/b). So I'm visiting proxy/appid/x/y/z and then the AM unregisters with a new tracking URL of t2:p2/c/d/e. Then the proxy servlet should redirect that same proxy/appid/x/y/z request to t2:p2/c/d/e/x/y/z, which seems correct to me. It's just taking the path underneath the proxy address (i.e.: everything after proxy/appid) and tacking it onto the specified tracking URL. The same subpath is seen by both the AM and history URIs, assuming a/b is the root of the AM UI and c/d/e is the root of the history UI (for that app). So it seems this works as I would expect. Am I missing something?

bq. I have updated the generateProxyUriWithScheme() in the latest patch.

Thanks for updating the patch, Devaraj. I think it looks good, although it would be nice to have some regression tests to verify that if the app changes the tracking URL, the proxy URL doesn't update like it used to.

Job History Link in RM UI is redirecting to the URL which contains Job Id twice
-------------------------------------------------------------------------------

Key: YARN-2246
URL: https://issues.apache.org/jira/browse/YARN-2246
Project: Hadoop YARN
Issue Type: Bug
Components: webapp
Reporter: Devaraj K
Assignee: Devaraj K
Fix For: 2.7.0
Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch, YARN-2246-3.patch, YARN-2246.2.patch, YARN-2246.patch

{code:xml}
http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001
{code}
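The path-joining behavior described above can be made concrete with a small sketch. This is an illustrative helper, not the actual WebAppProxyServlet code: it keeps only the subpath after proxy/<appid> and appends it to whatever tracking URL is currently registered.

{code:java}
import java.net.URI;

/**
 * Illustrative sketch (not the actual WebAppProxyServlet logic) of the
 * behavior discussed above: the proxy keeps only the subpath after
 * /proxy/<appid> and tacks it onto the current tracking URL.
 */
public class ProxyRedirectSketch {

  /** e.g. requestPath = "/proxy/application_1_0001/x/y/z". */
  static URI redirectTarget(String requestPath, String appId, String trackingUrl) {
    String prefix = "/proxy/" + appId;
    // Subpath under the proxy address, e.g. "/x/y/z" (empty at the root).
    String subPath = requestPath.substring(requestPath.indexOf(prefix) + prefix.length());
    // Append to the tracking URL base, e.g. http://t2:p2/c/d/e -> .../c/d/e/x/y/z
    String base = trackingUrl.endsWith("/")
        ? trackingUrl.substring(0, trackingUrl.length() - 1) : trackingUrl;
    return URI.create(base + subPath);
  }

  public static void main(String[] args) {
    // The AM registered t1:p1/a/b; later the history server takes over at
    // t2:p2/c/d/e. The same subpath /x/y/z is preserved across the switch.
    System.out.println(redirectTarget("/proxy/application_1_0001/x/y/z",
        "application_1_0001", "http://t2:p2/c/d/e")); // http://t2:p2/c/d/e/x/y/z
  }
}
{code}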
[jira] [Commented] (YARN-2616) Add CLI client to the registry to list/view entries
[ https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314236#comment-14314236 ]

Hadoop QA commented on YARN-2616:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12697704/YARN-2616-008.patch
against trunk revision e0ec071.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6574//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6574//console

This message is automatically generated.

Add CLI client to the registry to list/view entries
---------------------------------------------------

Key: YARN-2616
URL: https://issues.apache.org/jira/browse/YARN-2616
Project: Hadoop YARN
Issue Type: Sub-task
Components: client
Affects Versions: 2.6.0
Reporter: Steve Loughran
Assignee: Akshay Radia
Attachments: YARN-2616-003.patch, YARN-2616-008.patch, YARN-2616-008.patch, yarn-2616-v1.patch, yarn-2616-v2.patch, yarn-2616-v4.patch, yarn-2616-v5.patch, yarn-2616-v6.patch, yarn-2616-v7.patch

The registry needs a CLI interface.
[jira] [Updated] (YARN-3090) DeletionService can silently ignore deletion task failures
[ https://issues.apache.org/jira/browse/YARN-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-3090:
-------------------------------

Attachment: YARN-3090.04.patch

DeletionService can silently ignore deletion task failures
----------------------------------------------------------

Key: YARN-3090
URL: https://issues.apache.org/jira/browse/YARN-3090
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Jason Lowe
Assignee: Varun Saxena
Attachments: YARN-3090.001.patch, YARN-3090.002.patch, YARN-3090.003.patch, YARN-3090.04.patch

If a non-I/O exception occurs while the DeletionService is executing a deletion task, it will be silently ignored. The exception bubbles up to the worker threads of the ScheduledThreadPoolExecutor, which simply attaches the throwable to the Future that was returned when the task was scheduled. However, the thread pool is used as a fire-and-forget pool, so nothing ever looks at the Future and therefore the exception is never logged.
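The fire-and-forget failure mode in the description is easy to reproduce in isolation. The sketch below is illustrative rather than the patch itself: the first task's exception vanishes into an unobserved Future, while the second task guards its body with a try/catch so the failure is at least logged.

{code:java}
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class FireAndForgetDemo {
  public static void main(String[] args) throws InterruptedException {
    ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1);

    // Fire-and-forget: the returned Future is ignored, so the exception is
    // captured inside it and never surfaces anywhere.
    Runnable swallowed = () -> { throw new IllegalStateException("silently swallowed"); };
    pool.submit(swallowed);

    // Guarded variant: catch and log inside the task itself so a failure
    // cannot disappear into an unobserved Future.
    Runnable guarded = () -> {
      try {
        throw new IllegalStateException("now visible");
      } catch (Throwable t) {
        System.err.println("Deletion task failed: " + t);
      }
    };
    pool.submit(guarded);

    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
  }
}
{code}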
[jira] [Updated] (YARN-3090) DeletionService can silently ignore deletion task failures
[ https://issues.apache.org/jira/browse/YARN-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-3090:
-------------------------------

Attachment: (was: YARN-3090.004.patch)

DeletionService can silently ignore deletion task failures
----------------------------------------------------------

Key: YARN-3090
URL: https://issues.apache.org/jira/browse/YARN-3090
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Jason Lowe
Assignee: Varun Saxena
Attachments: YARN-3090.001.patch, YARN-3090.002.patch, YARN-3090.003.patch, YARN-3090.04.patch

If a non-I/O exception occurs while the DeletionService is executing a deletion task, it will be silently ignored. The exception bubbles up to the worker threads of the ScheduledThreadPoolExecutor, which simply attaches the throwable to the Future that was returned when the task was scheduled. However, the thread pool is used as a fire-and-forget pool, so nothing ever looks at the Future and therefore the exception is never logged.
[jira] [Commented] (YARN-3090) DeletionService can silently ignore deletion task failures
[ https://issues.apache.org/jira/browse/YARN-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314344#comment-14314344 ]

Varun Saxena commented on YARN-3090:
------------------------------------

I was able to kick it.

DeletionService can silently ignore deletion task failures
----------------------------------------------------------

Key: YARN-3090
URL: https://issues.apache.org/jira/browse/YARN-3090
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Jason Lowe
Assignee: Varun Saxena
Attachments: YARN-3090.001.patch, YARN-3090.002.patch, YARN-3090.003.patch, YARN-3090.04.patch

If a non-I/O exception occurs while the DeletionService is executing a deletion task, it will be silently ignored. The exception bubbles up to the worker threads of the ScheduledThreadPoolExecutor, which simply attaches the throwable to the Future that was returned when the task was scheduled. However, the thread pool is used as a fire-and-forget pool, so nothing ever looks at the Future and therefore the exception is never logged.
[jira] [Commented] (YARN-3090) DeletionService can silently ignore deletion task failures
[ https://issues.apache.org/jira/browse/YARN-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314328#comment-14314328 ]

Varun Saxena commented on YARN-3090:
------------------------------------

Weird, Jenkins is not getting kicked. Can somebody do that manually?

DeletionService can silently ignore deletion task failures
----------------------------------------------------------

Key: YARN-3090
URL: https://issues.apache.org/jira/browse/YARN-3090
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Jason Lowe
Assignee: Varun Saxena
Attachments: YARN-3090.001.patch, YARN-3090.002.patch, YARN-3090.003.patch, YARN-3090.04.patch

If a non-I/O exception occurs while the DeletionService is executing a deletion task, it will be silently ignored. The exception bubbles up to the worker threads of the ScheduledThreadPoolExecutor, which simply attaches the throwable to the Future that was returned when the task was scheduled. However, the thread pool is used as a fire-and-forget pool, so nothing ever looks at the Future and therefore the exception is never logged.
[jira] [Commented] (YARN-3090) DeletionService can silently ignore deletion task failures
[ https://issues.apache.org/jira/browse/YARN-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314385#comment-14314385 ]

Hadoop QA commented on YARN-3090:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12697794/YARN-3090.04.patch
against trunk revision e0ec071.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6576//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6576//console

This message is automatically generated.

DeletionService can silently ignore deletion task failures
----------------------------------------------------------

Key: YARN-3090
URL: https://issues.apache.org/jira/browse/YARN-3090
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Jason Lowe
Assignee: Varun Saxena
Attachments: YARN-3090.001.patch, YARN-3090.002.patch, YARN-3090.003.patch, YARN-3090.04.patch

If a non-I/O exception occurs while the DeletionService is executing a deletion task, it will be silently ignored. The exception bubbles up to the worker threads of the ScheduledThreadPoolExecutor, which simply attaches the throwable to the Future that was returned when the task was scheduled. However, the thread pool is used as a fire-and-forget pool, so nothing ever looks at the Future and therefore the exception is never logged.
[jira] [Assigned] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe reassigned YARN-2246:
--------------------------------

Assignee: Jason Lowe (was: Devaraj K)

Job History Link in RM UI is redirecting to the URL which contains Job Id twice
-------------------------------------------------------------------------------

Key: YARN-2246
URL: https://issues.apache.org/jira/browse/YARN-2246
Project: Hadoop YARN
Issue Type: Bug
Components: webapp
Reporter: Devaraj K
Assignee: Jason Lowe
Fix For: 2.7.0
Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch, YARN-2246-3.patch, YARN-2246.2.patch, YARN-2246.patch

{code:xml}
http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001
{code}
[jira] [Created] (YARN-3170) YARN architecture document needs updating
Allen Wittenauer created YARN-3170:
-----------------------------------

Summary: YARN architecture document needs updating
Key: YARN-3170
URL: https://issues.apache.org/jira/browse/YARN-3170
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Allen Wittenauer

The marketing paragraph at the top, NextGen MapReduce, etc. are all marketing rather than actual descriptions. It also needs some general updates, especially given it reads as though 0.23 was just released yesterday.
[jira] [Updated] (YARN-3170) YARN architecture document needs updating
[ https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated YARN-3170:
-----------------------------------

Component/s: documentation

YARN architecture document needs updating
-----------------------------------------

Key: YARN-3170
URL: https://issues.apache.org/jira/browse/YARN-3170
Project: Hadoop YARN
Issue Type: Improvement
Components: documentation
Reporter: Allen Wittenauer

The marketing paragraph at the top, NextGen MapReduce, etc. are all marketing rather than actual descriptions. It also needs some general updates, especially given it reads as though 0.23 was just released yesterday.
[jira] [Updated] (YARN-2942) Aggregated Log Files should be compacted
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated YARN-2942:
--------------------------------

Attachment: YARN-2942.003.patch

The YARN-2942.003.patch fixes some minor problems I found when dealing with logs for long-running applications:
- The JHS would correctly display the logs, but also show a message that they couldn't be found.
- The NM wasn't trying to compact the long-running logs (which is expected), but it was dumping an ugly error message to its log about it. It now checks that the normal aggregated log file exists before trying to read it, to prevent that. I also made it so that it won't even try to get the lock if its aggregated file is not there, which is better.

Aggregated Log Files should be compacted
----------------------------------------

Key: YARN-2942
URL: https://issues.apache.org/jira/browse/YARN-2942
Project: Hadoop YARN
Issue Type: New Feature
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
Attachments: CompactedAggregatedLogsProposal_v1.pdf, CompactedAggregatedLogsProposal_v2.pdf, YARN-2942-preliminary.001.patch, YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, YARN-2942.003.patch

Turning on log aggregation allows users to easily store container logs in HDFS and subsequently view them in the YARN web UIs from a central place. Currently, there is a separate log file for each Node Manager. This can be a problem for HDFS if you have a cluster with many nodes, as you’ll slowly start accumulating many (possibly small) files per YARN application. The current “solution” for this problem is to configure YARN (actually the JHS) to automatically delete these files after some amount of time. We should improve this by compacting the per-node aggregated log files into one log file per application.
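The existence check described in the comment might look roughly like the sketch below; the path and helper method are hypothetical, not the patch's actual code. The idea is simply to test for the per-node aggregated log file before opening it, and to skip lock acquisition entirely when it is absent.

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CompactionGuardSketch {

  /** Illustrative guard: only attempt compaction when the aggregated log exists. */
  static boolean shouldCompact(FileSystem fs, Path aggregatedLog) throws IOException {
    // Expected to be absent for long-running apps whose logs are not yet
    // finalized; returning false avoids both the spurious error message
    // and a pointless lock acquisition.
    return fs.exists(aggregatedLog);
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    // Hypothetical remote-app log dir layout; not the patch's actual path.
    Path log = new Path("/tmp/logs/user/logs/application_0000000000000_0001/node1");
    System.out.println("compact? " + shouldCompact(fs, log));
  }
}
{code}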
[jira] [Commented] (YARN-3164) rmadmin command usage prints incorrect command name
[ https://issues.apache.org/jira/browse/YARN-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315443#comment-14315443 ]

Bibin A Chundatt commented on YARN-3164:
----------------------------------------

The Findbugs and test failures seem unrelated to this commit; only the console message gets updated with the uploaded patch.

rmadmin command usage prints incorrect command name
---------------------------------------------------

Key: YARN-3164
URL: https://issues.apache.org/jira/browse/YARN-3164
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
Attachments: YARN-3164.1.patch

/hadoop/bin {color:red}./yarn rmadmin -transitionToActive{color}
transitionToActive: incorrect number of arguments
Usage: {color:red}HAAdmin{color} [-transitionToActive <serviceId> [--forceactive]]

{color:red}./yarn HAAdmin{color}
Error: Could not find or load main class HAAdmin

Expected: it should be rmadmin.
[jira] [Commented] (YARN-3124) Capacity Scheduler LeafQueue/ParentQueue should use QueueCapacities to track capacities-by-label
[ https://issues.apache.org/jira/browse/YARN-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315554#comment-14315554 ]

Hadoop QA commented on YARN-3124:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12697933/YARN-3124.3.patch
against trunk revision 7c6b654.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6587//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6587//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6587//console

This message is automatically generated.

Capacity Scheduler LeafQueue/ParentQueue should use QueueCapacities to track capacities-by-label
------------------------------------------------------------------------------------------------

Key: YARN-3124
URL: https://issues.apache.org/jira/browse/YARN-3124
Project: Hadoop YARN
Issue Type: Sub-task
Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
Attachments: YARN-3124.1.patch, YARN-3124.2.patch, YARN-3124.3.patch

After YARN-3098, capacities-by-label (including used-capacity/maximum-capacity/absolute-maximum-capacity, etc.) should be tracked in QueueCapacities. This patch targets making all capacities-by-label in CS queues tracked by QueueCapacities.
[jira] [Updated] (YARN-3157) Wrong format for application id / attempt id not handled completely
[ https://issues.apache.org/jira/browse/YARN-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bibin A Chundatt updated YARN-3157:
-----------------------------------

Attachment: YARN-3157.1.patch

Uploading after applying the formatter.

Wrong format for application id / attempt id not handled completely
--------------------------------------------------------------------

Key: YARN-3157
URL: https://issues.apache.org/jira/browse/YARN-3157
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
Attachments: YARN-3157.1.patch, YARN-3157.patch, YARN-3157.patch

yarn.cmd application -kill application_123

When the wrong format is given for an application id or attempt, the exception is thrown to the console without any info:
{quote}
15/02/07 22:18:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.util.NoSuchElementException
at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:146)
at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:205)
at org.apache.hadoop.yarn.client.cli.ApplicationCLI.killApplication(ApplicationCLI.java:383)
at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:219)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
{quote}

A catch block for java.util.NoSuchElementException also needs to be added.
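A hedged sketch of the extra handling the description asks for. ConverterUtils.toApplicationId is the real parser named in the stack trace; the wrapper method here is hypothetical and only shows where the additional catch block would sit.

{code:java}
import java.util.NoSuchElementException;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class AppIdParseSketch {

  /** Hypothetical helper: turn parser failures into a friendly message. */
  static ApplicationId parseOrExplain(String appIdStr) {
    try {
      return ConverterUtils.toApplicationId(appIdStr);
    } catch (NumberFormatException | NoSuchElementException e) {
      // e.g. "application_123" is missing its second component and triggers
      // NoSuchElementException from the underlying iterator.
      throw new IllegalArgumentException("Invalid ApplicationId: " + appIdStr, e);
    }
  }
}
{code}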
[jira] [Commented] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315420#comment-14315420 ]

Akira AJISAKA commented on YARN-2336:
-------------------------------------

Hi [~kj-ki], would you rebase the patch for trunk?

Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
------------------------------------------------------------------------------

Key: YARN-2336
URL: https://issues.apache.org/jira/browse/YARN-2336
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Kenji Kikushima
Assignee: Kenji Kikushima
Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336.patch

When we have sub-queues in Fair Scheduler, the REST api returns JSON with a missing '[' bracket for childQueues. This issue was found by [~ajisakaa] at YARN-1050.
[jira] [Commented] (YARN-2942) Aggregated Log Files should be compacted
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315405#comment-14315405 ]

Hadoop QA commented on YARN-2942:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12697901/YARN-2942.002.patch
against trunk revision d5855c0.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to cause Findbugs (version 2.0.3) to fail.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The test build failed in
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6586//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6586//console

This message is automatically generated.

Aggregated Log Files should be compacted
----------------------------------------

Key: YARN-2942
URL: https://issues.apache.org/jira/browse/YARN-2942
Project: Hadoop YARN
Issue Type: New Feature
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
Attachments: CompactedAggregatedLogsProposal_v1.pdf, CompactedAggregatedLogsProposal_v2.pdf, YARN-2942-preliminary.001.patch, YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, YARN-2942.003.patch

Turning on log aggregation allows users to easily store container logs in HDFS and subsequently view them in the YARN web UIs from a central place. Currently, there is a separate log file for each Node Manager. This can be a problem for HDFS if you have a cluster with many nodes, as you’ll slowly start accumulating many (possibly small) files per YARN application. The current “solution” for this problem is to configure YARN (actually the JHS) to automatically delete these files after some amount of time. We should improve this by compacting the per-node aggregated log files into one log file per application.
[jira] [Commented] (YARN-3164) rmadmin command usage prints incorrect command name
[ https://issues.apache.org/jira/browse/YARN-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315473#comment-14315473 ]

Rohith commented on YARN-3164:
------------------------------

[~bibinchundatt] thanks for providing the patch. Could you add a test for regression?

rmadmin command usage prints incorrect command name
---------------------------------------------------

Key: YARN-3164
URL: https://issues.apache.org/jira/browse/YARN-3164
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
Attachments: YARN-3164.1.patch

/hadoop/bin {color:red}./yarn rmadmin -transitionToActive{color}
transitionToActive: incorrect number of arguments
Usage: {color:red}HAAdmin{color} [-transitionToActive <serviceId> [--forceactive]]

{color:red}./yarn HAAdmin{color}
Error: Could not find or load main class HAAdmin

Expected: it should be rmadmin.
[jira] [Commented] (YARN-3160) Non-atomic operation on nodeUpdateQueue in RMNodeImpl
[ https://issues.apache.org/jira/browse/YARN-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315476#comment-14315476 ]

Chengbing Liu commented on YARN-3160:
-------------------------------------

Maybe just {{updatedContainers}}? Renaming is fine with me.

Non-atomic operation on nodeUpdateQueue in RMNodeImpl
-----------------------------------------------------

Key: YARN-3160
URL: https://issues.apache.org/jira/browse/YARN-3160
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
Attachments: YARN-3160.2.patch, YARN-3160.patch

{code:title=RMNodeImpl.java|borderStyle=solid}
while(nodeUpdateQueue.peek() != null){
  latestContainerInfoList.add(nodeUpdateQueue.poll());
}
{code}

The above code carries the potential risk of adding a null value to {{latestContainerInfoList}}. Since {{ConcurrentLinkedQueue}} implements a wait-free algorithm, we can poll the queue directly and then check whether the returned value is null.
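A self-contained sketch of the suggested fix, using plain strings in place of the real container-update type: polling once and testing the result for null means no element can disappear between a peek() and the following poll().

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

public class DrainQueueSketch {
  public static void main(String[] args) {
    ConcurrentLinkedQueue<String> nodeUpdateQueue = new ConcurrentLinkedQueue<>();
    nodeUpdateQueue.add("containerStatus-1");
    nodeUpdateQueue.add("containerStatus-2");

    List<String> latestContainerInfoList = new ArrayList<>();
    // Atomic per-element drain: peek()/poll() as two separate steps could
    // interleave with a concurrent consumer and hand poll() a null.
    String update;
    while ((update = nodeUpdateQueue.poll()) != null) {
      latestContainerInfoList.add(update);
    }
    System.out.println(latestContainerInfoList);
  }
}
{code}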
[jira] [Created] (YARN-3169) drop the useless yarn overview document
Allen Wittenauer created YARN-3169:
-----------------------------------

Summary: drop the useless yarn overview document
Key: YARN-3169
URL: https://issues.apache.org/jira/browse/YARN-3169
Project: Hadoop YARN
Issue Type: Improvement
Components: documentation
Reporter: Allen Wittenauer

It's pretty superfluous given there is a site index on the left.
[jira] [Commented] (YARN-3151) On Failover tracking url wrong in application cli for KILLED application
[ https://issues.apache.org/jira/browse/YARN-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315548#comment-14315548 ]

Xuan Gong commented on YARN-3151:
---------------------------------

Patch looks good to me. [~rohithsharma] Could you check whether the test cases are related or not?

On Failover tracking url wrong in application cli for KILLED application
-------------------------------------------------------------------------

Key: YARN-3151
URL: https://issues.apache.org/jira/browse/YARN-3151
Project: Hadoop YARN
Issue Type: Bug
Components: client, resourcemanager
Affects Versions: 2.6.0
Environment: 2 RM HA
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Minor
Attachments: 0001-YARN-3151.patch

Run an application and kill it after it starts.

Check {color:red}./yarn application -list -appStates KILLED{color}:
{quote}
Application-Id Tracking-URL
application_1423219262738_0001 http://IP:PORT/cluster/app/application_1423219262738_0001
{quote}

Shut down the active RM1 and check the same command {color:red}./yarn application -list -appStates KILLED{color} after RM2 is active:
{quote}
Application-Id Tracking-URL
application_1423219262738_0001 null
{quote}

The tracking URL for the application is shown as null.
Expected: the same URL as before failover should be shown.

ApplicationReport.getOriginalTrackingUrl() is null after failover in org.apache.hadoop.yarn.client.cli.ApplicationCLI#listApplications(Set<String> appTypes, EnumSet<YarnApplicationState> appStates).
[jira] [Commented] (YARN-3151) On Failover tracking url wrong in application cli for KILLED application
[ https://issues.apache.org/jira/browse/YARN-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315559#comment-14315559 ]

Rohith commented on YARN-3151:
------------------------------

Thanks [~xgong] for the review. I will check and upload the patch soon.

On Failover tracking url wrong in application cli for KILLED application
-------------------------------------------------------------------------

Key: YARN-3151
URL: https://issues.apache.org/jira/browse/YARN-3151
Project: Hadoop YARN
Issue Type: Bug
Components: client, resourcemanager
Affects Versions: 2.6.0
Environment: 2 RM HA
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Minor
Attachments: 0001-YARN-3151.patch

Run an application and kill it after it starts.

Check {color:red}./yarn application -list -appStates KILLED{color}:
{quote}
Application-Id Tracking-URL
application_1423219262738_0001 http://IP:PORT/cluster/app/application_1423219262738_0001
{quote}

Shut down the active RM1 and check the same command {color:red}./yarn application -list -appStates KILLED{color} after RM2 is active:
{quote}
Application-Id Tracking-URL
application_1423219262738_0001 null
{quote}

The tracking URL for the application is shown as null.
Expected: the same URL as before failover should be shown.

ApplicationReport.getOriginalTrackingUrl() is null after failover in org.apache.hadoop.yarn.client.cli.ApplicationCLI#listApplications(Set<String> appTypes, EnumSet<YarnApplicationState> appStates).
[jira] [Commented] (YARN-1237) Description for yarn.nodemanager.aux-services in yarn-default.xml is misleading
[ https://issues.apache.org/jira/browse/YARN-1237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314484#comment-14314484 ]

Brahma Reddy Battula commented on YARN-1237:
--------------------------------------------

Can we update it to something like "comma-separated list of services where the service name should only contain a-zA-Z0-9_ and cannot start with numbers"?

Description for yarn.nodemanager.aux-services in yarn-default.xml is misleading
-------------------------------------------------------------------------------

Key: YARN-1237
URL: https://issues.apache.org/jira/browse/YARN-1237
Project: Hadoop YARN
Issue Type: Bug
Components: documentation
Reporter: Hitesh Shah
Priority: Minor

The description states: "the valid service name should only contain a-zA-Z0-9_ and can not start with numbers". It seems to indicate only one service is supported. If multiple services are allowed, it does not indicate how they should be specified, i.e. comma-separated or space-separated? If the service name cannot contain spaces, does this imply that space-separated lists are also permitted?
[jira] [Commented] (YARN-3110) Faulty link and state in ApplicationHistory when aplication is in unassigned state
[ https://issues.apache.org/jira/browse/YARN-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314533#comment-14314533 ]

Hadoop QA commented on YARN-3110:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12697521/YARN-3110.20150209-1.patch
against trunk revision 4eb5f7f.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6579//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6579//console

This message is automatically generated.

Faulty link and state in ApplicationHistory when aplication is in unassigned state
----------------------------------------------------------------------------------

Key: YARN-3110
URL: https://issues.apache.org/jira/browse/YARN-3110
Project: Hadoop YARN
Issue Type: Bug
Components: applications, timelineserver
Affects Versions: 2.6.0
Reporter: Bibin A Chundatt
Assignee: Naganarasimha G R
Priority: Minor
Attachments: YARN-3110.20150209-1.patch

Application state and history link are wrong when the application is in unassigned state.

1. Configure the capacity scheduler with queue size as 1 and Absolute Max Capacity: 10.0% (the current application state is Accepted and Unassigned from the resource manager side).
2. Submit an application to the queue and check the state and link in Application history.

State = null, and the history link is shown as N/A in the applicationhistory page.

Kill the same application. In the timeline server logs, the below is shown when selecting the application link:
{quote}
2015-01-29 15:39:50,956 ERROR org.apache.hadoop.yarn.webapp.View: Failed to read the AM container of the application attempt appattempt_1422467063659_0007_01.
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainer(ApplicationHistoryManagerOnTimelineStore.java:162)
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAMContainer(ApplicationHistoryManagerOnTimelineStore.java:184)
at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:160)
at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:157)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:156)
at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67)
at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77)
at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSController.app(AHSController.java:38)
at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
{quote}
[jira] [Commented] (YARN-3129) [YARN] Daemon log 'set level' and 'get level' is not reflecting in Process logs
[ https://issues.apache.org/jira/browse/YARN-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314443#comment-14314443 ]

Naganarasimha G R commented on YARN-3129:
-----------------------------------------

Hi [~jagadesh.kiran], [~brahmareddy], I feel this is not an issue, as the usage of the command is to pass the log name, for example:

./yarn daemonlog -setlevel xx.xx.xx.xxx:45020 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl DEBUG

After running the above command you can run a YARN app and check the RM logs to find the debug logs for org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl. If it's working, then I will close this issue.

[YARN] Daemon log 'set level' and 'get level' is not reflecting in Process logs
-------------------------------------------------------------------------------

Key: YARN-3129
URL: https://issues.apache.org/jira/browse/YARN-3129
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jagadesh Kiran N
Assignee: Naganarasimha G R

a. Execute the command ./yarn daemonlog -setlevel xx.xx.xx.xxx:45020 ResourceManager DEBUG
b. It is not reflected in the process logs even after performing client-level operations.
c. The log level is not changed.
[jira] [Updated] (YARN-3164) rmadmin command usage prints incorrect command name
[ https://issues.apache.org/jira/browse/YARN-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bibin A Chundatt updated YARN-3164:
-----------------------------------

Attachment: YARN-3164.1.patch

Patch added; please review.

rmadmin command usage prints incorrect command name
---------------------------------------------------

Key: YARN-3164
URL: https://issues.apache.org/jira/browse/YARN-3164
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
Attachments: YARN-3164.1.patch

/hadoop/bin {color:red}./yarn rmadmin -transitionToActive{color}
transitionToActive: incorrect number of arguments
Usage: {color:red}HAAdmin{color} [-transitionToActive <serviceId> [--forceactive]]

{color:red}./yarn HAAdmin{color}
Error: Could not find or load main class HAAdmin

Expected: it should be rmadmin.
[jira] [Commented] (YARN-3090) DeletionService can silently ignore deletion task failures
[ https://issues.apache.org/jira/browse/YARN-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314470#comment-14314470 ]

Hudson commented on YARN-3090:
------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #7062 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7062/])
YARN-3090. DeletionService can silently ignore deletion task failures. Contributed by Varun Saxena (jlowe: rev 4eb5f7fa32bab1b9ce3fb58eca51e2cd2e194cd5)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DeletionService.java
* hadoop-yarn-project/CHANGES.txt

DeletionService can silently ignore deletion task failures
----------------------------------------------------------

Key: YARN-3090
URL: https://issues.apache.org/jira/browse/YARN-3090
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Jason Lowe
Assignee: Varun Saxena
Fix For: 2.7.0
Attachments: YARN-3090.001.patch, YARN-3090.002.patch, YARN-3090.003.patch, YARN-3090.04.patch

If a non-I/O exception occurs while the DeletionService is executing a deletion task, it will be silently ignored. The exception bubbles up to the worker threads of the ScheduledThreadPoolExecutor, which simply attaches the throwable to the Future that was returned when the task was scheduled. However, the thread pool is used as a fire-and-forget pool, so nothing ever looks at the Future and therefore the exception is never logged.
[jira] [Commented] (YARN-3160) Non-atomic operation on nodeUpdateQueue in RMNodeImpl
[ https://issues.apache.org/jira/browse/YARN-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314482#comment-14314482 ]

Junping Du commented on YARN-3160:
----------------------------------

Didn't see these failures in the test report. Kicking off the Jenkins test again.

Non-atomic operation on nodeUpdateQueue in RMNodeImpl
-----------------------------------------------------

Key: YARN-3160
URL: https://issues.apache.org/jira/browse/YARN-3160
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
Attachments: YARN-3160.2.patch, YARN-3160.patch

{code:title=RMNodeImpl.java|borderStyle=solid}
while(nodeUpdateQueue.peek() != null){
  latestContainerInfoList.add(nodeUpdateQueue.poll());
}
{code}

The above code carries the potential risk of adding a null value to {{latestContainerInfoList}}. Since {{ConcurrentLinkedQueue}} implements a wait-free algorithm, we can poll the queue directly and then check whether the returned value is null.
[jira] [Commented] (YARN-1237) Description for yarn.nodemanager.aux-services in yarn-default.xml is misleading
[ https://issues.apache.org/jira/browse/YARN-1237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314507#comment-14314507 ]

Tsuyoshi OZAWA commented on YARN-1237:
--------------------------------------

Hi [~brahmareddy], thank you for taking this JIRA.

{quote}
comma separated list of services where service name should only contain a-zA-Z0-9_ and can not start with numbers
{quote}

Sounds reasonable. From my actual configuration:

{code}
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>spark_shuffle,mapreduce_shuffle</value>
</property>
{code}

Description for yarn.nodemanager.aux-services in yarn-default.xml is misleading
-------------------------------------------------------------------------------

Key: YARN-1237
URL: https://issues.apache.org/jira/browse/YARN-1237
Project: Hadoop YARN
Issue Type: Bug
Components: documentation
Reporter: Hitesh Shah
Priority: Minor

The description states: "the valid service name should only contain a-zA-Z0-9_ and can not start with numbers". It seems to indicate only one service is supported. If multiple services are allowed, it does not indicate how they should be specified, i.e. comma-separated or space-separated? If the service name cannot contain spaces, does this imply that space-separated lists are also permitted?
[jira] [Commented] (YARN-3090) DeletionService can silently ignore deletion task failures
[ https://issues.apache.org/jira/browse/YARN-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314445#comment-14314445 ]

Jason Lowe commented on YARN-3090:
----------------------------------

+1 lgtm. Committing this.

DeletionService can silently ignore deletion task failures
----------------------------------------------------------

Key: YARN-3090
URL: https://issues.apache.org/jira/browse/YARN-3090
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Jason Lowe
Assignee: Varun Saxena
Attachments: YARN-3090.001.patch, YARN-3090.002.patch, YARN-3090.003.patch, YARN-3090.04.patch

If a non-I/O exception occurs while the DeletionService is executing a deletion task, it will be silently ignored. The exception bubbles up to the worker threads of the ScheduledThreadPoolExecutor, which simply attaches the throwable to the Future that was returned when the task was scheduled. However, the thread pool is used as a fire-and-forget pool, so nothing ever looks at the Future and therefore the exception is never logged.
[jira] [Commented] (YARN-3129) [YARN] Daemon log 'set level' and 'get level' is not reflecting in Process logs
[ https://issues.apache.org/jira/browse/YARN-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314477#comment-14314477 ]

Naganarasimha G R commented on YARN-3129:
-----------------------------------------

Or, as part of this jira, we can do the following:
# Update the usage as:
{quote}
Usage: General options are:
[-getlevel <host:httpPort> <log name>]
[-setlevel <host:httpPort> <log name> <level>]
{quote}
# Update the documentation with an example of using this command.
# The {{level}} param is currently case-sensitive, but I think we should support case-insensitive values too.

[YARN] Daemon log 'set level' and 'get level' is not reflecting in Process logs
-------------------------------------------------------------------------------

Key: YARN-3129
URL: https://issues.apache.org/jira/browse/YARN-3129
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jagadesh Kiran N
Assignee: Naganarasimha G R

a. Execute the command ./yarn daemonlog -setlevel xx.xx.xx.xxx:45020 ResourceManager DEBUG
b. It is not reflected in the process logs even after performing client-level operations.
c. The log level is not changed.
[jira] [Assigned] (YARN-1580) Documentation error regarding container-allocation.expiry-interval-ms
[ https://issues.apache.org/jira/browse/YARN-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brahma Reddy Battula reassigned YARN-1580:
------------------------------------------

Assignee: Brahma Reddy Battula

Documentation error regarding container-allocation.expiry-interval-ms
----------------------------------------------------------------------

Key: YARN-1580
URL: https://issues.apache.org/jira/browse/YARN-1580
Project: Hadoop YARN
Issue Type: Bug
Components: documentation
Affects Versions: 2.2.0
Environment: CentOS 6.4
Reporter: German Florez-Larrahondo
Assignee: Brahma Reddy Battula
Priority: Trivial

While trying to control settings related to expiration of tokens for long-running jobs, based on the documentation (http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml) I attempted to increase values for yarn.rm.container-allocation.expiry-interval-ms without luck. Looking at code like YarnConfiguration.java, I noticed that in recent versions all these kinds of settings now have the prefix yarn.resourcemanager.rm as opposed to yarn.rm. So for this specific case the setting of interest is yarn.resourcemanager.rm.container-allocation.expiry-interval-ms. I suppose there are other documentation errors similar to this.
[jira] [Commented] (YARN-2809) Implement workaround for linux kernel panic when removing cgroup
[ https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314500#comment-14314500 ]

Hudson commented on YARN-2809:
------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #7063 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7063/])
YARN-2809. Implement workaround for linux kernel panic when removing cgroup. Contributed by Nathan Roberts (jlowe: rev 3f5431a22fcef7e3eb9aceeefe324e5b7ac84049)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt

Implement workaround for linux kernel panic when removing cgroup
----------------------------------------------------------------

Key: YARN-2809
URL: https://issues.apache.org/jira/browse/YARN-2809
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.6.0
Environment: RHEL 6.4
Reporter: Nathan Roberts
Assignee: Nathan Roberts
Fix For: 2.7.0
Attachments: YARN-2809-v2.patch, YARN-2809-v3.patch, YARN-2809.patch

Some older versions of linux have a bug that can cause a kernel panic when the LCE attempts to remove a cgroup. It is a race condition, so it's a bit rare, but on a few-thousand-node cluster it can result in a couple of panics per day.

This is the commit that likely (haven't verified) fixes the problem in linux:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.y&id=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267

Details will be added in comments.
[jira] [Updated] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj K updated YARN-2246:
----------------------------

Attachment: YARN-2246-4.patch

Job History Link in RM UI is redirecting to the URL which contains Job Id twice
-------------------------------------------------------------------------------

Key: YARN-2246
URL: https://issues.apache.org/jira/browse/YARN-2246
Project: Hadoop YARN
Issue Type: Bug
Components: webapp
Reporter: Devaraj K
Assignee: Devaraj K
Fix For: 2.7.0
Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch, YARN-2246-3.patch, YARN-2246-4.patch, YARN-2246.2.patch, YARN-2246.patch

{code:xml}
http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001
{code}
[jira] [Commented] (YARN-3164) rmadmin command usage prints incorrect command name
[ https://issues.apache.org/jira/browse/YARN-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314544#comment-14314544 ]

Hadoop QA commented on YARN-3164:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12697812/YARN-3164.1.patch
against trunk revision 4eb5f7f.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:
org.apache.hadoop.ipc.TestRPCWaitForProxy
org.apache.hadoop.yarn.client.api.impl.TestAMRMClient

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6577//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6577//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6577//console

This message is automatically generated.

rmadmin command usage prints incorrect command name
---------------------------------------------------

Key: YARN-3164
URL: https://issues.apache.org/jira/browse/YARN-3164
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
Attachments: YARN-3164.1.patch

/hadoop/bin {color:red}./yarn rmadmin -transitionToActive{color}
transitionToActive: incorrect number of arguments
Usage: {color:red}HAAdmin{color} [-transitionToActive <serviceId> [--forceactive]]

{color:red}./yarn HAAdmin{color}
Error: Could not find or load main class HAAdmin

Expected: it should be rmadmin.
[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313834#comment-14313834 ]

Hadoop QA commented on YARN-933:
--------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12697671/0004-YARN-933.patch
against trunk revision b73956f.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6573//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6573//console

This message is automatically generated.

Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
---------------------------------------------------------------------------------

Key: YARN-933
URL: https://issues.apache.org/jira/browse/YARN-933
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.0.5-alpha
Reporter: J.Andreina
Assignee: Rohith
Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, 0004-YARN-933.patch, YARN-933.3.patch, YARN-933.patch

AM max retries is configured as 3 on both the client and RM side.

Step 1: Install a cluster with NMs on 2 machines.
Step 2: Make ping using the IP from the RM machine to the NM1 machine succeed, but using the hostname it should fail.
Step 3: Execute a job.
Step 4: After AM [AppAttempt_1] allocation to the NM1 machine is done, a connection loss happened.

Observation:
============
After AppAttempt_1 has moved to the failed state, release of the container for AppAttempt_1 and application removal are successful. A new AppAttempt_2 is spawned.
1. Then a retry for AppAttempt_1 happens again.
2. Again on the RM side it is trying to launch AppAttempt_1, hence it fails with InvalidStateTransitonException.
3. The client exited after AppAttempt_1 finished [but actually the job is still running], while the app attempts configured is 3 and the rest of the attempts are all spawned and running.

RM Logs:
========
2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED
2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45
2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs
2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED
2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02
2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. finalState=FAILED
2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35
2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02,
2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED
2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45
2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45
2013-07-17
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313843#comment-14313843 ] Chris Douglas commented on YARN-3100: - Sorry, I didn't get to the patch over the weekend. Thanks for addressing the review feedback. Are there follow-up JIRAs for some of the types to be added to PrivilegedEntity? Just curious. Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Fix For: 2.7.0 Attachments: YARN-3100.1.patch, YARN-3100.2.patch, YARN-3100.2.patch The goal is to make the YARN ACL model pluggable so as to integrate other authorization tools such as Apache Ranger and Sentry. Currently, we have - admin ACL - queue ACL - application ACL - timeline domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. The current implementation will be the default implementation. A Ranger or Sentry plug-in can implement this interface. Benefits: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager, etc. - Enable Ranger and Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
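For readers following the proposal, a rough sketch of what an interface of this shape could look like; the method names and signatures here are illustrative assumptions, not the committed API (see the attached patches for that):
{code:java}
// Illustrative sketch of a pluggable authorizer. A Ranger or Sentry plug-in
// would subclass this; the default implementation would keep today's
// ACL-manager behavior.
public abstract class YarnAuthorizationProvider {
  public abstract void init(Configuration conf);

  // target: a queue, an application, a timeline domain, or the admin entity.
  public abstract boolean checkPermission(AccessType accessType,
      PrivilegedEntity target, UserGroupInformation user);

  public abstract void setPermission(PrivilegedEntity target,
      Map<AccessType, AccessControlList> acls, UserGroupInformation ugi);
}
{code}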
[jira] [Commented] (YARN-1983) Support heterogeneous container types at runtime on YARN
[ https://issues.apache.org/jira/browse/YARN-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313831#comment-14313831 ] Chris Douglas commented on YARN-1983: - bq. We still need a way to demux the executor to support the case of a YARN cluster with a mix of executors. That'd mean some impact on the CLC, no? Policies that select the appropriate executor could demux on the contents of the CLC and not a dedicated field. A simple, static dispatch from an admin-configured list is a great place to start, but adding a string to the CLC that selects the executor class by name is difficult to evolve. Since the same semantics are available without changes to the platform, why bake these in? bq. I think my current patch is intrusive indeed but more general, right? I'm not sure I follow. How is it more general? Support heterogeneous container types at runtime on YARN Key: YARN-1983 URL: https://issues.apache.org/jira/browse/YARN-1983 Project: Hadoop YARN Issue Type: Improvement Reporter: Junping Du Attachments: YARN-1983.2.patch, YARN-1983.patch Different container types (default, LXC, Docker, VM box, etc.) have different semantics on isolation of security, namespace/env, performance, etc. Per discussions in YARN-1964, we have some good thoughts on supporting different types of containers running on YARN, specified by the application at runtime, which would largely enhance YARN's flexibility to meet heterogeneous apps' requirements on isolation at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
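To make the demux alternative concrete, a static, admin-configured dispatch that inspects the CLC rather than adding a dedicated field could look roughly like this; the map-loading helper and the environment key are invented for illustration:
{code:java}
// Hypothetical sketch: select the ContainerExecutor from an admin-configured
// map, keyed by something already present in the ContainerLaunchContext.
Map<String, ContainerExecutor> executors = loadExecutorsFromConf(conf); // invented helper
String hint = launchContext.getEnvironment().get("CONTAINER_RUNTIME");  // invented key
ContainerExecutor chosen = (hint != null && executors.containsKey(hint))
    ? executors.get(hint) : defaultExecutor;
{code}
The point of the sketch is that the selection policy lives entirely on the NM side and can evolve without a CLC or protocol change.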
[jira] [Updated] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-2246: Attachment: YARN-2246-3.patch I have updated the generateProxyUriWithScheme() in the latest patch. Job History Link in RM UI is redirecting to the URL which contains Job Id twice --- Key: YARN-2246 URL: https://issues.apache.org/jira/browse/YARN-2246 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Devaraj K Assignee: Devaraj K Fix For: 2.7.0 Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch, YARN-2246-3.patch, YARN-2246.2.patch, YARN-2246.patch {code:xml} http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313926#comment-14313926 ] Devaraj K commented on YARN-2246: - I agree that this needs to be handled in RMAppAttemptImpl before exposing the proxy URL to the users. Thanks for your patch [~zjshen]. I have tried this patch, and it works fine. I think generateProxyUriWithScheme() can be updated according to the patch changes. Job History Link in RM UI is redirecting to the URL which contains Job Id twice --- Key: YARN-2246 URL: https://issues.apache.org/jira/browse/YARN-2246 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Devaraj K Assignee: Devaraj K Fix For: 2.7.0 Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch, YARN-2246.2.patch, YARN-2246.patch {code:xml} http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
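For context, the invariant being enforced here is that the RM-advertised URL is derived only from the proxy address and the application id, and never has the tracking URL's path tacked onto it. A minimal sketch of that construction, with the scheme/host/port variables assumed rather than taken from the patch:
{code:java}
// Sketch: the advertised URL is proxy-root + appId and nothing else;
// any path under it is appended by the browser, not by the RM.
String proxyUrl = scheme + "://" + proxyHost + ":" + proxyPort
    + "/proxy/" + appAttemptId.getApplicationId();
// e.g. http://rmaddr:8088/proxy/application_1332435449546_0001
{code}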
[jira] [Commented] (YARN-2971) RM uses conf instead of token service address to renew timeline delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313980#comment-14313980 ] Hudson commented on YARN-2971: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #100 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/100/]) YARN-2971. RM uses conf instead of token service address to renew timeline delegation tokens (jeagles) (jeagles: rev af0842589359ad800427337ad2c84fac09907f72) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java RM uses conf instead of token service address to renew timeline delegation tokens - Key: YARN-2971 URL: https://issues.apache.org/jira/browse/YARN-2971 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Fix For: 2.7.0 Attachments: YARN-2971-v1.patch, YARN-2971-v2.patch The TimelineClientImpl renewDelegationToken uses the incorrect webaddress to renew Timeline DelegationTokens. It should read the service address out of the token to renew the delegation token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
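The one-line idea behind the fix above: derive the renewal address from the token's own service field rather than from configuration. In Hadoop this is typically done with SecurityUtil (variable names assumed):
{code:java}
// Read the service address embedded in the delegation token instead of
// consulting the configured timeline web address.
InetSocketAddress renewAddr = SecurityUtil.getTokenServiceAddr(timelineDT);
{code}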
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313977#comment-14313977 ] Hudson commented on YARN-3100: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #100 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/100/]) YARN-3100. Made YARN authorization pluggable. Contributed by Jian He. (zjshen: rev 23bf6c72071782e3fd5a628e21495d6b974c7a9e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/AccessType.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/WebApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/AdminACLsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SCMAdminProtocolService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/PrivilegedEntity.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ConfiguredYarnAuthorizer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/YarnAuthorizationProvider.java Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Fix For: 2.7.0 Attachments: 
YARN-3100.1.patch, YARN-3100.2.patch, YARN-3100.2.patch The goal is to make the YARN ACL model pluggable so as to integrate other authorization tools such as Apache Ranger and Sentry. Currently, we have - admin ACL - queue ACL - application ACL - timeline domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. The current implementation will be the default implementation. A Ranger or Sentry plug-in can implement this interface. Benefits: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager, etc. - Enable Ranger and Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3094) reset timer for liveness monitors after RM recovery
[ https://issues.apache.org/jira/browse/YARN-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313979#comment-14313979 ] Hudson commented on YARN-3094: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #100 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/100/]) YARN-3094. Reset timer for liveness monitors after RM recovery. Contributed by Jun Gong (jianhe: rev 0af6a99a3fcfa4b47d3bcba5e5cc5fe7b312a152) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestAMLivelinessMonitor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/AMLivelinessMonitor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AbstractLivelinessMonitor.java * hadoop-yarn-project/CHANGES.txt reset timer for liveness monitors after RM recovery --- Key: YARN-3094 URL: https://issues.apache.org/jira/browse/YARN-3094 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Fix For: 2.7.0 Attachments: YARN-3094.2.patch, YARN-3094.3.patch, YARN-3094.4.patch, YARN-3094.5.patch, YARN-3094.patch When the RM restarts, it will recover RMAppAttempts and register them with the AMLivenessMonitor if they are not in a final state. AMs will time out in the RM if the recovery process takes a long time for some reason (e.g. too many apps). In our system, we found the recovery process took about 3 mins, and all the AMs timed out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
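The shape of the fix is easy to sketch: once recovery completes, restart the clock for every monitored attempt so that time spent recovering does not count against liveness. A sketch against AbstractLivelinessMonitor, assuming it tracks objects in a Map<O, Long> of last-heard-from timestamps named {{running}}:
{code:java}
// Sketch: called after RM recovery finishes; treats "now" as the last
// heartbeat time for everything currently monitored.
public synchronized void resetTimer() {
  long currentTime = clock.getTime();
  for (Map.Entry<O, Long> entry : running.entrySet()) {
    entry.setValue(currentTime);
  }
}
{code}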
[jira] [Commented] (YARN-3155) Refactor the exception handling code for TimelineClientImpl's retryOn method
[ https://issues.apache.org/jira/browse/YARN-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313992#comment-14313992 ] Hudson commented on YARN-3155: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #100 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/100/]) YARN-3155. Refactor the exception handling code for TimelineClientImpl's retryOn method (Li Lu via wangda) (wangda: rev 00a748d24a565bce0cc8cfa2bdcf165778cea395) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * hadoop-yarn-project/CHANGES.txt Refactor the exception handling code for TimelineClientImpl's retryOn method Key: YARN-3155 URL: https://issues.apache.org/jira/browse/YARN-3155 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu Priority: Minor Labels: refactoring Fix For: 2.7.0 Attachments: YARN-3155-020615.patch, YARN-3155-020915.patch Since we switched to Java 1.7, the exception handling code for the retryOn method can be merged into one statement block, instead of the current two, to avoid repeated code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
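As a reminder of the Java 7 feature being applied, two catch blocks with identical bodies collapse into one multi-catch. The exception types and the {{op.run()}}/{{handleRetry()}} names below are illustrative, not TimelineClientImpl's actual signatures:
{code:java}
// Before (Java 6 style): the handling logic is duplicated.
try {
  return op.run();
} catch (IOException e) {
  handleRetry(e);
} catch (RuntimeException e) {
  handleRetry(e);
}

// After (Java 7 multi-catch): one block, no repetition.
try {
  return op.run();
} catch (IOException | RuntimeException e) {
  handleRetry(e);
}
{code}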
[jira] [Updated] (YARN-2616) Add CLI client to the registry to list/view entries
[ https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2616: - Attachment: YARN-2616-008.patch Patch -008. uploading to see if this triggers jenkins Add CLI client to the registry to list/view entries --- Key: YARN-2616 URL: https://issues.apache.org/jira/browse/YARN-2616 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Akshay Radia Attachments: YARN-2616-003.patch, YARN-2616-008.patch, YARN-2616-008.patch, yarn-2616-v1.patch, yarn-2616-v2.patch, yarn-2616-v4.patch, yarn-2616-v5.patch, yarn-2616-v6.patch, yarn-2616-v7.patch registry needs a CLI interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314036#comment-14314036 ] Hudson commented on YARN-3100: -- FAILURE: Integrated in Hadoop-Yarn-trunk #834 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/834/]) YARN-3100. Made YARN authorization pluggable. Contributed by Jian He. (zjshen: rev 23bf6c72071782e3fd5a628e21495d6b974c7a9e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/WebApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ConfiguredYarnAuthorizer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/PrivilegedEntity.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/RMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/YarnAuthorizationProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SCMAdminProtocolService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/AccessType.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/AdminACLsManager.java Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Fix For: 2.7.0 Attachments: YARN-3100.1.patch, 
YARN-3100.2.patch, YARN-3100.2.patch The goal is to make the YARN ACL model pluggable so as to integrate other authorization tools such as Apache Ranger and Sentry. Currently, we have - admin ACL - queue ACL - application ACL - timeline domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. The current implementation will be the default implementation. A Ranger or Sentry plug-in can implement this interface. Benefits: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager, etc. - Enable Ranger and Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2971) RM uses conf instead of token service address to renew timeline delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314039#comment-14314039 ] Hudson commented on YARN-2971: -- FAILURE: Integrated in Hadoop-Yarn-trunk #834 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/834/]) YARN-2971. RM uses conf instead of token service address to renew timeline delegation tokens (jeagles) (jeagles: rev af0842589359ad800427337ad2c84fac09907f72) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java * hadoop-yarn-project/CHANGES.txt RM uses conf instead of token service address to renew timeline delegation tokens - Key: YARN-2971 URL: https://issues.apache.org/jira/browse/YARN-2971 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Fix For: 2.7.0 Attachments: YARN-2971-v1.patch, YARN-2971-v2.patch The TimelineClientImpl renewDelegationToken uses the incorrect webaddress to renew Timeline DelegationTokens. It should read the service address out of the token to renew the delegation token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3094) reset timer for liveness monitors after RM recovery
[ https://issues.apache.org/jira/browse/YARN-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314038#comment-14314038 ] Hudson commented on YARN-3094: -- FAILURE: Integrated in Hadoop-Yarn-trunk #834 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/834/]) YARN-3094. Reset timer for liveness monitors after RM recovery. Contributed by Jun Gong (jianhe: rev 0af6a99a3fcfa4b47d3bcba5e5cc5fe7b312a152) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AbstractLivelinessMonitor.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestAMLivelinessMonitor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/AMLivelinessMonitor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java reset timer for liveness monitors after RM recovery --- Key: YARN-3094 URL: https://issues.apache.org/jira/browse/YARN-3094 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Fix For: 2.7.0 Attachments: YARN-3094.2.patch, YARN-3094.3.patch, YARN-3094.4.patch, YARN-3094.5.patch, YARN-3094.patch When the RM restarts, it will recover RMAppAttempts and register them with the AMLivenessMonitor if they are not in a final state. AMs will time out in the RM if the recovery process takes a long time for some reason (e.g. too many apps). In our system, we found the recovery process took about 3 mins, and all the AMs timed out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3155) Refactor the exception handling code for TimelineClientImpl's retryOn method
[ https://issues.apache.org/jira/browse/YARN-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314052#comment-14314052 ] Hudson commented on YARN-3155: -- FAILURE: Integrated in Hadoop-Yarn-trunk #834 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/834/]) YARN-3155. Refactor the exception handling code for TimelineClientImpl's retryOn method (Li Lu via wangda) (wangda: rev 00a748d24a565bce0cc8cfa2bdcf165778cea395) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * hadoop-yarn-project/CHANGES.txt Refactor the exception handling code for TimelineClientImpl's retryOn method Key: YARN-3155 URL: https://issues.apache.org/jira/browse/YARN-3155 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu Priority: Minor Labels: refactoring Fix For: 2.7.0 Attachments: YARN-3155-020615.patch, YARN-3155-020915.patch Since we switched to Java 1.7, the exception handling code for the retryOn method can be merged into one statement block, instead of the current two, to avoid repeated code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3164) rmadmin command usage prints incorrect command name
[ https://issues.apache.org/jira/browse/YARN-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314167#comment-14314167 ] Bibin A Chundatt commented on YARN-3164: Is there any problem in changing the message as below? {color:red}Usage: rmadmin{color} rmadmin command usage prints incorrect command name --- Key: YARN-3164 URL: https://issues.apache.org/jira/browse/YARN-3164 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor /hadoop/bin {color:red}./yarn rmadmin -transitionToActive{color} transitionToActive: incorrect number of arguments Usage: {color:red}HAAdmin{color} [-transitionToActive serviceId [--forceactive]] {color:red}./yarn HAAdmin{color} Error: Could not find or load main class HAAdmin Expected: it should print rmadmin. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
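One low-risk way to get there, assuming HAAdmin exposes its usage prefix through an overridable method (the method name below is an assumption about that base class, not a quote from the patch):
{code:java}
// Hypothetical sketch inside RMAdminCLI: report the actual yarn subcommand
// name in usage output instead of the base class name "HAAdmin".
@Override
protected String getUsageString() {
  return "Usage: yarn rmadmin";
}
{code}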
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314636#comment-14314636 ] Robert Kanter commented on YARN-2423: - This is based on the current implementation. We can try to add a compatibility layer or something in another JIRA. Though I'm not sure how feasible that will be; the data models are somewhat different... TimelineClient should wrap all GET APIs to facilitate Java users Key: YARN-2423 URL: https://issues.apache.org/jira/browse/YARN-2423 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter Attachments: YARN-2423.004.patch, YARN-2423.005.patch, YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, YARN-2423.patch TimelineClient provides the Java method to put timeline entities. It's also good to wrap over all GET APIs (both entity and domain), and deserialize the json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage priority labels
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2693: -- Attachment: 0005-YARN-2693.patch Attaching the Priority Manager patch with updated changes as discussed in the parent JIRA. Priority Label Manager in RM to manage priority labels -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch The focus of this JIRA is to have a centralized service to handle priority labels. Supported operations: * Add/Delete a priority label to a specified queue * Manage the integer mapping associated with each priority label * Support managing the default priority label of a given queue * ACL support at queue level for priority labels * Expose an interface to the RM to validate priority labels Storage for these labels will be done in FileSystem and in Memory, similar to NodeLabels: * FileSystem based: persistent across RM restart * Memory based: non-persistent across RM restart -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314653#comment-14314653 ] Junping Du commented on YARN-914: - Thanks [~vinodkv] for the comments! bq. IAC, I think we should also have a CLI command to decommission the node which optionally waits till the decommission succeeds. That sounds pretty good. This new CLI can gracefully decommission the related nodes and, on timeout, forcefully decommission any nodes that haven't finished. Compared with the external-script approach proposed by Ming above, this has less dependency on effort outside of Hadoop. bq. Regarding storage of the decommission state, YARN-2567 also plans to make sure that the state of all nodes is maintained up to date on the state-store. That helps with many other cases too. We should combine these efforts. That makes sense. However, YARN-2567 is about a threshold thing; maybe it is a wrong JIRA number? bq. Regarding long running services, I think it makes sense to let the admin initiating the decommission know - not in terms of policy but as a diagnostic. Other than waiting for a timeout, the admin may not have noticed that a service is running on this node before the decommission is triggered. bq. This is the umbrella concern I have. There are two ways to do this: Let YARN manage the decommission process or manage it on top of YARN. If the latter is the approach, I don't see a lot to be done here besides YARN-291. No? Agree that there is less effort for the 2nd approach. Even so, we still need the RM to be aware of when containers/apps get finished and then trigger shutdown of the NM, so that decommission completes earlier (and randomly), which I guess is important for upgrades of a large cluster, isn't it? For YARN-291, my understanding is that we now don't rely on any open issues left there, because we only need to set the NM's resource to 0 at runtime, which is already provided there. BTW, I think the approach you just proposed above is the 2nd approach + a new CLI, isn't it? I prefer to go this way but would like to hear other folks' ideas here too. Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du Attachments: Gracefully Decommission of NodeManager (v1).pdf When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact on running applications. Currently if an NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Furthermore, for finished map tasks, if their map output has not been fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314707#comment-14314707 ] Hadoop QA commented on YARN-2246: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697819/YARN-2246-4.patch against trunk revision 3f5431a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6580//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6580//console This message is automatically generated. Job History Link in RM UI is redirecting to the URL which contains Job Id twice --- Key: YARN-2246 URL: https://issues.apache.org/jira/browse/YARN-2246 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Devaraj K Assignee: Devaraj K Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch, YARN-2246-3.patch, YARN-2246-4.patch, YARN-2246.2.patch, YARN-2246.patch {code:xml} http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3074) Nodemanager dies when localizer runner tries to write to a full disk
[ https://issues.apache.org/jira/browse/YARN-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3074: --- Attachment: YARN-3074.03.patch Nodemanager dies when localizer runner tries to write to a full disk Key: YARN-3074 URL: https://issues.apache.org/jira/browse/YARN-3074 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Varun Saxena Attachments: YARN-3074.001.patch, YARN-3074.002.patch, YARN-3074.03.patch When a LocalizerRunner tries to write to a full disk it can bring down the nodemanager process. Instead of failing the whole process we should fail only the container and make a best attempt to keep going. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3074) Nodemanager dies when localizer runner tries to write to a full disk
[ https://issues.apache.org/jira/browse/YARN-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3074: --- Attachment: (was: YARN-3074.003.patch) Nodemanager dies when localizer runner tries to write to a full disk Key: YARN-3074 URL: https://issues.apache.org/jira/browse/YARN-3074 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Varun Saxena Attachments: YARN-3074.001.patch, YARN-3074.002.patch, YARN-3074.03.patch When a LocalizerRunner tries to write to a full disk it can bring down the nodemanager process. Instead of failing the whole process we should fail only the container and make a best attempt to keep going. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3163) admin support for YarnAuthorizationProvider
[ https://issues.apache.org/jira/browse/YARN-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314648#comment-14314648 ] Jian He commented on YARN-3163: --- [~sunilg], I have one question: if the ACL is changed in both the config file and the other storage, how can the RM figure out which one should take precedence after an RM restart? admin support for YarnAuthorizationProvider --- Key: YARN-3163 URL: https://issues.apache.org/jira/browse/YARN-3163 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Runtime configuration support for YarnAuthorizationProvider. Using admin commands, one should be able to set and get permissions from the YarnAuthorizationProvider. This mechanism will let users change permissions without updating config files and firing reload commands. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2693) Priority Label Manager in RM to manage priority labels
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314712#comment-14314712 ] Hadoop QA commented on YARN-2693: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697839/0005-YARN-2693.patch against trunk revision 3f5431a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6583//console This message is automatically generated. Priority Label Manager in RM to manage priority labels -- Key: YARN-2693 URL: https://issues.apache.org/jira/browse/YARN-2693 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch The focus of this JIRA is to have a centralized service to handle priority labels. Supported operations: * Add/Delete a priority label to a specified queue * Manage the integer mapping associated with each priority label * Support managing the default priority label of a given queue * ACL support at queue level for priority labels * Expose an interface to the RM to validate priority labels Storage for these labels will be done in FileSystem and in Memory, similar to NodeLabels: * FileSystem based: persistent across RM restart * Memory based: non-persistent across RM restart -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-2423: Attachment: (was: YARN-2423.007.patch) TimelineClient should wrap all GET APIs to facilitate Java users Key: YARN-2423 URL: https://issues.apache.org/jira/browse/YARN-2423 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter Attachments: YARN-2423.004.patch, YARN-2423.005.patch, YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, YARN-2423.patch TimelineClient provides the Java method to put timeline entities. It's also good to wrap over all GET APIs (both entity and domain), and deserialize the json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-2423: Attachment: YARN-2423.007.patch I'm not sure what Jenkins's problem is. I've re-rebased the 007 patch and am trying again. TimelineClient should wrap all GET APIs to facilitate Java users Key: YARN-2423 URL: https://issues.apache.org/jira/browse/YARN-2423 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter Attachments: YARN-2423.004.patch, YARN-2423.005.patch, YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, YARN-2423.patch TimelineClient provides the Java method to put timeline entities. It's also good to wrap over all GET APIs (both entity and domain), and deserialize the JSON response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3157) Wrong format for application id / attempt id not handled completely
[ https://issues.apache.org/jira/browse/YARN-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314718#comment-14314718 ] Hadoop QA commented on YARN-3157: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697492/YARN-3157.patch against trunk revision 3f5431a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6581//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6581//console This message is automatically generated. Wrong format for application id / attempt id not handled completely --- Key: YARN-3157 URL: https://issues.apache.org/jira/browse/YARN-3157 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: YARN-3157.patch, YARN-3157.patch yarn.cmd application -kill application_123 When a wrong format is given for the application id or attempt id, the exception is thrown to the console without any helpful info: {quote} 15/02/07 22:18:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Exception in thread "main" java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:146) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:205) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.killApplication(ApplicationCLI.java:383) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:219) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) {quote} A catch block for java.util.NoSuchElementException needs to be added as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
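The suggested hardening is mechanical; a sketch of what the catch around the id parsing could look like (the wrapping exception and message are illustrative):
{code:java}
// Sketch: convert the parser's internals (NoSuchElementException from the
// id splitter, NumberFormatException from the numeric fields) into a
// readable CLI error instead of a raw stack trace.
try {
  appId = ConverterUtils.toApplicationId(applicationIdStr);
} catch (NoSuchElementException | NumberFormatException e) {
  throw new IllegalArgumentException(
      "Invalid ApplicationId: " + applicationIdStr, e);
}
{code}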
[jira] [Commented] (YARN-3129) [YARN] Daemon log 'set level' and 'get level' is not reflecting in Process logs
[ https://issues.apache.org/jira/browse/YARN-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314658#comment-14314658 ] Allen Wittenauer commented on YARN-3129: bq. we should support case insensitive value too These levels are defined by log4j and defined as uppercase everywhere in both code and config. Making it mixed case here means supporting mixed case everywhere... But otherwise, yes, I agree this sounds like a documentation issue more than a bug. I'll move this to HADOOP. [YARN] Daemon log 'set level' and 'get level' is not reflecting in Process logs Key: YARN-3129 URL: https://issues.apache.org/jira/browse/YARN-3129 Project: Hadoop YARN Issue Type: Bug Reporter: Jagadesh Kiran N Assignee: Naganarasimha G R a. Execute the command ./yarn daemonlog -setlevel xx.xx.xx.xxx:45020 ResourceManager DEBUG b. It is not reflecting in process logs even after performing client level operations c. Log level is not changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3160) Non-atomic operation on nodeUpdateQueue in RMNodeImpl
[ https://issues.apache.org/jira/browse/YARN-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314631#comment-14314631 ] Hadoop QA commented on YARN-3160: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697665/YARN-3160.2.patch against trunk revision 4eb5f7f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6578//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6578//console This message is automatically generated. Non-atomic operation on nodeUpdateQueue in RMNodeImpl - Key: YARN-3160 URL: https://issues.apache.org/jira/browse/YARN-3160 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-3160.2.patch, YARN-3160.patch {code:title=RMNodeImpl.java|borderStyle=solid}
while (nodeUpdateQueue.peek() != null) {
  latestContainerInfoList.add(nodeUpdateQueue.poll());
}
{code} The above code brings a potential risk of adding a null value to {{latestContainerInfoList}}: another consumer may drain the queue between the {{peek()}} and the {{poll()}}. Since {{ConcurrentLinkedQueue}} implements a wait-free algorithm, we can directly poll the queue and then check whether the returned value is null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
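The race-free version of that loop, as the description suggests, polls first and then tests the result:
{code:java}
// poll() is the only queue read: there is no peek()-to-poll() window in
// which another consumer can drain the queue and make poll() return null.
UpdatedContainerInfo containerInfo;
while ((containerInfo = nodeUpdateQueue.poll()) != null) {
  latestContainerInfoList.add(containerInfo);
}
{code}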
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314677#comment-14314677 ] Jason Lowe commented on YARN-914: - bq. However, YARN-2567 is about a threshold thing; maybe it is a wrong JIRA number? That's the right JIRA. It's about waiting for a threshold number of nodes to report back in after the RM recovers, and the RM would need to persist the state about the nodes in the cluster to know what percentage of the old nodes have reported back in. As for whether we should just provide hooks vs. making it much more of a turnkey solution, I'd be an advocate for initially seeing what we can do with hooks. Based on what we learn from trying to do decommissions with that, we can provide feedback into the process of making it a built-in, turnkey solution later. I do agree with Vinod that there should minimally be an easy way, CLI or otherwise, for outside scripts driving the decommission to either force it or wait for it to complete. If waiting, there also needs to be a way either to give the wait a timeout that forces the decommission after that point, or another method with which to easily kill the containers still on that node. Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du Attachments: Gracefully Decommission of NodeManager (v1).pdf When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact on running applications. Currently if an NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Furthermore, for finished map tasks, if their map output has not been fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314728#comment-14314728 ] Devaraj K commented on YARN-2246: - {code}
org.apache.hadoop.yarn.server.resourcemanager.TestRM.testNMTokenSentForNormalContainer[1]
Failing for the past 1 build (Since Failed#6580 )
Took 20 sec.
Error Message
test timed out after 2 milliseconds
{code} This test failure is unrelated to the patch. It passes locally for me. Job History Link in RM UI is redirecting to the URL which contains Job Id twice --- Key: YARN-2246 URL: https://issues.apache.org/jira/browse/YARN-2246 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Devaraj K Assignee: Devaraj K Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch, YARN-2246-3.patch, YARN-2246-4.patch, YARN-2246.2.patch, YARN-2246.patch {code:xml} http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314743#comment-14314743 ] Hadoop QA commented on YARN-2423: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697840/YARN-2423.007.patch against trunk revision 3f5431a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6582//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6582//console This message is automatically generated. TimelineClient should wrap all GET APIs to facilitate Java users Key: YARN-2423 URL: https://issues.apache.org/jira/browse/YARN-2423 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter Attachments: YARN-2423.004.patch, YARN-2423.005.patch, YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, YARN-2423.patch TimelineClient provides the Java method to put timeline entities. It's also good to wrap over all GET APIs (both entity and domain), and deserialize the json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314305#comment-14314305 ] Devaraj K commented on YARN-2246: - Thanks [~jlowe] for looking into the patch and confirming the approach. I will update the patch with the tests. Job History Link in RM UI is redirecting to the URL which contains Job Id twice --- Key: YARN-2246 URL: https://issues.apache.org/jira/browse/YARN-2246 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Devaraj K Assignee: Devaraj K Fix For: 2.7.0 Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch, YARN-2246-3.patch, YARN-2246.2.patch, YARN-2246.patch {code:xml} http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314898#comment-14314898 ] Rushabh S Shah commented on YARN-2902: -- [~varun_saxena]: are you still working on this jira ? Killing a container that is localizing can orphan resources in the DOWNLOADING state Key: YARN-2902 URL: https://issues.apache.org/jira/browse/YARN-2902 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2902.002.patch, YARN-2902.patch If a container is in the process of localizing when it is stopped/killed then resources are left in the DOWNLOADING state. If no other container comes along and requests these resources they linger around with no reference counts but aren't cleaned up during normal cache cleanup scans since it will never delete resources in the DOWNLOADING state even if their reference count is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314931#comment-14314931 ] Hitesh Shah commented on YARN-2928: --- bq. We should have such a configuration that disables the timeline service globally. Please explain what globally means. bq. Can it be handled as a flow of flows as described in the design? For instance, tez application -- hive queries -- YARN apps? Or does it not capture the relationship? Not sure I understand clearly as to how the relationship is captured. Consider this case: There are 5 hive queries: q1 to q5. There are 3 Tez apps: a1 to a3. Now, q1 and q5 ran on a1, q2 ran on a2 and q3,q4 ran on a3. Given q1, I need to know which app it ran on. Given a1, I need to know which queries ran on it. Could you clarify how this should be represented as flows? Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3124) Capacity Scheduler LeafQueue/ParentQueue should use QueueCapacities to track capacities-by-label
[ https://issues.apache.org/jira/browse/YARN-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314971#comment-14314971 ] Jian He commented on YARN-3124: ---
- Merge CapacitySchedulerConfiguration#setCapacitiesByLabels and CSQueueUtils#setAbsoluteCapacitiesByNodeLabels into a single method
- CapacitySchedulerConfiguration#normalizeAccessibleNodeLabels - should AbstractCSQueue#accessibleLabels be updated as well?
- Why the union? newCapacities.getExistingNodeLabels() should be enough:
{code}
for (String label : Sets.union(this.getExistingNodeLabels(),
    newCapacities.getExistingNodeLabels())) {
{code}
- Can the existing get*CapacityByLabel methods be removed? Use queueCapacities#get*capacity instead.
- Passing null for the queueCapacity? Then we can remove the parameter:
{code}
setupQueueConfigs(cs.getClusterResource(), userLimit, userLimitFactor,
    maxApplications, maxAMResourcePerQueuePercent, maxApplicationsPerUser,
    state, acls, cs.getConfiguration().getNodeLocalityDelay(),
    accessibleLabels, defaultLabelExpression,
    cs.getConfiguration().getReservationContinueLook(), null,
    cs.getConfiguration().getMaximumAllocationPerQueue(getQueuePath()));
{code}
- Remove this?
{code}
@Override
protected void initializeCapacitiesFromConf() {
  // Do nothing
}
{code}
- {{CSQueueUtils.setAbsoluteCapacitiesByNodeLabel}} may belong inside AbstractCSQueue
- QueueCapacities#getExistingNodeLabels - rename to getNodeLabels?
- Why does {{CSQueueUtils.setAbsoluteCapacitiesByNodeLabels(queueCapacities, parent);}} have to be called in ReservationQueue#reinitialize?
Capacity Scheduler LeafQueue/ParentQueue should use QueueCapacities to track capacities-by-label Key: YARN-3124 URL: https://issues.apache.org/jira/browse/YARN-3124 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3124.1.patch, YARN-3124.2.patch After YARN-3098, capacities-by-label (including used-capacity/maximum-capacity/absolute-maximum-capacity, etc.) should be tracked in QueueCapacities. This patch targets making all capacities-by-label in CS queues tracked by QueueCapacities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2683) registry config options: document and move to core-default
[ https://issues.apache.org/jira/browse/YARN-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314914#comment-14314914 ] Sanjay Radia commented on YARN-2683: yarn-registry.md
* This document describes a YARN service registry built to address a problem: change this to address two problems.
* Add:
** Allow Hadoop core services to be registered and discovered, thereby reducing configuration parameters and allowing core services to be moved more easily.
registry config options: document and move to core-default -- Key: YARN-2683 URL: https://issues.apache.org/jira/browse/YARN-2683 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: HADOOP-10530-005.patch, YARN-2683-001.patch, YARN-2683-002.patch, YARN-2683-003.patch, YARN-2683-006.patch Original Estimate: 1h Time Spent: 1h Remaining Estimate: 0.5h Add to {{yarn-site}} a page on registry configuration parameters -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3165) Possible inconsistent queue state when queue reinitialization failed
Jian He created YARN-3165: - Summary: Possible inconsistent queue state when queue reinitialization failed Key: YARN-3165 URL: https://issues.apache.org/jira/browse/YARN-3165 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He This came up in a discussion with [~chris.douglas]. If queue reinitialization fails in the middle, it is possible that queues are left in an inconsistent state - some queues are already updated, but some are not. One example is the code below in LeafQueue:
{code}
if (newMax.getMemory() < oldMax.getMemory()
    || newMax.getVirtualCores() < oldMax.getVirtualCores()) {
  throw new IOException("Trying to reinitialize " + getQueuePath()
      + " the maximum allocation size can not be decreased!"
      + " Current setting: " + oldMax
      + ", trying to set it to: " + newMax);
}
{code}
If the exception is thrown here, the queues processed earlier are already updated, but the later ones are not. So we should make queue reinitialization transactional. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
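One standard shape for such a transactional reinitialization is validate-then-commit: build and validate a complete shadow hierarchy from the new configuration first, and swap it in only after everything passes. A sketch of that shape only - parseQueueHierarchy() and validate() are illustrative helpers, not the actual CapacityScheduler API:
{code}
// Illustrative validate-then-commit sketch; the helper methods are
// hypothetical, not actual CapacityScheduler code.
void reinitializeQueues(CapacitySchedulerConfiguration newConf)
    throws IOException {
  // Phase 1: build and validate the whole new hierarchy without touching
  // live queues; an IOException here leaves the existing state intact.
  CSQueue newRoot = parseQueueHierarchy(newConf);
  validate(newRoot);

  // Phase 2: commit atomically, so a failure can no longer leave some
  // queues updated and others not.
  synchronized (this) {
    this.root = newRoot;
  }
}
{code}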
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314933#comment-14314933 ] Hitesh Shah commented on YARN-2928: --- Also, [~sjlee0] [~zjshen], I am assuming you are already aware of YARN-2423 and plan to maintain compatibility with that implementation if it is introduced in a version earlier than the one in which this next-gen impl is supported? Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container
[ https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314997#comment-14314997 ] Vinod Kumar Vavilapalli commented on YARN-1621: --- Thanks for working on this, Bartosz. Quick comments on the patch:
- listcontainers - list-containers
- Add a negative test for pre-running applications
Overall, the CLI is pretty badly organized, and this patch is making it worse. We have:
- applicationattempt -list applicationId: lists app attempts of an app
- container -list attemptId: lists containers of an attempt
- application -list: lists all apps
I don't like this, but it is what we have. For this patch, we can continue this scheme and add a container -list appattemptid. And maybe, in a separate effort, create a different set of commands that make the listing work backwards.
Add CLI to list rows of task attempt ID, container ID, host of container, state of container -- Key: YARN-1621 URL: https://issues.apache.org/jira/browse/YARN-1621 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Tassapol Athiapinya Assignee: Bartosz Ługowski Fix For: 2.7.0 Attachments: YARN-1621.1.patch, YARN-1621.2.patch, YARN-1621.3.patch As more applications are moved to YARN, we need generic CLI to list rows of task attempt ID, container ID, host of container, state of container. Today if YARN application running in a container does hang, there is no way to find out more info because a user does not know where each attempt is running in. For each running application, it is useful to differentiate between running/succeeded/failed/killed containers. {code:title=proposed yarn cli} $ yarn application -list-containers -applicationId appId [-containerState state of container] where containerState is optional filter to list container in given state only. container state can be running/succeeded/killed/failed/all. A user can specify more than one container state at once e.g. KILLED,FAILED. task attempt ID container ID host of container state of container {code} CLI should work with running application/completed application. If a container runs many task attempts, all attempts should be shown. That will likely be the case of Tez container-reuse application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3020) n similar addContainerRequest()s produce n*(n+1)/2 containers
[ https://issues.apache.org/jira/browse/YARN-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315013#comment-14315013 ] Peter D Kirchner commented on YARN-3020: Hi Wei Yan, My point, adjusted to take the expected usage into account, is that when matching requests and/or allocations are spread over multiple heartbeats, too many containers are requested and received. So, suppose my application calls addContainerRequest() 10 times. Let's take your example where the AMRMClient sends 1 container request on heartbeat 1, and 10 requests at heartbeat 2, overwriting the 1. Say also that the second RPC returns with 1 container. The second request is high by one, i.e. 10, because the application does not yet know about the incoming allocation. Subsequent updates are also high by approximately the number of incoming containers. My application heartbeat is 1 second and the RM is typically allocating 1 container/node/second, so I'd expect 10 containers coming in on the third heartbeat. Per expected usage, my AMRMClient would have sent out an updated request for 9 containers at that time. My application would zero out the matching request on the fourth heartbeat and release the nine extra containers (90% more) that it received but never intended to request. In the present implementation, with the AMRMClient keeping track of the totals, removeContainerRequest() properly decrements AMRMClient's idea of the outstanding count. But because this information is a heartbeat out of date relative to the scheduler's, a partial fix (pending a definitive one) would be for the AMRMClient not to routinely update the RM with this matching total whenever the scheduler's tally is likely to be more accurate. Occasions when the RM should be updated are when there is a new matching addContainerRequest(), i.e. the scheduler's target could otherwise be too low, or when the AMRMClient's outstanding count is decremented to zero. Please see my response to Wangda Tan on 30 Jan 2015. Thank you. n similar addContainerRequest()s produce n*(n+1)/2 containers - Key: YARN-3020 URL: https://issues.apache.org/jira/browse/YARN-3020 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.5.2 Reporter: Peter D Kirchner Original Estimate: 24h Remaining Estimate: 24h BUG: If the application master calls addContainerRequest() n times, but with the same priority, I get up to 1+2+3+...+n = n*(n+1)/2 containers. The most containers are requested when the interval between calls to addContainerRequest() exceeds the heartbeat interval of calls to allocate() (in AMRMClientImpl's run() method). If the application master calls addContainerRequest() n times, but with a unique priority each time, I get n containers (as I intended). Analysis: There is a logic problem in AMRMClientImpl.java. Although allocate() in AMRMClientImpl.java does an ask.clear(), on subsequent calls to addContainerRequest(), addResourceRequest() finds the previous matching remoteRequest and increments the container count rather than starting anew, and does an addResourceRequestToAsk(), which defeats the ask.clear(). From documentation and code comments, it was hard for me to discern the intended behavior of the API, but the inconsistency reported in this issue suggests one case or the other is implemented incorrectly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
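For readers following along, the "expected usage" under discussion is that the AM removes a matching request as each container arrives, so the client's outstanding count stays in sync with what it still wants. A minimal sketch against the AMRMClient API (the resource/priority setup is illustrative):
{code}
// Minimal sketch of the expected AMRMClient usage, not a fix for the bug.
AMRMClient<ContainerRequest> amrmClient = AMRMClient.createAMRMClient();
ContainerRequest request =
    new ContainerRequest(Resource.newInstance(1024, 1), null, null,
        Priority.newInstance(0));
for (int i = 0; i < 10; i++) {
  amrmClient.addContainerRequest(request);
}

// On each heartbeat:
AllocateResponse response = amrmClient.allocate(progress);
for (Container container : response.getAllocatedContainers()) {
  // Remove one matching request per accepted container; without this,
  // the next heartbeat re-sends a stale total and the scheduler keeps
  // allocating containers the AM never meant to request.
  amrmClient.removeContainerRequest(request);
}
{code}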
[jira] [Commented] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315018#comment-14315018 ] Jonathan Eagles commented on YARN-2246: --- I think this is going to fix my issue. Job History Link in RM UI is redirecting to the URL which contains Job Id twice --- Key: YARN-2246 URL: https://issues.apache.org/jira/browse/YARN-2246 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Devaraj K Assignee: Devaraj K Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch, YARN-2246-3.patch, YARN-2246-4.patch, YARN-2246.2.patch, YARN-2246.patch {code:xml} http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3166) Decide detailed package structures for timeline service v2 components
[ https://issues.apache.org/jira/browse/YARN-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu reassigned YARN-3166: --- Assignee: Li Lu Decide detailed package structures for timeline service v2 components - Key: YARN-3166 URL: https://issues.apache.org/jira/browse/YARN-3166 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Open this JIRA to track all discussions on detailed package structures for timeline services v2. This JIRA is for discussion only. For our current timeline service v2 design, aggregator (previously called writer) implementation is in hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.aggregator}} In YARN-2928's design, the next gen ATS reader is also a server. Maybe we want to put reader related implementations into hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.reader}} Both readers and aggregators will expose features that may be used by YARN and other 3rd party components, such as aggregator/reader APIs. For those features, maybe we would like to expose their interfaces to hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? Let's use this JIRA as a centralized place to track all related discussions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314952#comment-14314952 ] Hitesh Shah commented on YARN-2423: --- bq. This is based on the current implementation. We can try to add a compatibility layer or something in another JIRA. Though I'm not sure how feasible that will be; the data models are somewhat different. If the current implementation is not planned to be supported in the long term, why introduce a java API that will soon be deprecated or rendered obsolete if the data models are different? Or is the only intention to backport this feature/API into 2.4, 2.5 and 2.6 for existing users of the current implementation of ATS? TimelineClient should wrap all GET APIs to facilitate Java users Key: YARN-2423 URL: https://issues.apache.org/jira/browse/YARN-2423 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter Attachments: YARN-2423.004.patch, YARN-2423.005.patch, YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, YARN-2423.patch TimelineClient provides the Java method to put timeline entities. It's also good to wrap over all GET APIs (both entity and domain), and deserialize the json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
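For context, the wrappers being debated would add GET methods alongside the existing put, roughly as below. The GET signatures here are hypothetical shapes for illustration; the actual methods are whatever the YARN-2423 patches define:
{code}
// Hypothetical shape of the GET wrappers under discussion.
public abstract class TimelineClient extends AbstractService {
  protected TimelineClient(String name) {
    super(name);
  }

  // Existing PUT API.
  public abstract TimelinePutResponse putEntities(TimelineEntity... entities)
      throws IOException, YarnException;

  // Proposed GET wrappers that would deserialize the JSON responses
  // into the Java POJOs (hypothetical signatures).
  public abstract TimelineEntity getEntity(String entityType, String entityId)
      throws IOException, YarnException;
  public abstract TimelineEntities getEntities(String entityType)
      throws IOException, YarnException;
  public abstract TimelineDomain getDomain(String domainId)
      throws IOException, YarnException;
}
{code}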
[jira] [Updated] (YARN-3166) Decide detailed package structures for timeline service v2 components
[ https://issues.apache.org/jira/browse/YARN-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3166: Issue Type: Sub-task (was: Task) Parent: YARN-2928 Decide detailed package structures for timeline service v2 components - Key: YARN-3166 URL: https://issues.apache.org/jira/browse/YARN-3166 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Open this JIRA to track all discussions on detailed package structures for timeline services v2. This JIRA is for discussion only so I don't think it should have any assignees. For our current timeline service v2 design, aggregator (previously called writer) implementation is in hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.aggregator}} In YARN-2928's design, the next gen ATS reader is also a server. Maybe we want to put reader related implementations into hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.reader}} Both readers and aggregators will expose features that may be used by YARN and other 3rd party components, such as aggregator/reader APIs. For those features, maybe we would like to expose their interfaces to hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? Let's use this JIRA as a centralized place to track all related discussions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3166) Decide detailed package structures for timeline service v2 components
Li Lu created YARN-3166: --- Summary: Decide detailed package structures for timeline service v2 components Key: YARN-3166 URL: https://issues.apache.org/jira/browse/YARN-3166 Project: Hadoop YARN Issue Type: Task Reporter: Li Lu Open this JIRA to track all discussions on detailed package structures for timeline services v2. This JIRA is for discussion only so I don't think it should have any assignees. For our current timeline service v2 design, aggregator (previously called writer) implementation is in hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.aggregator}} In YARN-2928's design, the next gen ATS reader is also a server. Maybe we want to put reader related implementations into hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.reader}} Both readers and aggregators will expose features that may be used by YARN and other 3rd party components, such as aggregator/reader APIs. For those features, maybe we would like to expose their interfaces to hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? Let's use this JIRA as a centralized place to track all related discussions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3166) Decide detailed package structures for timeline service v2 components
[ https://issues.apache.org/jira/browse/YARN-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3166: Description: Open this JIRA to track all discussions on detailed package structures for timeline services v2. This JIRA is for discussion only. For our current timeline service v2 design, aggregator (previously called writer) implementation is in hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.aggregator}} In YARN-2928's design, the next gen ATS reader is also a server. Maybe we want to put reader related implementations into hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.reader}} Both readers and aggregators will expose features that may be used by YARN and other 3rd party components, such as aggregator/reader APIs. For those features, maybe we would like to expose their interfaces to hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? Let's use this JIRA as a centralized place to track all related discussions. was: Open this JIRA to track all discussions on detailed package structures for timeline services v2. This JIRA is for discussion only so I don't think it should have any assignees. For our current timeline service v2 design, aggregator (previously called writer) implementation is in hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.aggregator}} In YARN-2928's design, the next gen ATS reader is also a server. Maybe we want to put reader related implementations into hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.reader}} Both readers and aggregators will expose features that may be used by YARN and other 3rd party components, such as aggregator/reader APIs. For those features, maybe we would like to expose their interfaces to hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? Let's use this JIRA as a centralized place to track all related discussions. Decide detailed package structures for timeline service v2 components - Key: YARN-3166 URL: https://issues.apache.org/jira/browse/YARN-3166 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Open this JIRA to track all discussions on detailed package structures for timeline services v2. This JIRA is for discussion only. For our current timeline service v2 design, aggregator (previously called writer) implementation is in hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.aggregator}} In YARN-2928's design, the next gen ATS reader is also a server. Maybe we want to put reader related implementations into hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.reader}} Both readers and aggregators will expose features that may be used by YARN and other 3rd party components, such as aggregator/reader APIs. For those features, maybe we would like to expose their interfaces to hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? Let's use this JIRA as a centralized place to track all related discussions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container
[ https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314744#comment-14314744 ] Bartosz Ługowski commented on YARN-1621: Patch update. Add CLI to list rows of task attempt ID, container ID, host of container, state of container -- Key: YARN-1621 URL: https://issues.apache.org/jira/browse/YARN-1621 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Tassapol Athiapinya Fix For: 2.7.0 Attachments: YARN-1621.1.patch, YARN-1621.2.patch, YARN-1621.3.patch As more applications are moved to YARN, we need generic CLI to list rows of task attempt ID, container ID, host of container, state of container. Today if YARN application running in a container does hang, there is no way to find out more info because a user does not know where each attempt is running in. For each running application, it is useful to differentiate between running/succeeded/failed/killed containers. {code:title=proposed yarn cli} $ yarn application -list-containers -applicationId appId [-containerState state of container] where containerState is optional filter to list container in given state only. container state can be running/succeeded/killed/failed/all. A user can specify more than one container state at once e.g. KILLED,FAILED. task attempt ID container ID host of container state of container {code} CLI should work with running application/completed application. If a container runs many task attempts, all attempts should be shown. That will likely be the case of Tez container-reuse application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3041) create the ATS entity/event API
[ https://issues.apache.org/jira/browse/YARN-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314752#comment-14314752 ] Sangjin Lee commented on YARN-3041: --- Hitesh on YARN-2928 brought up an interesting point regarding the events (also see my reply). For my own education, what is an event in current ATS? Is it explicitly about affecting state changes in entities? Or can it be something else? How should events be defined in the next gen timeline service? And/or should the notion of the state be explicitly defined? Thoughts? create the ATS entity/event API --- Key: YARN-3041 URL: https://issues.apache.org/jira/browse/YARN-3041 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Robert Kanter Attachments: YARN-3041.preliminary.001.patch Per design in YARN-2928, create the ATS entity and events API. Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, flow, flow run, YARN app, ...). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
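For reference while discussing the question, in the current ATS an event is an arbitrary timestamped record attached to an entity; nothing in the v1 data model restricts it to state changes. A minimal example with the v1 API (the entity, event type, and info key are made up):
{code}
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEvent;

// In v1, an event is just a typed, timestamped record on an entity -
// the model itself says nothing about state machines.
TimelineEntity entity = new TimelineEntity();
entity.setEntityType("TEZ_TASK_ATTEMPT");
entity.setEntityId("attempt_1");

TimelineEvent event = new TimelineEvent();
event.setEventType("TASK_ATTEMPT_STARTED");  // arbitrary, app-defined type
event.setTimestamp(System.currentTimeMillis());
event.addEventInfo("containerId", "container_1332435449546_0001_01_000002");
entity.addEvent(event);
{code}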
[jira] [Updated] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container
[ https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bartosz Ługowski updated YARN-1621: --- Attachment: YARN-1621.3.patch Add CLI to list rows of task attempt ID, container ID, host of container, state of container -- Key: YARN-1621 URL: https://issues.apache.org/jira/browse/YARN-1621 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Tassapol Athiapinya Fix For: 2.7.0 Attachments: YARN-1621.1.patch, YARN-1621.2.patch, YARN-1621.3.patch As more applications are moved to YARN, we need generic CLI to list rows of task attempt ID, container ID, host of container, state of container. Today if YARN application running in a container does hang, there is no way to find out more info because a user does not know where each attempt is running in. For each running application, it is useful to differentiate between running/succeeded/failed/killed containers. {code:title=proposed yarn cli} $ yarn application -list-containers -applicationId appId [-containerState state of container] where containerState is optional filter to list container in given state only. container state can be running/succeeded/killed/failed/all. A user can specify more than one container state at once e.g. KILLED,FAILED. task attempt ID container ID host of container state of container {code} CLI should work with running application/completed application. If a container runs many task attempts, all attempts should be shown. That will likely be the case of Tez container-reuse application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container
[ https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bartosz Ługowski updated YARN-1621: --- Attachment: (was: YARN-1621.3.patch) Add CLI to list rows of task attempt ID, container ID, host of container, state of container -- Key: YARN-1621 URL: https://issues.apache.org/jira/browse/YARN-1621 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Tassapol Athiapinya Fix For: 2.7.0 Attachments: YARN-1621.1.patch, YARN-1621.2.patch, YARN-1621.3.patch As more applications are moved to YARN, we need generic CLI to list rows of task attempt ID, container ID, host of container, state of container. Today if YARN application running in a container does hang, there is no way to find out more info because a user does not know where each attempt is running in. For each running application, it is useful to differentiate between running/succeeded/failed/killed containers. {code:title=proposed yarn cli} $ yarn application -list-containers -applicationId appId [-containerState state of container] where containerState is optional filter to list container in given state only. container state can be running/succeeded/killed/failed/all. A user can specify more than one container state at once e.g. KILLED,FAILED. task attempt ID container ID host of container state of container {code} CLI should work with running application/completed application. If a container runs many task attempts, all attempts should be shown. That will likely be the case of Tez container-reuse application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314748#comment-14314748 ] Sangjin Lee commented on YARN-2928: --- bq. How is a workflow defined when an entity has 2 parents? Considering the tez-hive example, do you agree that both a Hive Query and a Tez application are workflows and share some entities? Can it be handled as a flow of flows as described in the design? For instance, tez application -- hive queries -- YARN apps? Or does it not capture the relationship? Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314763#comment-14314763 ] Sangjin Lee commented on YARN-3034: --- Thanks [~Naganarasimha]! I'll go over the patch today... bq. Whether we require Multithreaded Dispatcher as we are not publishing container life cycle events and if normal dispatcher is ok whether to use rmcontext.getDispatcher ? For publishing app lifecycle events only, I suspect a normal dispatcher might be OK. However, there could be more use cases in the future. If it is not too complicated, using a multi-threaded dispatcher might be a bit preferable IMO. Thoughts? bq. AppAttempt needs to be Entity or event of ApplicationEntity ? i feel later option is better How is it today with the current ATS? If the same container can be part of different app attempts (e.g. successive AMs managing the same set of containers), then app attempts can't be separate entities? [~zjshen]? implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
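Either way, the handler registration looks the same from the RM side. A rough sketch with the standard AsyncDispatcher; the timelineAggregator.publish() hook is hypothetical, and the event-type/handler wiring is illustrative only:
{code}
// Rough sketch: routing app lifecycle events through a dispatcher.
AsyncDispatcher dispatcher = new AsyncDispatcher();
dispatcher.register(SystemMetricsEventType.class,
    new EventHandler<SystemMetricsEvent>() {
      @Override
      public void handle(SystemMetricsEvent event) {
        timelineAggregator.publish(event);  // hypothetical publish hook
      }
    });
dispatcher.init(conf);
dispatcher.start();
{code}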
[jira] [Commented] (YARN-3074) Nodemanager dies when localizer runner tries to write to a full disk
[ https://issues.apache.org/jira/browse/YARN-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314833#comment-14314833 ] Varun Saxena commented on YARN-3074: [~jlowe], kindly review Nodemanager dies when localizer runner tries to write to a full disk Key: YARN-3074 URL: https://issues.apache.org/jira/browse/YARN-3074 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Varun Saxena Attachments: YARN-3074.001.patch, YARN-3074.002.patch, YARN-3074.03.patch When a LocalizerRunner tries to write to a full disk it can bring down the nodemanager process. Instead of failing the whole process we should fail only the container and make a best attempt to keep going. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314797#comment-14314797 ] Sangjin Lee commented on YARN-3034: --- Some feedback on the patch...
(1) this creates a dependency from RM to the timeline service; perhaps it is unavoidable...
(2) RMTimelineAggregator.java
- we need the license
- annotate with @Private and @Unstable
- line 31: nit; spacing
(3) SystemMetricsPublisher.java
- instead of replacing the use of the existing ATS, I think we need to have both (the existing ATS calls as well as the new calls); we will need a global config that enables/disables the next gen timeline service
implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
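The dual-publishing suggestion in (3) would look roughly like this. The v2 config key and the two publish methods are placeholders; only the v1 TIMELINE_SERVICE_ENABLED key exists in YarnConfiguration today:
{code}
// Sketch of dual publishing guarded by config; "yarn.timeline-service
// .v2.enabled" is a placeholder for whatever YARN-2928 ends up defining.
boolean v1Enabled = conf.getBoolean(
    YarnConfiguration.TIMELINE_SERVICE_ENABLED,
    YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED);
boolean v2Enabled = conf.getBoolean("yarn.timeline-service.v2.enabled", false);

if (v1Enabled) {
  publishToCurrentATS(event);   // existing ATS calls stay in place
}
if (v2Enabled) {
  publishToNextGenATS(event);   // new aggregator path
}
{code}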
[jira] [Commented] (YARN-3074) Nodemanager dies when localizer runner tries to write to a full disk
[ https://issues.apache.org/jira/browse/YARN-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314830#comment-14314830 ] Hadoop QA commented on YARN-3074: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697846/YARN-3074.03.patch against trunk revision 3f5431a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6584//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6584//console This message is automatically generated. Nodemanager dies when localizer runner tries to write to a full disk Key: YARN-3074 URL: https://issues.apache.org/jira/browse/YARN-3074 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Varun Saxena Attachments: YARN-3074.001.patch, YARN-3074.002.patch, YARN-3074.03.patch When a LocalizerRunner tries to write to a full disk it can bring down the nodemanager process. Instead of failing the whole process we should fail only the container and make a best attempt to keep going. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container
[ https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314857#comment-14314857 ] Hadoop QA commented on YARN-1621: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697849/YARN-1621.3.patch against trunk revision 3f5431a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6585//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6585//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6585//console This message is automatically generated. Add CLI to list rows of task attempt ID, container ID, host of container, state of container -- Key: YARN-1621 URL: https://issues.apache.org/jira/browse/YARN-1621 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Tassapol Athiapinya Fix For: 2.7.0 Attachments: YARN-1621.1.patch, YARN-1621.2.patch, YARN-1621.3.patch As more applications are moved to YARN, we need generic CLI to list rows of task attempt ID, container ID, host of container, state of container. Today if YARN application running in a container does hang, there is no way to find out more info because a user does not know where each attempt is running in. For each running application, it is useful to differentiate between running/succeeded/failed/killed containers. {code:title=proposed yarn cli} $ yarn application -list-containers -applicationId appId [-containerState state of container] where containerState is optional filter to list container in given state only. container state can be running/succeeded/killed/failed/all. A user can specify more than one container state at once e.g. KILLED,FAILED. task attempt ID container ID host of container state of container {code} CLI should work with running application/completed application. If a container runs many task attempts, all attempts should be shown. That will likely be the case of Tez container-reuse application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container
[ https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314892#comment-14314892 ] Wangda Tan commented on YARN-1621: -- Assigned to [~noddi]. Add CLI to list rows of task attempt ID, container ID, host of container, state of container -- Key: YARN-1621 URL: https://issues.apache.org/jira/browse/YARN-1621 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Tassapol Athiapinya Assignee: Bartosz Ługowski Fix For: 2.7.0 Attachments: YARN-1621.1.patch, YARN-1621.2.patch, YARN-1621.3.patch As more applications are moved to YARN, we need generic CLI to list rows of task attempt ID, container ID, host of container, state of container. Today if YARN application running in a container does hang, there is no way to find out more info because a user does not know where each attempt is running in. For each running application, it is useful to differentiate between running/succeeded/failed/killed containers. {code:title=proposed yarn cli} $ yarn application -list-containers -applicationId appId [-containerState state of container] where containerState is optional filter to list container in given state only. container state can be running/succeeded/killed/failed/all. A user can specify more than one container state at once e.g. KILLED,FAILED. task attempt ID container ID host of container state of container {code} CLI should work with running application/completed application. If a container runs many task attempts, all attempts should be shown. That will likely be the case of Tez container-reuse application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container
[ https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1621: - Assignee: Bartosz Ługowski Add CLI to list rows of task attempt ID, container ID, host of container, state of container -- Key: YARN-1621 URL: https://issues.apache.org/jira/browse/YARN-1621 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Tassapol Athiapinya Assignee: Bartosz Ługowski Fix For: 2.7.0 Attachments: YARN-1621.1.patch, YARN-1621.2.patch, YARN-1621.3.patch As more applications are moved to YARN, we need generic CLI to list rows of task attempt ID, container ID, host of container, state of container. Today if YARN application running in a container does hang, there is no way to find out more info because a user does not know where each attempt is running in. For each running application, it is useful to differentiate between running/succeeded/failed/killed containers. {code:title=proposed yarn cli} $ yarn application -list-containers -applicationId appId [-containerState state of container] where containerState is optional filter to list container in given state only. container state can be running/succeeded/killed/failed/all. A user can specify more than one container state at once e.g. KILLED,FAILED. task attempt ID container ID host of container state of container {code} CLI should work with running application/completed application. If a container runs many task attempts, all attempts should be shown. That will likely be the case of Tez container-reuse application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-1237) Description for yarn.nodemanager.aux-services in yarn-default.xml is misleading
[ https://issues.apache.org/jira/browse/YARN-1237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula reassigned YARN-1237: -- Assignee: Brahma Reddy Battula Description for yarn.nodemanager.aux-services in yarn-default.xml is misleading --- Key: YARN-1237 URL: https://issues.apache.org/jira/browse/YARN-1237 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Hitesh Shah Assignee: Brahma Reddy Battula Priority: Minor The description states: "the valid service name should only contain a-zA-Z0-9_ and can not start with numbers". It seems to indicate that only one service is supported. If multiple services are allowed, it does not indicate how they should be specified (i.e., comma-separated or space-separated). If the service name cannot contain spaces, does this imply that space-separated lists are also permitted? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
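For reference, in Hadoop 2.x the property takes a comma-separated list of service names, each paired with its own class property. A small programmatic example; the spark_shuffle entry is just a common second service:
{code}
// Example of configuring two aux services. Each name must match
// [a-zA-Z0-9_]+ and not start with a digit, and each name then
// gets its own ".class" key.
Configuration conf = new YarnConfiguration();
conf.set("yarn.nodemanager.aux-services", "mapreduce_shuffle,spark_shuffle");
conf.set("yarn.nodemanager.aux-services.mapreduce_shuffle.class",
    "org.apache.hadoop.mapred.ShuffleHandler");
conf.set("yarn.nodemanager.aux-services.spark_shuffle.class",
    "org.apache.spark.network.yarn.YarnShuffleService");
{code}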
[jira] [Assigned] (YARN-3169) drop the useless yarn overview document
[ https://issues.apache.org/jira/browse/YARN-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula reassigned YARN-3169: -- Assignee: Brahma Reddy Battula drop the useless yarn overview document --- Key: YARN-3169 URL: https://issues.apache.org/jira/browse/YARN-3169 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Allen Wittenauer Assignee: Brahma Reddy Battula It's pretty superfluous given there is a site index on the left. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315665#comment-14315665 ] Zhijie Shen commented on YARN-2928: --- bq. I am assuming you are already aware of YARN-2423 and plan to maintain compatibility The data models of the current and next-gen TS are likely to be different. To be compatible with the old data model, we probably need to change the existing timeline client to convert the old entity to the new one. bq. We should have such a configuration that disables the timeline service globally. I think it's also good to have a per-app flag. If the app is configured not to use the timeline service, we don't need to start the per-app aggregator. bq. My point related to events was not about a new interesting feature but to generally understand what use case is meant to be solved by events and how should an application developer use events? I thought you meant using a publisher/subscriber architecture, such as Kafka, to consume the incoming event streams. Other than that, IMHO, we still need to support the existing query of getting the stored events of a set of entities. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
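To make the compatibility question concrete, the conversion layer mentioned above would essentially be a one-way mapping from the v1 entity onto whatever the v2 model defines. An entirely hypothetical sketch; no TimelineEntityV2 class exists yet:
{code}
// Entirely hypothetical sketch of the compatibility conversion discussed
// above; TimelineEntityV2 and its setters are placeholders, not real API.
TimelineEntityV2 convert(TimelineEntity old) {
  TimelineEntityV2 converted = new TimelineEntityV2();
  converted.setType(old.getEntityType());
  converted.setId(old.getEntityId());
  // v1 events and primary filters would be mapped onto the v2 model here;
  // fields with no v2 counterpart would be dropped or kept as opaque info.
  return converted;
}
{code}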
[jira] [Created] (YARN-3171) Sort by application id don't work in ATS web ui
Jeff Zhang created YARN-3171: Summary: Sort by application id don't work in ATS web ui Key: YARN-3171 URL: https://issues.apache.org/jira/browse/YARN-3171 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Jeff Zhang The order doesn't change when I click the column header -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3171) Sort by application id doesn't work in ATS web ui
[ https://issues.apache.org/jira/browse/YARN-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-3171: - Summary: Sort by application id doesn't work in ATS web ui (was: Sort by application id don't work in ATS web ui) Sort by application id doesn't work in ATS web ui - Key: YARN-3171 URL: https://issues.apache.org/jira/browse/YARN-3171 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Jeff Zhang The order doesn't change when I click the column header -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3171) Sort by application id doesn't work in ATS web ui
[ https://issues.apache.org/jira/browse/YARN-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-3171: - Priority: Minor (was: Major) Sort by application id doesn't work in ATS web ui - Key: YARN-3171 URL: https://issues.apache.org/jira/browse/YARN-3171 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Jeff Zhang Assignee: Naganarasimha G R Priority: Minor Attachments: ats_webui.png The order doesn't change when I click the column header -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3171) Sort by application id doesn't work in ATS web ui
[ https://issues.apache.org/jira/browse/YARN-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315676#comment-14315676 ] Naganarasimha G R commented on YARN-3171: - Hi [~jeffzhang], I wish to work on this jira and hence have assigned it to myself. If you want to work on it or already have a patch, feel free to reassign. Sort by application id doesn't work in ATS web ui - Key: YARN-3171 URL: https://issues.apache.org/jira/browse/YARN-3171 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Jeff Zhang Assignee: Naganarasimha G R Priority: Minor Attachments: ats_webui.png The order doesn't change when I click the column header -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3168) Convert site documentation from apt to markdown
[ https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315598#comment-14315598 ] Gururaj Shetty commented on YARN-3168: -- I would like to take up this task. Kindly assign it to me. Convert site documentation from apt to markdown --- Key: YARN-3168 URL: https://issues.apache.org/jira/browse/YARN-3168 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 3.0.0 Reporter: Allen Wittenauer YARN analog to HADOOP-11495 -- This message was sent by Atlassian JIRA (v6.3.4#6332)