[jira] [Commented] (YARN-2428) LCE default banned user list should have yarn
[ https://issues.apache.org/jira/browse/YARN-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298757#comment-14298757 ] Hudson commented on YARN-2428: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2040 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2040/]) YARN-2428. LCE default banned user list should have yarn (Varun Saxena via aw) (aw: rev 9dd0b7a2ab6538d8f72b004eb97c2750ff3d98dd) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c LCE default banned user list should have yarn - Key: YARN-2428 URL: https://issues.apache.org/jira/browse/YARN-2428 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Allen Wittenauer Assignee: Varun Saxena Priority: Trivial Labels: newbie Fix For: 3.0.0 Attachments: YARN-2428.001.patch When task-controller was retrofitted to YARN, the default banned user list didn't add yarn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298908#comment-14298908 ] Zhijie Shen commented on YARN-2854: --- Will take a look. The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298881#comment-14298881 ] Allen Wittenauer commented on YARN-3100: I still see no evidence provided as to why YARN needs its own ACL implementation. It was always a mistake that queue ACLs and the like weren't implemented with the common ACL implementation, given how simplistic YARN's needs ultimately are. This seems like a good opportunity to fix it without creating more technical debt as proposed by this JIRA. I'm still at -1. Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3100.1.patch, YARN-3100.2.patch The goal is to make the YARN ACL model pluggable so as to integrate other authorization tools such as Apache Ranger and Sentry. Currently, we have - admin ACL - queue ACL - application ACL - timeline domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. The current implementation will be the default implementation. Ranger or Sentry plug-ins can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger and Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
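A rough sketch of the kind of plug-in point the description proposes appears below. The method names and signatures are assumptions for illustration only; they are not taken from the attached patches.
{code}
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AccessControlList;

// Hypothetical sketch of a pluggable authorization interface; a default
// implementation would wrap the existing ACL managers, while Ranger/Sentry
// plug-ins would supply their own checks.
public interface YarnAuthorizationProvider {
  // Initialize the provider from YARN configuration.
  void init(Configuration conf);

  // Replace the ACLs attached to a protected object (queue, app, domain, ...).
  void setPermission(String target, Map<String, AccessControlList> acls);

  // Decide whether the user may perform the named access on the target.
  boolean checkPermission(String accessType, String target,
      UserGroupInformation user);
}
{code}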
[jira] [Commented] (YARN-3108) ApplicationHistoryServer doesn't process -D arguments
[ https://issues.apache.org/jira/browse/YARN-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298752#comment-14298752 ] Hudson commented on YARN-3108: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2040 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2040/]) YARN-3108. ApplicationHistoryServer doesn't process -D arguments (Chang Li via jeagles) (jeagles: rev 30a8778c632c0f57cdd005080a470065a60756a8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java ApplicationHistoryServer doesn't process -D arguments - Key: YARN-3108 URL: https://issues.apache.org/jira/browse/YARN-3108 Project: Hadoop YARN Issue Type: Improvement Reporter: chang li Assignee: chang li Fix For: 2.7.0 Attachments: yarn3108.patch, yarn3108.patch, yarn3108.patch ApplicationHistoryServer doesn't process -D arguments when created; it would be nice to have it do that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3029) FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere
[ https://issues.apache.org/jira/browse/YARN-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298756#comment-14298756 ] Hudson commented on YARN-3029: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2040 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2040/]) YARN-3029. FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere. Contributed by Varun Saxena. (ozawa: rev 7acce7d3648d6f1e45ce280e2147e7dedf5693fc) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java * hadoop-yarn-project/CHANGES.txt FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere - Key: YARN-3029 URL: https://issues.apache.org/jira/browse/YARN-3029 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Varun Saxena Attachments: YARN-3029-003.patch, YARN-3029.001.patch, YARN-3029.002.patch {{FSDownload.unpack()}} lower-cases filenames in the local locale before looking at extensions for .tar, .zip, etc. {code} String lowerDst = dst.getName().toLowerCase(); {code} It MUST use an English locale (e.g. {{Locale.ENGLISH}}) for the conversion, else a file named .ZIP won't be recognised as a zip file on a cluster running in a Turkish locale. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
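To make the failure mode concrete, here is a small standalone demonstration of the Turkish-locale problem (illustrative only, not the committed patch):
{code}
import java.util.Locale;

public class TurkishLocaleDemo {
  public static void main(String[] args) {
    String name = "ARCHIVE.ZIP";
    // In Turkish, the uppercase dotted 'I' lower-cases to a dotless 'ı',
    // so "ZIP" becomes "zıp" and the extension check misses.
    System.out.println(name.toLowerCase(new Locale("tr", "TR")).endsWith(".zip")); // false
    // A fixed locale keeps extension matching deterministic on any cluster.
    System.out.println(name.toLowerCase(Locale.ENGLISH).endsWith(".zip"));         // true
  }
}
{code}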
[jira] [Commented] (YARN-3119) Memory limit check need not be enforced unless aggregate usage of all containers is near limit
[ https://issues.apache.org/jira/browse/YARN-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298876#comment-14298876 ] Allen Wittenauer commented on YARN-3119: How should scheduling behave in this scenario? What happens if multiple containers are over their limit, and/or in what order are containers killed? Memory limit check need not be enforced unless aggregate usage of all containers is near limit -- Key: YARN-3119 URL: https://issues.apache.org/jira/browse/YARN-3119 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3119.prelim.patch Today we kill any container preemptively even if the total usage of containers on that node is well within the limit for YARN. Instead, if we enforce the memory limit only when the total usage of all containers is close to some configurable ratio of the overall memory assigned to containers, we can allow for flexibility in container memory usage without adverse effects. This is similar in principle to how cgroups uses soft_limit_in_bytes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298887#comment-14298887 ] Sandy Ryza commented on YARN-3101: -- [~adhoot] is this the same condition that's evaluated when reserving a resource in the first place? I.e. might we ever make a reservation and then immediately end up canceling it? Also, I believe [~l201514] is correct that reservedAppSchedulable.getResource(reservedPriority) will not return the right quantity, and that node.getReservedContainer().getReservedResource() is correct. Last of all, while we're at it, can we rename fitInMaxShare to fitsInMaxShare? FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101.001.patch, YARN-3101.002.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
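For illustration, a corrected check might look roughly like the sketch below; the accessor names are assumptions based on this discussion, not the committed fix.
{code}
// Hedged sketch (FairScheduler context): include the resource being reserved
// when testing a queue's maxShare, and walk up the queue hierarchy.
private boolean fitsInMaxShare(FSQueue queue, Resource additionalResource) {
  Resource usagePlusAddition =
      Resources.add(queue.getResourceUsage(), additionalResource);
  if (!Resources.fitsIn(usagePlusAddition, queue.getMaxShare())) {
    return false;
  }
  FSQueue parentQueue = queue.getParent();
  return parentQueue == null || fitsInMaxShare(parentQueue, additionalResource);
}
{code}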
[jira] [Updated] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-3101: -- Attachment: (was: YARN-3101-Siqi.v2.patch) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2808) yarn client tool can not list app_attempt's container info correctly
[ https://issues.apache.org/jira/browse/YARN-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2808: Attachment: YARN-2808.20150130-1.patch The findbugs issue is fixed, and the following test failures are not related to my changes: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.api.impl.TestAMRMClient org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA org.apache.hadoop.yarn.client.cli.TestRMAdminCLI [~zjshen], can you please take a look at this issue too... yarn client tool can not list app_attempt's container info correctly Key: YARN-2808 URL: https://issues.apache.org/jira/browse/YARN-2808 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Reporter: Gordon Wang Assignee: Naganarasimha G R Attachments: YARN-2808.20150126-1.patch, YARN-2808.20150130-1.patch When the timeline server is enabled, the yarn client can not list the container info for an application attempt correctly. Here are the reproduce steps. # enable the yarn timeline server # submit a MR job # after the job is finished, use the yarn client to list the container info of the app attempt. Since the RM has cached the application's attempt info, the output shows {noformat} [hadoop@localhost hadoop-3.0.0-SNAPSHOT]$ ./bin/yarn container -list appattempt_1415168250217_0001_01 14/11/05 01:19:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/11/05 01:19:15 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/ 14/11/05 01:19:16 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 14/11/05 01:19:16 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200 Total number of containers :0 Container-Id Start Time Finish Time State Host LOG-URL {noformat} But if the RM is restarted, the client can fetch the container info from the timeline server correctly. {noformat} [hadoop@localhost hadoop-3.0.0-SNAPSHOT]$ ./bin/yarn container -list appattempt_1415168250217_0001_01 14/11/05 01:21:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform...
using builtin-java classes where applicable 14/11/05 01:21:06 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/ 14/11/05 01:21:06 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 14/11/05 01:21:06 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200 Total number of containers :4 Container-Id Start Time Finish Time State Host LOG-URL container_1415168250217_0001_01_01 1415168318376 1415168349896 COMPLETE localhost.localdomain:47024 http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_01/container_1415168250217_0001_01_01/hadoop container_1415168250217_0001_01_02 1415168326399 1415168334858 COMPLETE localhost.localdomain:47024 http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_02/container_1415168250217_0001_01_02/hadoop container_1415168250217_0001_01_03 1415168326400 1415168335277 COMPLETE localhost.localdomain:47024 http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_03/container_1415168250217_0001_01_03/hadoop container_1415168250217_0001_01_04 1415168335825 1415168343873 COMPLETE localhost.localdomain:47024 http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_04/container_1415168250217_0001_01_04/hadoop {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3104) RM generates new AMRM tokens every heartbeat between rolling and activation
[ https://issues.apache.org/jira/browse/YARN-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299032#comment-14299032 ] Jian He commented on YARN-3104: --- thanks for your detailed explanation. Effectively, so far, the new token the client gets from the server is not used by the server at all for re-authentication. Patch looks good to me. RM generates new AMRM tokens every heartbeat between rolling and activation --- Key: YARN-3104 URL: https://issues.apache.org/jira/browse/YARN-3104 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-3104.001.patch When the RM rolls a new AMRM secret, it conveys this to the AMs when it notices they are still connected with the old key. However neither the RM nor the AM explicitly closes the connection or otherwise tries to reconnect with the new secret. Therefore the RM keeps thinking the AM doesn't have the new token on every heartbeat and keeps sending new tokens for the period between the key roll and the key activation. Once activated, the RM no longer squawks in its logs about needing to generate a new token every heartbeat (i.e.: second) for every app, but the apps can still be using the old token. The token is only checked upon connection to the RM. The apps don't reconnect when sent a new token, and the RM doesn't force them to reconnect by closing the connection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
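One hedged sketch of the server-side idea under discussion: remember which master key the attempt's current token was built from, and only issue a replacement when that key id is stale. All names here are illustrative, not taken from the attached patch.
{code}
// Illustrative sketch only; not the actual YARN-3104 change.
int currentKeyId =
    amrmTokenSecretManager.getMasterKey().getMasterKey().getKeyId();
// getAMRMTokenKeyId() is a hypothetical accessor tracking the key id of the
// token this attempt currently holds.
if (appAttempt.getAMRMTokenKeyId() != currentKeyId) {
  appAttempt.setAMRMToken(
      amrmTokenSecretManager.createAndGetAMRMToken(appAttempt.getAppAttemptId()));
}
{code}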
[jira] [Commented] (YARN-2543) Resource usage should be published to the timeline server as well
[ https://issues.apache.org/jira/browse/YARN-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299123#comment-14299123 ] Hadoop QA commented on YARN-2543: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695578/YARN-2543.20150130-1.patch against trunk revision f2c9109. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6467//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6467//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6467//console This message is automatically generated. Resource usage should be published to the timeline server as well - Key: YARN-2543 URL: https://issues.apache.org/jira/browse/YARN-2543 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Zhijie Shen Assignee: Naganarasimha G R Attachments: YARN-2543.20150125-1.patch, YARN-2543.20150130-1.patch RM will include the resource usage in the app report, but generic history service doesn't, because RM doesn't publish this data to the timeline server -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2942) Aggregated Log Files should be compacted
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-2942: Attachment: YARN-2942-preliminary.002.patch I've just uploaded YARN-2942-preliminary.002.patch, which follows the v2 design doc. It's basically all there except for unit tests and ZK/Curator security stuff, which I'm still working on. Here's some more technical info on the implementation:
- Compacted logs are placed in the same directory as the aggregated logs, which has some benefits, including
-- No need to duplicate the directory structure
-- {{AggregatedLogDeletionService}} handles cleaning up old compacted logs without any changes :)
- {{CompactedAggregatedLogFormat}} has a Reader and Writer that handle the details of reading and writing the compacted logs.
- To simplify some reading code in the {{AggregatedLogsBlock}}, I created a {{LogFormatReader}} interface, which defines the common methods that an {{AggregatedLogFormat.LogReader}} and a {{CompactedAggregatedLogFormat.LogReader}} both have
- The {{AggregatedLogsBlock}} first tries to read from the compacted log file; if it can't find it, or can't find the container in the index, or has some other problem, it will fall back to the aggregated log and have the same behavior as before
-- The file formats for the aggregated logs and compacted logs are similar enough that the {{AggregatedLogFormat.ContainerLogsReader}} can be used on either, so there's no new log file parsing code for that
- Here's the process that the NM goes through (if compaction is enabled), also sketched after this message:
-- After the {{AppLogAggregatorImpl}} is done uploading aggregated log files, it will try to acquire the Curator lock for the current application
-- Then it will append its log file
-- Then it will delete its aggregated log file
-- Then it will release the lock
It would be great if I could get some feedback on the current patch so far. Aggregated Log Files should be compacted Key: YARN-2942 URL: https://issues.apache.org/jira/browse/YARN-2942 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: CompactedAggregatedLogsProposal_v1.pdf, CompactedAggregatedLogsProposal_v2.pdf, YARN-2942-preliminary.001.patch, YARN-2942-preliminary.002.patch Turning on log aggregation allows users to easily store container logs in HDFS and subsequently view them in the YARN web UIs from a central place. Currently, there is a separate log file for each Node Manager. This can be a problem for HDFS if you have a cluster with many nodes as you’ll slowly start accumulating many (possibly small) files per YARN application. The current “solution” for this problem is to configure YARN (actually the JHS) to automatically delete these files after some amount of time. We should improve this by compacting the per-node aggregated log files into one log file per application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
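The lock/append/delete/release sequence above, expressed as a minimal Curator sketch (the lock path and helper names are hypothetical, not taken from the preliminary patch):
{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;

// Hedged sketch of the NM-side compaction step; not the actual patch.
public class LogCompactionStep {
  private final CuratorFramework curator;

  public LogCompactionStep(CuratorFramework curator) {
    this.curator = curator;
  }

  public void compact(String appId) throws Exception {
    // One lock znode per application serializes the NMs compacting it.
    InterProcessMutex lock =
        new InterProcessMutex(curator, "/yarn-log-compaction/" + appId);
    lock.acquire();
    try {
      appendAggregatedLogToCompactedFile(appId); // hypothetical helper
      deleteAggregatedLogFile(appId);            // hypothetical helper
    } finally {
      lock.release();
    }
  }

  private void appendAggregatedLogToCompactedFile(String appId) { /* ... */ }

  private void deleteAggregatedLogFile(String appId) { /* ... */ }
}
{code}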
[jira] [Updated] (YARN-2031) YARN Proxy model doesn't support REST APIs in AMs
[ https://issues.apache.org/jira/browse/YARN-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Maron updated YARN-2031: - Attachment: YARN-2031.patch.001 Initial attempt at supplying the correct HTTP response code (307) from the proxy servlet. YARN Proxy model doesn't support REST APIs in AMs - Key: YARN-2031 URL: https://issues.apache.org/jira/browse/YARN-2031 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2031.patch.001 AMs can't support REST APIs because # the AM filter redirects all requests to the proxy with a 302 response (not 307) # the proxy doesn't forward PUT/POST/DELETE verbs Either the AM filter needs to return 307 and the proxy needs to forward the verbs, or the AM filter should not filter the REST part of the web site. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
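The crux is a one-line difference in how the redirect is issued: a 302 lets many clients downgrade PUT/POST/DELETE to GET, while a 307 obliges them to replay the original verb and body. A minimal, hypothetical illustration (not the attached patch):
{code}
import javax.servlet.http.HttpServletResponse;

// Illustrative only: answer with 307 (Temporary Redirect) instead of
// sendRedirect()'s 302 so REST verbs survive the hop to the proxy.
public final class ProxyRedirects {
  private ProxyRedirects() {}

  public static void redirectPreservingVerb(HttpServletResponse resp,
      String proxyUrl) {
    resp.setStatus(HttpServletResponse.SC_TEMPORARY_REDIRECT); // 307
    resp.setHeader("Location", proxyUrl);
  }
}
{code}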
[jira] [Commented] (YARN-3119) Memory limit check need not be enforced unless aggregate usage of all containers is near limit
[ https://issues.apache.org/jira/browse/YARN-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299304#comment-14299304 ] Anubhav Dhoot commented on YARN-3119: - Scheduling should continue like before. If a new container causes the ratio to be exceeded, we would kill the offending containers. In case we don't exceed the limit, the offending containers get a chance to succeed, which can improve the throughput of jobs that have skews like this. If multiple containers are over their limit, they are all killed for now. In the future we can be more sophisticated and kill containers in reverse order of the amount by which they exceed their limit, or by some other criteria, until we go back below the ratio. That would be a good second improvement over this. In general, this JIRA attempts to make memory a little more of a flexible resource. Memory limit check need not be enforced unless aggregate usage of all containers is near limit -- Key: YARN-3119 URL: https://issues.apache.org/jira/browse/YARN-3119 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3119.prelim.patch Today we kill any container preemptively even if the total usage of containers on that node is well within the limit for YARN. Instead, if we enforce the memory limit only when the total usage of all containers is close to some configurable ratio of the overall memory assigned to containers, we can allow for flexibility in container memory usage without adverse effects. This is similar in principle to how cgroups uses soft_limit_in_bytes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
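Expressed as code, the proposed gate could look like the following sketch (all names are illustrative and assume a configurable ratio; this is not the preliminary patch):
{code}
// Hedged sketch: per-container limits are only enforced once aggregate usage
// across all containers nears the memory assigned to containers on the node.
public final class SoftMemoryLimit {
  private SoftMemoryLimit() {}

  public static boolean shouldEnforcePerContainerLimit(
      long aggregateUsageBytes, long nodeContainerMemoryBytes,
      float enforcementRatio /* e.g. 0.95f */) {
    return aggregateUsageBytes
        >= (long) (nodeContainerMemoryBytes * enforcementRatio);
  }
}
{code}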
[jira] [Commented] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299306#comment-14299306 ] Hadoop QA commented on YARN-3101: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695624/YARN-3101-Siqi.v2.patch against trunk revision 951b360. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6470//console This message is automatically generated. FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-3101: -- Attachment: YARN-3101-Siqi.v2.patch FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299302#comment-14299302 ] Wangda Tan commented on YARN-2868: -- bq. Our scenario is debugging queue related issues for which we need queue related metrics because scheduling decisions are made based on the queue. What would be a good place to add metrics for all those queue related metrics? It makes sense to me since it's use-case driven. However, I'm wondering whether the first container allocation delay is correctly calculated in this patch. Think about a queue with some pending applications where no resource gets allocated from the RM (maybe there's an issue with the cluster). In this case, the first container allocation delay will be 0. I think we should consider the time an app spends waiting for the RM to allocate a container. Then, even if there's no container allocated in a queue, the first container allocation delay will still be consistently increasing, which can help troubleshoot cluster issues. Does this make sense? [~jianhe]. Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Anubhav Dhoot Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch, YARN-2868.007.patch Add a metric to measure the latency between starting container allocation and the first container actually being allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
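The suggested semantics could be expressed roughly as follows (a sketch with assumed names, not the attached patches): while the app is still waiting, the reported delay keeps growing instead of reading 0.
{code}
// Illustrative sketch: report a growing delay while no container has been
// allocated yet, and the fixed latency once the first allocation happens.
public static long firstContainerAllocationDelay(
    long appWaitingSinceMillis, long firstAllocationMillis, long nowMillis) {
  return firstAllocationMillis > 0
      ? firstAllocationMillis - appWaitingSinceMillis // allocated: fixed latency
      : nowMillis - appWaitingSinceMillis;            // still waiting: grows
}
{code}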
[jira] [Commented] (YARN-3119) Memory limit check need not be enforced unless aggregate usage of all containers is near limit
[ https://issues.apache.org/jira/browse/YARN-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299326#comment-14299326 ] Wangda Tan commented on YARN-3119: -- IMHO, this could be problematic if an under-usage container (c1) wants to get more resource, but that resource is over-used by another container (c2). It is possible that c1 tries to allocate but fails because memory is exhausted, since the NM needs some time to get the resource back (by killing c2). Memory limit check need not be enforced unless aggregate usage of all containers is near limit -- Key: YARN-3119 URL: https://issues.apache.org/jira/browse/YARN-3119 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3119.prelim.patch Today we kill any container preemptively even if the total usage of containers on that node is well within the limit for YARN. Instead, if we enforce the memory limit only when the total usage of all containers is close to some configurable ratio of the overall memory assigned to containers, we can allow for flexibility in container memory usage without adverse effects. This is similar in principle to how cgroups uses soft_limit_in_bytes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2543) Resource usage should be published to the timeline server as well
[ https://issues.apache.org/jira/browse/YARN-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299317#comment-14299317 ] Zhijie Shen commented on YARN-2543: --- Can you check the test failure in TestSystemMetricsPublisher? Resource usage should be published to the timeline server as well - Key: YARN-2543 URL: https://issues.apache.org/jira/browse/YARN-2543 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Zhijie Shen Assignee: Naganarasimha G R Attachments: YARN-2543.20150125-1.patch, YARN-2543.20150130-1.patch RM will include the resource usage in the app report, but generic history service doesn't, because RM doesn't publish this data to the timeline server -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299348#comment-14299348 ] Anubhav Dhoot commented on YARN-3101: - Hi [~l201514], why do we need to make the comparator < instead of <=? What case does this address? [~sandyr] I did not see a check when placing a reservation. We check queue usage once in FSLeafQueue#assignContainerPreCheck, but we do not know the container size until the actual reserve happens in FSAppAttempt#reserve. FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3112) AM restart and keep containers from previous attempts, then new container launch failed
[ https://issues.apache.org/jira/browse/YARN-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299003#comment-14299003 ] Jack Chen commented on YARN-3112: - I have found the cause of this error: the newly launched app attempt will transfer the old containers from the previous attempts, so the node set in NMTokenSecretManagerInRM.java will be filled. When the new app attempt gets the allocated containers via pullNewlyAllocatedContainersAndNMTokens(), it will get a null nmToken because of the full node set in createAndGetNMToken(). The null nmToken is returned to the ContainerLauncher, so the new container fails to launch. What I have done is clear the node set in pullNewlyAllocatedContainersAndNMTokens() before the creation of the container and node tokens (added lines are marked with +):
{code}
public synchronized ContainersAndNMTokensAllocation
    pullNewlyAllocatedContainersAndNMTokens() {
  List<Container> returnContainerList =
      new ArrayList<Container>(newlyAllocatedContainers.size());
  List<NMToken> nmTokens = new ArrayList<NMToken>();
+ // clear the node set for NMTokens
+ rmContext.getNMTokenSecretManager().clearNodeSetForAttempt(getApplicationAttemptId());
  for (Iterator<RMContainer> i = newlyAllocatedContainers.iterator(); i.hasNext();) {
    RMContainer rmContainer = i.next();
    Container container = rmContainer.getContainer();
    try {
      // create container token and NMToken altogether.
      container.setContainerToken(rmContext.getContainerTokenSecretManager()
          .createContainerToken(container.getId(), container.getNodeId(),
              getUser(), container.getResource(), container.getPriority(),
              rmContainer.getCreationTime(), this.logAggregationContext));
      NMToken nmToken =
          rmContext.getNMTokenSecretManager().createAndGetNMToken(getUser(),
              getApplicationAttemptId(), container);
+     // check whether nmToken is null
+     LOG.info("[hchen] NMToken for container " + container.getId() + " NMToken:" + nmToken);
      if (nmToken != null) {
        nmTokens.add(nmToken);
      }
    } catch (IllegalArgumentException e) {
      // DNS might be down, skip returning this container.
      LOG.error("Error trying to assign container token and NM token to"
          + " an allocated container " + container.getId(), e);
      continue;
    }
    returnContainerList.add(container);
    i.remove();
    rmContainer.handle(new RMContainerEvent(rmContainer.getContainerId(),
        RMContainerEventType.ACQUIRED));
  }
  return new ContainersAndNMTokensAllocation(returnContainerList, nmTokens);
}
{code}
AM restart and keep containers from previous attempts, then new container launch failed --- Key: YARN-3112 URL: https://issues.apache.org/jira/browse/YARN-3112 Project: Hadoop YARN Issue Type: Bug Components: applications, resourcemanager Affects Versions: 2.6.0 Environment: in a real Linux cluster Reporter: Jack Chen This error is very similar to YARN-1795 and YARN-1839, but I have checked the solutions of those JIRAs; the patches are already included in my version. I think this error is caused by the different NMTokens between the old and new app attempts. The new AM has inherited the old tokens from the previous AM according to my configuration (keepContainers=true), so the token for new containers is replaced by the old one in the NMTokenCache.
{noformat}
2015-01-29 10:04:49,603 ERROR [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1422546145900_0001_02_02 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for ixk02:47625
	at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:256)
	at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:246)
	at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:132)
	at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:401)
	at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
	at
{noformat}
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299137#comment-14299137 ] Sangjin Lee commented on YARN-2928: --- Thanks [~zjshen] for putting it together! It looks good mostly. Some high level comments...
(1) Are "relates to" and "is related to" meant to capture the parent-child relationship?
(2) (Flow) run and application definitely have a parent-child relationship. Now it's less clear between the flow and the flow run. One scenario that is definitely worth considering is a flow of flows, and that brings some complications to this. Suppose you have an oozie flow that starts a pig script which in turn spawns multiple MR jobs. If flow is an entity and the parent of the flow run, how to model this situation becomes more challenging. One idea might be oozie flow -> oozie flow run -> pig flow -> pig flow run -> MR job. However, the oozie flow run is not really the parent of the pig flow. Rather, the oozie flow run is the parent of the pig flow run. Another idea is not to have the flow as a separate entity but as metadata of the flow run entities. And that's actually what the design doc indicates (see sections 3.1.1 and 3.1.2). Now one issue with not having the flow as an entity is that it might complicate the aggregation scenario. More on that later...
(3) Could we stick with the same terminology as in the design doc? Those are "flow" and "flow run". Thoughts? Better suggestions?
(4) The part about the metrics would need to be further expanded with the metrics API JIRA, but I definitely see at least two types of metrics: one that requires a time series and another that doesn't. The former may be something like CPU, and the latter would be something like HDFS bytes written, for example. For the latter type, the only value that matters for a given metric is the latest value. And depending on the type, the way to implement the storage could be hugely different. I think we need to come up with a well-defined set of metric types that covers the most useful cases. Initially we said we were going to look at the existing hadoop metrics types, but we might need to come up with our own here.
(5) The parent-child relationship (and therefore the necessity of making things entities) is tightly related to *aggregation* (rolling up the values from children to parent). The idea was that for parent-child entities, aggregation would be done generically as part of creating/updating those entities (what we called primary aggregation in some discussion). If cluster or user is not an entity, then there is no parent-child relationship, and aggregation from flows to user or cluster would have to be done explicitly outside the context of the parent-child relationship. Of course that is doable; we could just do it as specific aggregation. Maybe that's what we need to do (and the queue-level aggregation which Robert mentioned could be treated in the same manner). Either way, I think we should mention how the run/flow/user/cluster/queue aggregation would be done.
Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Vinod Kumar Vavilapalli Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321.
Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3020) n similar addContainerRequest()s produce n*(n+1)/2 containers
[ https://issues.apache.org/jira/browse/YARN-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299184#comment-14299184 ] Peter D Kirchner commented on YARN-3020: The expected usage you describe, combined with the current implementation, contains a basic synchronization problem. The client application's RPC updates the requests at the RM before the client receives the containers newly assigned during that heartbeat. Therefore, if (as is currently the case) the client calculates the total requests, the total is too large by at least the number of matching incoming assignments. Per the expected usage and current implementation, both add and remove cause this obsolete, too-high total to be sent. Cause or coincidence, I see applications (including but not limited to distributedShell) making matching requests in a short interval and never calling remove. They receive the behavior they need, or closer to it, than the expected usage would produce. Further, in this API implementation/expected usage, the remove API tries to serve two purposes that are similar but not identical: to update the client-side bookkeeping and to identify the request data to be sent to the server. The problem here is that if there are only removes for allocated containers, then the server-side bookkeeping is correct until the client sends the total. The removes called for incoming assigned containers should not be forwarded to the RM until there is at least one matching add, or a bona fide removal of a previously added request. I suppose the current implementation could be defended because its error: 1) is only too high by the number of matching incoming assignments, 2) persists only for the number of heartbeats it takes to clear the out-of-sync condition, 3) results in spurious allocations only once the application's intentional matching requests were granted. I maintain that spurious allocations are the worst case, and especially damaging if obtained by preemption. I want to suggest an alternative that is simpler and accurate, and limited to the AMRMClient and RM. The fact that the scheduler is updated by replacement informs the choice of where YARN should calculate the total for a matching request. The client is in a position to accurately calculate how much its current wants differ from what it has asked for over its life. This suggests fixing the synchronization problem by having the client send the net of add/remove requests it has accumulated over a heartbeat cycle, and having the RM update its totals from the difference obtained from the client, using synchronized methods. (Note: this client would not ordinarily call remove when it received a container, as the scheduler has already properly accounted for it when it made the allocation.) n similar addContainerRequest()s produce n*(n+1)/2 containers - Key: YARN-3020 URL: https://issues.apache.org/jira/browse/YARN-3020 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.5.2 Reporter: Peter D Kirchner Original Estimate: 24h Remaining Estimate: 24h BUG: If the application master calls addContainerRequest() n times, but with the same priority, I get up to 1+2+3+...+n containers = n*(n+1)/2. The most containers are requested when the interval between calls to addContainerRequest() exceeds the heartbeat interval of calls to allocate() (in AMRMClientImpl's run() method). If the application master calls addContainerRequest() n times, but with a unique priority each time, I get n containers (as I intended).
Analysis: There is a logic problem in AMRMClientImpl.java. Although AMRMClientImpl.allocate() does an ask.clear(), on subsequent calls to addContainerRequest(), addResourceRequest() finds the previous matching remoteRequest and increments the container count rather than starting anew, and does an addResourceRequestToAsk(), which defeats the ask.clear(). From the documentation and code comments, it was hard for me to discern the intended behavior of the API, but the inconsistency reported in this issue suggests one case or the other is implemented incorrectly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
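The reported count follows from a few lines of arithmetic; here is a toy model of the client/RM interaction (not YARN code) that reproduces it:
{code}
// Toy model (not YARN code): each addContainerRequest() bumps the matching
// total, each heartbeat advertises that full total, and the RM grants it
// afresh because earlier grants are never deducted client-side.
public final class OverAllocationModel {
  public static int containersGranted(int n) {
    int granted = 0;
    int advertisedTotal = 0;
    for (int k = 1; k <= n; k++) {
      advertisedTotal++;          // addContainerRequest() with the same priority
      granted += advertisedTotal; // RM fills the advertised total this heartbeat
    }
    return granted;               // == n * (n + 1) / 2
  }

  public static void main(String[] args) {
    System.out.println(containersGranted(4)); // prints 10 = 4*5/2
  }
}
{code}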
[jira] [Created] (YARN-3120) YarnException on Windows + org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to setup local dir nm-local-dir, which was marked as good.
vaidhyanathan created YARN-3120: --- Summary: YarnException on Windows + org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to setup local dir nm-local-dir, which was marked as good. Key: YARN-3120 URL: https://issues.apache.org/jira/browse/YARN-3120 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Environment: Windows 8, Hadoop 2.6.0 Reporter: vaidhyanathan Hi, I tried to follow the instructions in http://wiki.apache.org/hadoop/Hadoop2OnWindows and have set up hadoop-2.6.0 on my Windows system. I was able to start everything properly, but when I try to run the wordcount job as given in the above URL, the job fails with the below exception. 15/01/30 12:56:09 INFO localizer.ResourceLocalizationService: Localizer failed org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to setup local dir /tmp/hadoop-haremangala/nm-local-dir, which was marked as good. at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.getInitializedLocalDirs(ResourceLocalizationService.java:1372) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$900(ResourceLocalizationService.java:137) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1085) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2808) yarn client tool can not list app_attempt's container info correctly
[ https://issues.apache.org/jira/browse/YARN-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299033#comment-14299033 ] Hadoop QA commented on YARN-2808: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695573/YARN-2808.20150130-1.patch against trunk revision f2c9109. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6466//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6466//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6466//console This message is automatically generated. yarn client tool can not list app_attempt's container info correctly Key: YARN-2808 URL: https://issues.apache.org/jira/browse/YARN-2808 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Reporter: Gordon Wang Assignee: Naganarasimha G R Attachments: YARN-2808.20150126-1.patch, YARN-2808.20150130-1.patch When the timeline server is enabled, the yarn client can not list the container info for an application attempt correctly. Here are the reproduce steps. # enable the yarn timeline server # submit a MR job # after the job is finished, use the yarn client to list the container info of the app attempt. Since the RM has cached the application's attempt info, the output shows {noformat} [hadoop@localhost hadoop-3.0.0-SNAPSHOT]$ ./bin/yarn container -list appattempt_1415168250217_0001_01 14/11/05 01:19:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/11/05 01:19:15 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/ 14/11/05 01:19:16 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 14/11/05 01:19:16 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200 Total number of containers :0 Container-Id Start Time Finish Time State Host LOG-URL {noformat} But if the RM is restarted, the client can fetch the container info from the timeline server correctly. {noformat} [hadoop@localhost hadoop-3.0.0-SNAPSHOT]$ ./bin/yarn container -list appattempt_1415168250217_0001_01 14/11/05 01:21:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform...
using builtin-java classes where applicable 14/11/05 01:21:06 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/ 14/11/05 01:21:06 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 14/11/05 01:21:06 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200 Total number of containers :4 Container-Id Start Time Finish Time State Host LOG-URL container_1415168250217_0001_01_01 1415168318376 1415168349896 COMPLETE localhost.localdomain:47024 http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_01/container_1415168250217_0001_01_01/hadoop container_1415168250217_0001_01_02 1415168326399 1415168334858 COMPLETE localhost.localdomain:47024 http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_02/container_1415168250217_0001_01_02/hadoop container_1415168250217_0001_01_03
[jira] [Updated] (YARN-2543) Resource usage should be published to the timeline server as well
[ https://issues.apache.org/jira/browse/YARN-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2543: Attachment: YARN-2543.20150130-1.patch Hi [~zjshen], the below test-case failure is not related to my changes: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore. 1, 2 and 3 are taken care of in the attached patch. Will start working on the other issue. Resource usage should be published to the timeline server as well - Key: YARN-2543 URL: https://issues.apache.org/jira/browse/YARN-2543 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Zhijie Shen Assignee: Naganarasimha G R Attachments: YARN-2543.20150125-1.patch, YARN-2543.20150130-1.patch RM will include the resource usage in the app report, but generic history service doesn't, because RM doesn't publish this data to the timeline server -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3075: --- Attachment: YARN-3075.003.patch NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch, YARN-3075.002.patch, YARN-3075.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3107) Update TestYarnConfigurationFields to flag missing properties in yarn-default.xml with an error
[ https://issues.apache.org/jira/browse/YARN-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-3107: - Attachment: YARN-3107.001.patch This gets an example checked in; it can't be turned on until yarn-default.xml is clean. - Set the flag to check xml files - Add SCMStore properties to ignore - Add ATS v1 properties to ignore Update TestYarnConfigurationFields to flag missing properties in yarn-default.xml with an error --- Key: YARN-3107 URL: https://issues.apache.org/jira/browse/YARN-3107 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: supportability Attachments: YARN-3107.001.patch TestYarnConfigurationFields currently makes sure each property in yarn-default.xml is documented in one of the YARN configuration Java classes. The reverse check can be turned on once each YARN property is: A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
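For flavor, wiring up such a reverse check might look like the sketch below; the member and flag names follow the common TestConfigurationFieldsBase pattern but are assumptions, not verified against the attached patch.
{code}
import java.util.HashSet;
import org.apache.hadoop.conf.TestConfigurationFieldsBase;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Hedged sketch; field and flag names are assumed for illustration.
public class TestYarnConfigurationFields extends TestConfigurationFieldsBase {
  @Override
  public void initializeMemberVariables() {
    xmlFilename = "yarn-default.xml";
    configurationClasses = new Class[] { YarnConfiguration.class };
    // Flag properties that are missing from yarn-default.xml as errors.
    errorIfMissingXmlProps = true;
    // Internal-use properties excluded from the check (e.g. SCM store, ATS v1).
    configurationPropsToSkipCompare = new HashSet<String>();
    configurationPropsToSkipCompare.add("yarn.sharedcache.store.class");
  }
}
{code}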
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299173#comment-14299173 ] Hadoop QA commented on YARN-3075: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695584/YARN-3075.003.patch against trunk revision f2c9109. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6468//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6468//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6468//console This message is automatically generated. NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch, YARN-3075.002.patch, YARN-3075.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298999#comment-14298999 ] Jian He commented on YARN-3100: --- bq. It was always a mistake that queue ACLs and the like weren't implemented with the common ACL implementation. Would you please specify which exact piece of the service ACL implementation in common YARN should have re-used but did not? YARN always re-uses existing libraries from common. Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3100.1.patch, YARN-3100.2.patch The goal is to make the YARN ACL model pluggable so as to integrate other authorization tools such as Apache Ranger and Sentry. Currently, we have - admin ACL - queue ACL - application ACL - timeline domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. The current implementation will be the default implementation. Ranger or Sentry plug-ins can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger and Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2808) yarn client tool can not list app_attempt's container info correctly
[ https://issues.apache.org/jira/browse/YARN-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299101 ] Zhijie Shen commented on YARN-2808: --- I think the patch should work, though it's not guaranteed that all the containers will be returned for a running attempt, due to a race condition where a container has finished and its info has been pushed to the timeline server but is not yet persisted. Anyway, it will be a good improvement in terms of user experience. Some minor comments: 1. Is it possible to improve the performance? The application could be big, with hundreds of containers. It's not efficient to loop through them many times. Maybe run through them once and put the ids in a HashSet for the check (a sketch follows after this issue)? {code} for (int i = 0; i < containersFromHistoryServer.size(); i++) { if (containersFromHistoryServer.get(i).getContainerId() .equals(tmp.getContainerId())) { containersFromHistoryServer.remove(i); //Remove containers from AHS as container from RM will have latest //information break; } } {code} 2. In the test, can we add a case where the running container is in the RM and is also in the timeline server (as part of its information is written there), verifying that the container info cached in the RM is used instead of the partial info in the timeline server? yarn client tool can not list app_attempt's container info correctly Key: YARN-2808 URL: https://issues.apache.org/jira/browse/YARN-2808 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Reporter: Gordon Wang Assignee: Naganarasimha G R Attachments: YARN-2808.20150126-1.patch, YARN-2808.20150130-1.patch When the timeline server is enabled, the yarn client can not list the container info for an application attempt correctly. Here are the steps to reproduce: # enable the yarn timeline server # submit a MR job # after the job is finished, use the yarn client to list the container info of the app attempt. Then, since the RM has cached the application's attempt info, the output shows {noformat} [hadoop@localhost hadoop-3.0.0-SNAPSHOT]$ ./bin/yarn container -list appattempt_1415168250217_0001_01 14/11/05 01:19:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/11/05 01:19:15 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/ 14/11/05 01:19:16 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 14/11/05 01:19:16 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200 Total number of containers :0 Container-Id Start Time Finish Time StateHost LOG-URL {noformat} But if the RM is restarted, the client can fetch the container info from the timeline server correctly. {noformat} [hadoop@localhost hadoop-3.0.0-SNAPSHOT]$ ./bin/yarn container -list appattempt_1415168250217_0001_01 14/11/05 01:21:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 14/11/05 01:21:06 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/ 14/11/05 01:21:06 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 14/11/05 01:21:06 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200 Total number of containers :4 Container-Id Start Time Finish Time StateHost LOG-URL container_1415168250217_0001_01_01 1415168318376 1415168349896COMPLETElocalhost.localdomain:47024 http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_01/container_1415168250217_0001_01_01/hadoop container_1415168250217_0001_01_02 1415168326399 1415168334858COMPLETElocalhost.localdomain:47024 http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_02/container_1415168250217_0001_01_02/hadoop container_1415168250217_0001_01_03 1415168326400 1415168335277COMPLETElocalhost.localdomain:47024
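A minimal sketch of the single-pass HashSet de-duplication suggested in the YARN-2808 review above. The list names mirror the quoted snippet; everything else (imports, surrounding method context) is assumed rather than taken from the patch:
{code}
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerReport;

// Collect the ids reported by the RM once, then filter the AHS list in
// a single pass, preferring the RM's copy which has the latest info.
Set<ContainerId> idsFromRM = new HashSet<ContainerId>();
for (ContainerReport rmReport : containersFromRM) {
  idsFromRM.add(rmReport.getContainerId());
}
Iterator<ContainerReport> it = containersFromHistoryServer.iterator();
while (it.hasNext()) {
  if (idsFromRM.contains(it.next().getContainerId())) {
    it.remove(); // drop the stale AHS copy
  }
}
{code}
This keeps the merge at O(m + n) rather than the O(m * n) of the nested loop.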
[jira] [Commented] (YARN-3120) YarnException on windows + org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to setup local dirnm-local-dir, which was marked as good.
[ https://issues.apache.org/jira/browse/YARN-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299111 ] vaidhyanathan commented on YARN-3120: - I tried to change the folder permission for nmprivate manually using chmod 700, as it expects, but the issue doesn't seem to be resolved. YarnException on windows + org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to setup local dirnm-local-dir, which was marked as good. --- Key: YARN-3120 URL: https://issues.apache.org/jira/browse/YARN-3120 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Environment: Windows 8 , Hadoop 2.6.0 Reporter: vaidhyanathan Hi, I tried to follow the instructions in http://wiki.apache.org/hadoop/Hadoop2OnWindows and have set up hadoop-2.6.0 on my Windows system. I was able to start everything properly, but when I try to run the wordcount job as given in the above URL, the job fails with the exception below. 15/01/30 12:56:09 INFO localizer.ResourceLocalizationService: Localizer failed org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to setup local di r /tmp/hadoop-haremangala/nm-local-dir, which was marked as good. at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer. ResourceLocalizationService.getInitializedLocalDirs(ResourceLocalizationService. java:1372) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer. ResourceLocalizationService.access$900(ResourceLocalizationService.java:137) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer. ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java :1085) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299180#comment-14299180 ] Hadoop QA commented on YARN-3075: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695585/YARN-3075.003.patch against trunk revision f2c9109. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6469//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6469//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6469//console This message is automatically generated. NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch, YARN-3075.002.patch, YARN-3075.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3075: --- Attachment: YARN-3075.003.patch NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch, YARN-3075.002.patch, YARN-3075.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3075: --- Attachment: (was: YARN-3075.003.patch) NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch, YARN-3075.002.patch, YARN-3075.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3099) Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label.
[ https://issues.apache.org/jira/browse/YARN-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299364#comment-14299364 ] Hudson commented on YARN-3099: -- FAILURE: Integrated in Hadoop-trunk-Commit #6971 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6971/]) YARN-3099. Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label. Contributed by Wangda Tan (jianhe: rev 86358221fc85a7743052a0b4c1647353508bf308) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCSQueueUtils.java Capacity Scheduler LeafQueue/ParentQueue should use ResourceUsage to track used-resources-by-label. --- Key: YARN-3099 URL: https://issues.apache.org/jira/browse/YARN-3099 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-3099.1.patch, YARN-3099.2.patch, YARN-3099.3.patch, YARN-3099.4.patch After YARN-3092, resource-by-label (including used-resource/pending-resource/reserved-resource/AM-resource, etc.) should be tracked in ResourceUsage. To keep each individual patch small and easy to review, this patch targets having used-resources-by-label in CS queues tracked by ResourceUsage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
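Schematically, "tracked by ResourceUsage" means keying each usage counter by node label rather than keeping a single aggregate. A hedged sketch of the idea, using a hypothetical class (the RM's actual ResourceUsage API differs):
{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Hypothetical per-label tracker, illustrating the idea only.
class PerLabelUsage {
  private final Map<String, Resource> usedByLabel =
      new HashMap<String, Resource>();

  synchronized void incUsed(String label, Resource res) {
    Resource current = usedByLabel.get(label);
    if (current == null) {
      current = Resource.newInstance(0, 0);
      usedByLabel.put(label, current);
    }
    Resources.addTo(current, res); // mutates 'current' in place
  }

  synchronized Resource getUsed(String label) {
    Resource r = usedByLabel.get(label);
    return r == null ? Resource.newInstance(0, 0) : r;
  }
}
{code}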
[jira] [Commented] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299403#comment-14299403 ] Siqi Li commented on YARN-3101: --- feel free to still use >=. This doesn't change the overall behavior. FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2543) Resource usage should be published to the timeline server as well
[ https://issues.apache.org/jira/browse/YARN-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299611#comment-14299611 ] Hadoop QA commented on YARN-2543: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695701/YARN-2543.20150131-1.patch against trunk revision 09ad9a8. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6473//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6473//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6473//console This message is automatically generated. Resource usage should be published to the timeline server as well - Key: YARN-2543 URL: https://issues.apache.org/jira/browse/YARN-2543 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Zhijie Shen Assignee: Naganarasimha G R Attachments: YARN-2543.20150125-1.patch, YARN-2543.20150130-1.patch, YARN-2543.20150131-1.patch RM will include the resource usage in the app report, but generic history service doesn't, because RM doesn't publish this data to the timeline server -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299396#comment-14299396 ] Anubhav Dhoot commented on YARN-3101: - The test case was modified to add a negative test showing that reservations which should be maintained no longer are. So we need the test case changes from my patch. FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299423#comment-14299423 ] Hadoop QA commented on YARN-3101: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695632/YARN-3101-Siqi.v2.patch against trunk revision 951b360. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6471//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6471//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6471//console This message is automatically generated. FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3104) RM generates new AMRM tokens every heartbeat between rolling and activation
[ https://issues.apache.org/jira/browse/YARN-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299474#comment-14299474 ] Jian He commented on YARN-3104: --- Hi Jason, just one comment on the patch: {{amrmToken.decodeIdentifier().getKeyId()}} internally does reflection. Will it be costly to invoke this on every AM heartbeat? Maybe we can cache the keyId? RM generates new AMRM tokens every heartbeat between rolling and activation --- Key: YARN-3104 URL: https://issues.apache.org/jira/browse/YARN-3104 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-3104.001.patch When the RM rolls a new AMRM secret, it conveys this to the AMs when it notices they are still connected with the old key. However neither the RM nor the AM explicitly close the connection or otherwise try to reconnect with the new secret. Therefore the RM keeps thinking the AM doesn't have the new token on every heartbeat and keeps sending new tokens for the period between the key roll and the key activation. Once activated the RM no longer squawks in its logs about needing to generate a new token every heartbeat (i.e.: second) for every app, but the apps can still be using the old token. The token is only checked upon connection to the RM. The apps don't reconnect when sent a new token, and the RM doesn't force them to reconnect by closing the connection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
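A hedged sketch of the caching idea from the comment above; the class, field, and method names are illustrative, not the actual RM/AM code:
{code}
import java.io.IOException;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.security.AMRMTokenIdentifier;

// Decode the identifier once when the token changes, instead of paying
// the reflection cost of decodeIdentifier() on every heartbeat.
class CachedAmrmKeyId {
  private volatile int cachedKeyId = -1;

  void onTokenUpdated(Token<AMRMTokenIdentifier> amrmToken)
      throws IOException {
    cachedKeyId = amrmToken.decodeIdentifier().getKeyId();
  }

  boolean matches(int masterKeyId) {
    return cachedKeyId == masterKeyId;
  }
}
{code}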
[jira] [Updated] (YARN-2543) Resource usage should be published to the timeline server as well
[ https://issues.apache.org/jira/browse/YARN-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2543: Attachment: YARN-2543.20150131-1.patch Missed making the modifications for TestSystemMetricsPublisher; have corrected it now. Resource usage should be published to the timeline server as well - Key: YARN-2543 URL: https://issues.apache.org/jira/browse/YARN-2543 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Zhijie Shen Assignee: Naganarasimha G R Attachments: YARN-2543.20150125-1.patch, YARN-2543.20150130-1.patch, YARN-2543.20150131-1.patch RM will include the resource usage in the app report, but generic history service doesn't, because RM doesn't publish this data to the timeline server -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299530#comment-14299530 ] Robert Kanter commented on YARN-2928: - To mirror my comment from the doc that Sangjin is referring to, I had said: {quote}It would be useful to be able to aggregate to queues; what would be a good way to fit those into the data model?{quote} in the "Some issues to address" section. As discussed, if we only do child-to-parent aggregation (primary aggregation), then we can't aggregate to queues because they don't really fit in that path. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Vinod Kumar Vavilapalli Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299570#comment-14299570 ] Anubhav Dhoot commented on YARN-3122: - Similar to YARN-2984 this would track CPU usage Metrics for container's actual CPU usage Key: YARN-3122 URL: https://issues.apache.org/jira/browse/YARN-3122 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track CPU usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3122: Description: It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track CPU usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. was: It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track memory usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. Metrics for container's actual CPU usage Key: YARN-3122 URL: https://issues.apache.org/jira/browse/YARN-3122 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track CPU usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3122: Fix Version/s: (was: 2.7.0) Metrics for container's actual CPU usage Key: YARN-3122 URL: https://issues.apache.org/jira/browse/YARN-3122 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track memory usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3041) create the ATS entity/event API
[ https://issues.apache.org/jira/browse/YARN-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299517#comment-14299517 ] Sangjin Lee commented on YARN-3041: --- bq. I suggest using more generalized in/outbound relationship instead of parent-child one. One parent can have multiple children obviously, but we said in the current design that we want to limit the parent to be one. The consideration was that the parent-child relationship is used really to handle the aggregation along the linear hierarchy, and multiple parents complicate that significantly. create the ATS entity/event API --- Key: YARN-3041 URL: https://issues.apache.org/jira/browse/YARN-3041 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Robert Kanter Attachments: YARN-3041.preliminary.001.patch Per design in YARN-2928, create the ATS entity and events API. Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, flow, flow run, YARN app, ...). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299428#comment-14299428 ] Sandy Ryza commented on YARN-3101: -- In that case it sounds like the behavior is that we can go one container over the max resources. While this might be worth changing in a separate JIRA, we should maintain that behavior with the reservations. FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
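For context, the > vs. >= debate in the YARN-3101 comments above is only about the strictness of the max-share comparison. A schematic, memory-only version follows; the method and its use of memory alone are illustrative, not the FairScheduler source:
{code}
import org.apache.hadoop.yarn.api.records.Resource;

// With the strict '>' check, usage + reservation may land exactly on
// maxShare; tightening it to '>=' would reject that boundary case.
static boolean fitsInMaxShare(Resource usage, Resource reservation,
    Resource maxShare) {
  return usage.getMemory() + reservation.getMemory() <= maxShare.getMemory();
}
{code}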
[jira] [Assigned] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-3122: --- Assignee: Anubhav Dhoot (was: Karthik Kambatla) Metrics for container's actual CPU usage Key: YARN-3122 URL: https://issues.apache.org/jira/browse/YARN-3122 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track memory usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3122) Metrics for container's actual CPU usage
Anubhav Dhoot created YARN-3122: --- Summary: Metrics for container's actual CPU usage Key: YARN-3122 URL: https://issues.apache.org/jira/browse/YARN-3122 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Karthik Kambatla Fix For: 2.7.0 It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track memory usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3077: -- Assignee: Chun Chen RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen Assignee: Chun Chen Attachments: YARN-3077.2.patch, YARN-3077.3.patch, YARN-3077.patch If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If the user specifies a custom value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, YARN should create the parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3101: Attachment: YARN-3101.003.patch Reverts the changes to the test case testReservationWhileMultiplePriorities that was modified by YARN-2811. As there are no limits on the queue, no reservation should be removed. Hence the older behavior of the test still applies. FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch, YARN-3101.003.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3121) FairScheduler preemption metrics
[ https://issues.apache.org/jira/browse/YARN-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-3121: --- Assignee: Anubhav Dhoot FairScheduler preemption metrics Key: YARN-3121 URL: https://issues.apache.org/jira/browse/YARN-3121 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Add FSQueueMetrics for preemption-related information -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3121) FairScheduler preemption metrics
Anubhav Dhoot created YARN-3121: --- Summary: FairScheduler preemption metrics Key: YARN-3121 URL: https://issues.apache.org/jira/browse/YARN-3121 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Anubhav Dhoot Add FSQueueMetrics for preemption-related information -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299542#comment-14299542 ] Jian He commented on YARN-3077: --- [~chenchun], I added you to the contributor list. You should be able to assign JIRAs to yourself now. RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen Assignee: Chun Chen Fix For: 2.7.0 Attachments: YARN-3077.2.patch, YARN-3077.3.patch, YARN-3077.patch If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If the user specifies a custom value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, YARN should create the parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299547#comment-14299547 ] Hudson commented on YARN-3077: -- FAILURE: Integrated in Hadoop-trunk-Commit #6974 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6974/]) YARN-3077. Fixed RM to create zk root path recursively. Contributed by Chun Chen (jianhe: rev 054a947989d6ccbe54a803ca96dcebeba8328367) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen Assignee: Chun Chen Fix For: 2.7.0 Attachments: YARN-3077.2.patch, YARN-3077.3.patch, YARN-3077.patch If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If the user specifies a custom value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, YARN should create the parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
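The fix described in the commit above boils down to creating each missing ancestor znode in turn. A hedged sketch with the plain ZooKeeper client (the real ZKRMStateStore code differs in detail):
{code}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Create /a/b/c one segment at a time, tolerating existing nodes.
static void createRecursively(ZooKeeper zk, String path) throws Exception {
  StringBuilder prefix = new StringBuilder();
  for (String part : path.split("/")) {
    if (part.isEmpty()) {
      continue; // skip the empty segment before the leading slash
    }
    prefix.append("/").append(part);
    try {
      zk.create(prefix.toString(), new byte[0],
          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    } catch (KeeperException.NodeExistsException e) {
      // ancestor already present; keep going
    }
  }
}
{code}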
[jira] [Commented] (YARN-3041) create the ATS entity/event API
[ https://issues.apache.org/jira/browse/YARN-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299548#comment-14299548 ] Robert Kanter commented on YARN-3041: - I'm glad it mostly matches up with the doc. 1. I think that makes sense. A Metric doesn't need all of the stuff that it's inheriting from the {{TimelineServiceEntity}}. I'm already using the old {{TimelineEvent}}, which matches up with what you had in the doc (other than having {{eventInfo}} instead of {{metadata}}). 2. It sounds like we may need more discussion on this area. As [~sjlee0] pointed out, we had originally said a single parent to have a linear hierarchy for aggregation. This is different than the "Relates to" and "Is related to" in the doc and having a DAG. I wonder if it makes sense to have a parent-child relationship only to relate the entities to each other (e.g. Application is a child of Run, etc.), and some other structure (not sure what) for aggregation? That would help us capture other aggregation paths for things that don't fit in the parental hierarchy. Though that makes things more complicated :( 3. You're right: they don't really need all the stuff they're inheriting from {{TimelineServiceEntity}}. I think they really only need the relationship field(s) and an id. I'll do some refactoring for another prelim version. create the ATS entity/event API --- Key: YARN-3041 URL: https://issues.apache.org/jira/browse/YARN-3041 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Robert Kanter Attachments: YARN-3041.preliminary.001.patch Per design in YARN-2928, create the ATS entity and events API. Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, flow, flow run, YARN app, ...). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
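To make the trade-off in points 2 and 3 concrete, the single-parent shape under discussion might look roughly like this; the class and fields are purely illustrative, not the eventual ATS API:
{code}
import java.util.Map;
import java.util.Set;

// One parent id keeps the aggregation path linear; the relates-to maps
// carry DAG-style links (e.g. to queues) without affecting aggregation.
class IllustrativeTimelineEntity {
  String id;
  String type;
  String parentId;                    // at most one parent
  Map<String, Set<String>> relatesTo; // entity type -> entity ids
  Map<String, Set<String>> isRelatedTo;
}
{code}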
[jira] [Commented] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299592#comment-14299592 ] Hadoop QA commented on YARN-3101: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695697/YARN-3101.003.patch against trunk revision 09ad9a8. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6472//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6472//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6472//console This message is automatically generated. FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch, YARN-3101.003.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1983) Support heterogeneous container types at runtime on YARN
[ https://issues.apache.org/jira/browse/YARN-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun Chen updated YARN-1983: Attachment: YARN-1983.2.patch Updated the patch to rewrite the unit tests. Support heterogeneous container types at runtime on YARN Key: YARN-1983 URL: https://issues.apache.org/jira/browse/YARN-1983 Project: Hadoop YARN Issue Type: Improvement Reporter: Junping Du Attachments: YARN-1983.2.patch, YARN-1983.patch Different container types (default, LXC, docker, VM box, etc.) have different semantics on isolation of security, namespace/env, performance, etc. Per discussions in YARN-1964, we have some good thoughts on supporting different types of containers running on YARN, specified by the application at runtime, which largely enhances YARN's flexibility to meet heterogeneous apps' requirements on isolation at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3120) YarnException on windows + org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to setup local dirnm-local-dir, which was marked as good.
[ https://issues.apache.org/jira/browse/YARN-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299677#comment-14299677 ] Varun Vasudev commented on YARN-3120: - Are you running everything on your install drive (C:)? Windows has special security permissions on the install drive. Try creating another partition and setting the local dirs on that partition. YarnException on windows + org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to setup local dirnm-local-dir, which was marked as good. --- Key: YARN-3120 URL: https://issues.apache.org/jira/browse/YARN-3120 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Environment: Windows 8 , Hadoop 2.6.0 Reporter: vaidhyanathan Hi, I tried to follow the instructions in http://wiki.apache.org/hadoop/Hadoop2OnWindows and have set up hadoop-2.6.0 on my Windows system. I was able to start everything properly, but when I try to run the wordcount job as given in the above URL, the job fails with the exception below. 15/01/30 12:56:09 INFO localizer.ResourceLocalizationService: Localizer failed org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to setup local di r /tmp/hadoop-haremangala/nm-local-dir, which was marked as good. at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer. ResourceLocalizationService.getInitializedLocalDirs(ResourceLocalizationService. java:1372) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer. ResourceLocalizationService.access$900(ResourceLocalizationService.java:137) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer. ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java :1085) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
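If trying Varun's suggestion, the NodeManager local dirs would be pointed at the other partition via the standard yarn.nodemanager.local-dirs property in yarn-site.xml; the D:\ path below is only an example:
{code}
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>D:\hadoop\nm-local-dir</value>
</property>
{code}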
[jira] [Commented] (YARN-1983) Support heterogeneous container types at runtime on YARN
[ https://issues.apache.org/jira/browse/YARN-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299688#comment-14299688 ] Hadoop QA commented on YARN-1983: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695731/YARN-1983.2.patch against trunk revision 26c2de3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6474//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6474//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6474//console This message is automatically generated. Support heterogeneous container types at runtime on YARN Key: YARN-1983 URL: https://issues.apache.org/jira/browse/YARN-1983 Project: Hadoop YARN Issue Type: Improvement Reporter: Junping Du Attachments: YARN-1983.2.patch, YARN-1983.patch Different container types (default, LXC, docker, VM box, etc.) have different semantics on isolation of security, namespace/env, performance, etc. Per discussions in YARN-1964, we have some good thoughts on supporting different types of containers running on YARN, specified by the application at runtime, which largely enhances YARN's flexibility to meet heterogeneous apps' requirements on isolation at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3090) DeletionService can silently ignore deletion task failures
[ https://issues.apache.org/jira/browse/YARN-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3090: --- Attachment: YARN-3090.001.patch DeletionService can silently ignore deletion task failures -- Key: YARN-3090 URL: https://issues.apache.org/jira/browse/YARN-3090 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Jason Lowe Assignee: Varun Saxena Attachments: YARN-3090.001.patch If a non-I/O exception occurs while the DeletionService is executing a deletion task then it will be silently ignored. The exception bubbles up to the thread workers of the ScheduledThreadPoolExecutor which simply attaches the throwable to the Future that was returned when the task was scheduled. However the thread pool is used as a fire-and-forget pool, so nothing ever looks at the Future and therefore the exception is never logged. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
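The failure mode in the YARN-3090 description, and one common remedy, can be sketched by wrapping each task so throwables are logged instead of vanishing into a Future nobody reads. This is illustrative only, not the actual patch:
{code}
import org.apache.commons.logging.Log;

// Wrap a deletion task so any throwable is logged; otherwise it only
// lands on the unread Future returned by schedule().
static Runnable logged(final Runnable task, final Log log) {
  return new Runnable() {
    @Override
    public void run() {
      try {
        task.run();
      } catch (Throwable t) {
        log.error("Deletion task failed", t);
      }
    }
  };
}
{code}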
[jira] [Commented] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299689#comment-14299689 ] Anubhav Dhoot commented on YARN-3101: - The failure does not repro after pulling the latest trunk, and the release audit warning is in an unrelated file. FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch, YARN-3101.003.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3090) DeletionService can silently ignore deletion task failures
[ https://issues.apache.org/jira/browse/YARN-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299700#comment-14299700 ] Hadoop QA commented on YARN-3090: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695733/YARN-3090.001.patch against trunk revision 26c2de3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6475//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6475//artifact/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6475//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6475//console This message is automatically generated. DeletionService can silently ignore deletion task failures -- Key: YARN-3090 URL: https://issues.apache.org/jira/browse/YARN-3090 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Jason Lowe Assignee: Varun Saxena Attachments: YARN-3090.001.patch If a non-I/O exception occurs while the DeletionService is executing a deletion task then it will be silently ignored. The exception bubbles up to the thread workers of the ScheduledThreadPoolExecutor which simply attaches the throwable to the Future that was returned when the task was scheduled. However the thread pool is used as a fire-and-forget pool, so nothing ever looks at the Future and therefore the exception is never logged. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3048) handle how to set up and start/stop ATS reader instances
[ https://issues.apache.org/jira/browse/YARN-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299690#comment-14299690 ] Varun Saxena commented on YARN-3048: [~sjlee0], isn't YARN-3118 similar to this JIRA? Or is the intention of this JIRA something else? handle how to set up and start/stop ATS reader instances Key: YARN-3048 URL: https://issues.apache.org/jira/browse/YARN-3048 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Per design in YARN-2928, come up with a way to set up and start/stop ATS reader instances. This should allow setting up multiple instances and managing user traffic to those instances. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3090) DeletionService can silently ignore deletion task failures
[ https://issues.apache.org/jira/browse/YARN-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3090: --- Attachment: YARN-3090.002.patch DeletionService can silently ignore deletion task failures -- Key: YARN-3090 URL: https://issues.apache.org/jira/browse/YARN-3090 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Jason Lowe Assignee: Varun Saxena Attachments: YARN-3090.001.patch, YARN-3090.002.patch If a non-I/O exception occurs while the DeletionService is executing a deletion task then it will be silently ignored. The exception bubbles up to the thread workers of the ScheduledThreadPoolExecutor which simply attaches the throwable to the Future that was returned when the task was scheduled. However the thread pool is used as a fire-and-forget pool, so nothing ever looks at the Future and therefore the exception is never logged. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299369#comment-14299369 ] Siqi Li commented on YARN-3101: --- [~adhoot] The reason for using > instead of >= is basically to keep the test case intact. FairScheduler#fitInMaxShare was added to validate reservations but it does not consider it --- Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3089) LinuxContainerExecutor does not handle file arguments to deleteAsUser
[ https://issues.apache.org/jira/browse/YARN-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298637#comment-14298637 ] Eric Payne commented on YARN-3089: -- Thank you, [~sunilg], for your review of this patch. {quote} {code} int subDirEmptyStr = (subdir == NULL || subdir[0] == 0); {code} I think strlen(subdir) also has to be checked against 0, correct? {quote} The {{strlen}} function will do exactly the same thing that {{subdir[0] == 0}} does, which is to check whether the first byte in the string is 0. In {{strlen}}, it takes the form of {{*s == '\0'}}, but it amounts to the same thing. By checking for the empty string directly, as is done in the existing patch, it avoids the overhead of another function call. LinuxContainerExecutor does not handle file arguments to deleteAsUser - Key: YARN-3089 URL: https://issues.apache.org/jira/browse/YARN-3089 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Eric Payne Priority: Blocker Attachments: YARN-3089.v1.txt YARN-2468 added the deletion of individual logs that are aggregated, but this fails to delete log files when the LCE is being used. The LCE native executable assumes the paths being passed are paths and the delete fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2428) LCE default banned user list should have yarn
[ https://issues.apache.org/jira/browse/YARN-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298663#comment-14298663 ] Hudson commented on YARN-2428: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2021 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2021/]) YARN-2428. LCE default banned user list should have yarn (Varun Saxena via aw) (aw: rev 9dd0b7a2ab6538d8f72b004eb97c2750ff3d98dd) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c * hadoop-yarn-project/CHANGES.txt LCE default banned user list should have yarn - Key: YARN-2428 URL: https://issues.apache.org/jira/browse/YARN-2428 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Allen Wittenauer Assignee: Varun Saxena Priority: Trivial Labels: newbie Fix For: 3.0.0 Attachments: YARN-2428.001.patch When task-controller was retrofitted to YARN, the default banned user list didn't add yarn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3108) ApplicationHistoryServer doesn't process -D arguments
[ https://issues.apache.org/jira/browse/YARN-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298658#comment-14298658 ] Hudson commented on YARN-3108: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2021 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2021/]) YARN-3108. ApplicationHistoryServer doesn't process -D arguments (Chang Li via jeagles) (jeagles: rev 30a8778c632c0f57cdd005080a470065a60756a8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java ApplicationHistoryServer doesn't process -D arguments - Key: YARN-3108 URL: https://issues.apache.org/jira/browse/YARN-3108 Project: Hadoop YARN Issue Type: Improvement Reporter: chang li Assignee: chang li Fix For: 2.7.0 Attachments: yarn3108.patch, yarn3108.patch, yarn3108.patch ApplicationHistoryServer doesn't process -D arguments when created; it would be nice to have it do that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
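The usual Hadoop way to honor -D arguments is to run the command line through GenericOptionsParser before the configuration is used; a hedged sketch of the idea, not necessarily how the YARN-3108 patch does it:
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Apply -Dkey=value overrides from args to the server's Configuration.
static Configuration parseArgs(String[] args) throws IOException {
  Configuration conf = new YarnConfiguration();
  String[] remaining = new GenericOptionsParser(conf, args).getRemainingArgs();
  // 'conf' now reflects the -D overrides; 'remaining' holds anything else.
  return conf;
}
{code}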
[jira] [Commented] (YARN-3029) FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere
[ https://issues.apache.org/jira/browse/YARN-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298662#comment-14298662 ] Hudson commented on YARN-3029: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2021 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2021/]) YARN-3029. FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere. Contributed by Varun Saxena. (ozawa: rev 7acce7d3648d6f1e45ce280e2147e7dedf5693fc) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere - Key: YARN-3029 URL: https://issues.apache.org/jira/browse/YARN-3029 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Varun Saxena Attachments: YARN-3029-003.patch, YARN-3029.001.patch, YARN-3029.002.patch {{FSDownload.unpack()}} lower-cases filenames in the local locale before looking at extensions for, tar, zip, .. {code} String lowerDst = dst.getName().toLowerCase(); {code} it MUST use LOCALE_EN for the locale, else a file .ZIP won't be recognised as a zipfile in a turkish locale cluster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
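The failure mode is easy to reproduce. A minimal sketch, assuming LOCALE_EN in the report refers to an English locale constant such as Locale.ENGLISH:
{code}
import java.util.Locale;

public class LocaleLowercaseDemo {
    public static void main(String[] args) {
        String name = "ARCHIVE.ZIP";
        // The Turkish locale maps 'I' to dotless 'ı', so ".ZIP" no longer
        // lower-cases to ".zip" and the extension match fails.
        System.out.println(name.toLowerCase(new Locale("tr")));   // archıve.zıp
        // Locale-insensitive lowercasing keeps the check stable everywhere.
        System.out.println(name.toLowerCase(Locale.ENGLISH));     // archive.zip
    }
}
{code}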
[jira] [Commented] (YARN-3029) FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere
[ https://issues.apache.org/jira/browse/YARN-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298679#comment-14298679 ] Hudson commented on YARN-3029: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #86 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/86/]) YARN-3029. FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere. Contributed by Varun Saxena. (ozawa: rev 7acce7d3648d6f1e45ce280e2147e7dedf5693fc) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere - Key: YARN-3029 URL: https://issues.apache.org/jira/browse/YARN-3029 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Varun Saxena Attachments: YARN-3029-003.patch, YARN-3029.001.patch, YARN-3029.002.patch {{FSDownload.unpack()}} lower-cases filenames in the local locale before looking at extensions for, tar, zip, .. {code} String lowerDst = dst.getName().toLowerCase(); {code} it MUST use LOCALE_EN for the locale, else a file .ZIP won't be recognised as a zipfile in a turkish locale cluster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3108) ApplicationHistoryServer doesn't process -D arguments
[ https://issues.apache.org/jira/browse/YARN-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298675#comment-14298675 ] Hudson commented on YARN-3108: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #86 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/86/]) YARN-3108. ApplicationHistoryServer doesn't process -D arguments (Chang Li via jeagles) (jeagles: rev 30a8778c632c0f57cdd005080a470065a60756a8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java ApplicationHistoryServer doesn't process -D arguments - Key: YARN-3108 URL: https://issues.apache.org/jira/browse/YARN-3108 Project: Hadoop YARN Issue Type: Improvement Reporter: chang li Assignee: chang li Fix For: 2.7.0 Attachments: yarn3108.patch, yarn3108.patch, yarn3108.patch ApplicationHistoryServer doesn't process -D arguments when created, it's nice to have it to do that -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2428) LCE default banned user list should have yarn
[ https://issues.apache.org/jira/browse/YARN-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298680#comment-14298680 ] Hudson commented on YARN-2428: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #86 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/86/]) YARN-2428. LCE default banned user list should have yarn (Varun Saxena via aw) (aw: rev 9dd0b7a2ab6538d8f72b004eb97c2750ff3d98dd) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c * hadoop-yarn-project/CHANGES.txt LCE default banned user list should have yarn - Key: YARN-2428 URL: https://issues.apache.org/jira/browse/YARN-2428 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Allen Wittenauer Assignee: Varun Saxena Priority: Trivial Labels: newbie Fix For: 3.0.0 Attachments: YARN-2428.001.patch When task-controller was retrofitted to YARN, the default banned user list didn't add yarn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298647#comment-14298647 ] Hadoop QA commented on YARN-2854: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695043/timeline_structure.jpg against trunk revision f2c9109. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6465//console This message is automatically generated. The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3104) RM generates new AMRM tokens every heartbeat between rolling and activation
[ https://issues.apache.org/jira/browse/YARN-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-3104: - Summary: RM generates new AMRM tokens every heartbeat between rolling and activation (was: RM continues to send new AMRM tokens every heartbeat between rolling and activation) Yes, the connection is not re-established so the updated token in the client's UGI is never re-sent to the RPC server. Therefore every time the RM asks the RPC server for the client's UGI we will continue to get the old one. Since the RM thinks the client is still using the token that was used when the connection was established, it continues to regenerate tokens (and emit corresponding logs) every heartbeat for the interval between when the new key was rolled and when it is activated (i.e.: as long as nextMasterKey != null). To tell whether the client really is using the new token we either need the RPC connection to be re-established or a way to tell the RPC layer to re-authenticate the connection. I don't believe there's a good way to do either of those given the RPC API, so this patch works around the issue a bit by comparing the token we have recorded for the app attempt with the next key. It solves the problem of regenerating tokens unnecessarily for the same app attempt. However we will continue to send the token each heartbeat since we cannot tell whether the client really has the new token. I tweaked the summary accordingly. RM generates new AMRM tokens every heartbeat between rolling and activation --- Key: YARN-3104 URL: https://issues.apache.org/jira/browse/YARN-3104 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-3104.001.patch When the RM rolls a new AMRM secret, it conveys this to the AMs when it notices they are still connected with the old key. However neither the RM nor the AM explicitly close the connection or otherwise try to reconnect with the new secret. Therefore the RM keeps thinking the AM doesn't have the new token on every heartbeat and keeps sending new tokens for the period between the key roll and the key activation. Once activated the RM no longer squawks in its logs about needing to generate a new token every heartbeat (i.e.: second) for every app, but the apps can still be using the old token. The token is only checked upon connection to the RM. The apps don't reconnect when sent a new token, and the RM doesn't force them to reconnect by closing the connection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
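A minimal sketch of the guard described above, with illustrative names rather than the actual YARN classes: regenerate a token only when the one recorded for the app attempt does not already carry the next master key's id.
{code}
public class AmrmTokenRollSketch {
    // Minimal stand-in for the token recorded per app attempt; the real
    // code compares the key id carried inside the token identifier.
    static final class RecordedToken {
        final int masterKeyId;
        RecordedToken(int masterKeyId) { this.masterKeyId = masterKeyId; }
    }

    // Regenerate only when the recorded token was not already produced
    // with the next (rolled, not-yet-activated) master key.
    static boolean needsNewToken(RecordedToken recorded, Integer nextMasterKeyId) {
        if (nextMasterKeyId == null) {
            return false; // no roll in progress
        }
        return recorded == null || recorded.masterKeyId != nextMasterKeyId;
    }

    public static void main(String[] args) {
        System.out.println(needsNewToken(new RecordedToken(7), 8)); // true: roll once
        System.out.println(needsNewToken(new RecordedToken(8), 8)); // false: already rolled
    }
}
{code}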
[jira] [Commented] (YARN-2428) LCE default banned user list should have yarn
[ https://issues.apache.org/jira/browse/YARN-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298722#comment-14298722 ] Hudson commented on YARN-2428: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #90 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/90/]) YARN-2428. LCE default banned user list should have yarn (Varun Saxena via aw) (aw: rev 9dd0b7a2ab6538d8f72b004eb97c2750ff3d98dd) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c * hadoop-yarn-project/CHANGES.txt LCE default banned user list should have yarn - Key: YARN-2428 URL: https://issues.apache.org/jira/browse/YARN-2428 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Allen Wittenauer Assignee: Varun Saxena Priority: Trivial Labels: newbie Fix For: 3.0.0 Attachments: YARN-2428.001.patch When task-controller was retrofitted to YARN, the default banned user list didn't add yarn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3029) FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere
[ https://issues.apache.org/jira/browse/YARN-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298721#comment-14298721 ] Hudson commented on YARN-3029: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #90 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/90/]) YARN-3029. FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere. Contributed by Varun Saxena. (ozawa: rev 7acce7d3648d6f1e45ce280e2147e7dedf5693fc) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere - Key: YARN-3029 URL: https://issues.apache.org/jira/browse/YARN-3029 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Varun Saxena Attachments: YARN-3029-003.patch, YARN-3029.001.patch, YARN-3029.002.patch {{FSDownload.unpack()}} lower-cases filenames in the local locale before looking at extensions for, tar, zip, .. {code} String lowerDst = dst.getName().toLowerCase(); {code} it MUST use LOCALE_EN for the locale, else a file .ZIP won't be recognised as a zipfile in a turkish locale cluster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3108) ApplicationHistoryServer doesn't process -D arguments
[ https://issues.apache.org/jira/browse/YARN-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298717#comment-14298717 ] Hudson commented on YARN-3108: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #90 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/90/]) YARN-3108. ApplicationHistoryServer doesn't process -D arguments (Chang Li via jeagles) (jeagles: rev 30a8778c632c0f57cdd005080a470065a60756a8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * hadoop-yarn-project/CHANGES.txt ApplicationHistoryServer doesn't process -D arguments - Key: YARN-3108 URL: https://issues.apache.org/jira/browse/YARN-3108 Project: Hadoop YARN Issue Type: Improvement Reporter: chang li Assignee: chang li Fix For: 2.7.0 Attachments: yarn3108.patch, yarn3108.patch, yarn3108.patch ApplicationHistoryServer doesn't process -D arguments when created, it's nice to have it to do that -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2428) LCE default banned user list should have yarn
[ https://issues.apache.org/jira/browse/YARN-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298464#comment-14298464 ] Hudson commented on YARN-2428: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #89 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/89/]) YARN-2428. LCE default banned user list should have yarn (Varun Saxena via aw) (aw: rev 9dd0b7a2ab6538d8f72b004eb97c2750ff3d98dd) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c LCE default banned user list should have yarn - Key: YARN-2428 URL: https://issues.apache.org/jira/browse/YARN-2428 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Allen Wittenauer Assignee: Varun Saxena Priority: Trivial Labels: newbie Fix For: 3.0.0 Attachments: YARN-2428.001.patch When task-controller was retrofitted to YARN, the default banned user list didn't add yarn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3029) FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere
[ https://issues.apache.org/jira/browse/YARN-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298463#comment-14298463 ] Hudson commented on YARN-3029: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #89 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/89/]) YARN-3029. FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere. Contributed by Varun Saxena. (ozawa: rev 7acce7d3648d6f1e45ce280e2147e7dedf5693fc) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere - Key: YARN-3029 URL: https://issues.apache.org/jira/browse/YARN-3029 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Varun Saxena Attachments: YARN-3029-003.patch, YARN-3029.001.patch, YARN-3029.002.patch {{FSDownload.unpack()}} lower-cases filenames in the local locale before looking at extensions for, tar, zip, .. {code} String lowerDst = dst.getName().toLowerCase(); {code} it MUST use LOCALE_EN for the locale, else a file .ZIP won't be recognised as a zipfile in a turkish locale cluster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3108) ApplicationHistoryServer doesn't process -D arguments
[ https://issues.apache.org/jira/browse/YARN-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298459#comment-14298459 ] Hudson commented on YARN-3108: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #89 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/89/]) YARN-3108. ApplicationHistoryServer doesn't process -D arguments (Chang Li via jeagles) (jeagles: rev 30a8778c632c0f57cdd005080a470065a60756a8) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java ApplicationHistoryServer doesn't process -D arguments - Key: YARN-3108 URL: https://issues.apache.org/jira/browse/YARN-3108 Project: Hadoop YARN Issue Type: Improvement Reporter: chang li Assignee: chang li Fix For: 2.7.0 Attachments: yarn3108.patch, yarn3108.patch, yarn3108.patch ApplicationHistoryServer doesn't process -D arguments when created, it's nice to have it to do that -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3119) Memory limit check need not be enforced unless aggregate usage of all containers is near limit
[ https://issues.apache.org/jira/browse/YARN-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3119: Component/s: nodemanager Memory limit check need not be enforced unless aggregate usage of all containers is near limit -- Key: YARN-3119 URL: https://issues.apache.org/jira/browse/YARN-3119 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3119.prelim.patch Today we preemptively kill any container that exceeds its memory limit, even if the total usage of all containers on the node is well within the limit for YARN. Instead, if we enforce the per-container memory limit only when the total usage of all containers is close to some configurable ratio of the overall memory assigned to containers, we can allow flexibility in container memory usage without adverse effects. This is similar in principle to how cgroups uses soft_limit_in_bytes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3119) Memory limit check need not be enforced unless aggregate usage of all containers is near limit
[ https://issues.apache.org/jira/browse/YARN-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-3119: --- Assignee: Anubhav Dhoot Memory limit check need not be enforced unless aggregate usage of all containers is near limit -- Key: YARN-3119 URL: https://issues.apache.org/jira/browse/YARN-3119 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3119.prelim.patch Today we preemptively kill any container that exceeds its memory limit, even if the total usage of all containers on the node is well within the limit for YARN. Instead, if we enforce the per-container memory limit only when the total usage of all containers is close to some configurable ratio of the overall memory assigned to containers, we can allow flexibility in container memory usage without adverse effects. This is similar in principle to how cgroups uses soft_limit_in_bytes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
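A minimal sketch of the proposed policy, with illustrative names and a hypothetical ratio parameter (not the actual NodeManager code): enforce a container's limit only when aggregate usage crosses a configurable fraction of the node's container memory, analogous to cgroups soft_limit_in_bytes.
{code}
public class SoftMemoryLimitSketch {
    // Kill only if the container is over its limit AND the node's aggregate
    // usage is near the configured fraction of total container memory.
    static boolean shouldKill(long containerUsage, long containerLimit,
                              long aggregateUsage, long nodeContainerMemory,
                              double softLimitRatio) {
        boolean overLimit = containerUsage > containerLimit;
        boolean nodeUnderPressure =
            aggregateUsage >= (long) (softLimitRatio * nodeContainerMemory);
        return overLimit && nodeUnderPressure;
    }

    public static void main(String[] args) {
        // A 2 GB container using 3 GB on a 16 GB node, ratio 0.9:
        // tolerated while the node as a whole is not under pressure.
        System.out.println(shouldKill(3L << 30, 2L << 30, 6L << 30, 16L << 30, 0.9));  // false
        System.out.println(shouldKill(3L << 30, 2L << 30, 15L << 30, 16L << 30, 0.9)); // true
    }
}
{code}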
[jira] [Commented] (YARN-3089) LinuxContainerExecutor does not handle file arguments to deleteAsUser
[ https://issues.apache.org/jira/browse/YARN-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298371#comment-14298371 ] Sunil G commented on YARN-3089: --- Hi [~eepayne], thank you for bringing this up. I have one comment on the same code. {code} int subDirEmptyStr = (subdir == NULL || subdir[0] == 0); {code} I think strlen(subdir) also has to be checked against 0, correct? LinuxContainerExecutor does not handle file arguments to deleteAsUser - Key: YARN-3089 URL: https://issues.apache.org/jira/browse/YARN-3089 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Eric Payne Priority: Blocker Attachments: YARN-3089.v1.txt YARN-2468 added the deletion of individual logs that are aggregated, but this fails to delete log files when the LCE is being used. The LCE native executable assumes the paths being passed are directories, and the delete fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3108) ApplicationHistoryServer doesn't process -D arguments
[ https://issues.apache.org/jira/browse/YARN-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298477#comment-14298477 ] Hudson commented on YARN-3108: -- FAILURE: Integrated in Hadoop-Yarn-trunk #823 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/823/]) YARN-3108. ApplicationHistoryServer doesn't process -D arguments (Chang Li via jeagles) (jeagles: rev 30a8778c632c0f57cdd005080a470065a60756a8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * hadoop-yarn-project/CHANGES.txt ApplicationHistoryServer doesn't process -D arguments - Key: YARN-3108 URL: https://issues.apache.org/jira/browse/YARN-3108 Project: Hadoop YARN Issue Type: Improvement Reporter: chang li Assignee: chang li Fix For: 2.7.0 Attachments: yarn3108.patch, yarn3108.patch, yarn3108.patch ApplicationHistoryServer doesn't process -D arguments when created, it's nice to have it to do that -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3029) FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere
[ https://issues.apache.org/jira/browse/YARN-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298481#comment-14298481 ] Hudson commented on YARN-3029: -- FAILURE: Integrated in Hadoop-Yarn-trunk #823 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/823/]) YARN-3029. FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere. Contributed by Varun Saxena. (ozawa: rev 7acce7d3648d6f1e45ce280e2147e7dedf5693fc) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java * hadoop-yarn-project/CHANGES.txt FSDownload.unpack() uses local locale for FS case conversion, may not work everywhere - Key: YARN-3029 URL: https://issues.apache.org/jira/browse/YARN-3029 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Varun Saxena Attachments: YARN-3029-003.patch, YARN-3029.001.patch, YARN-3029.002.patch {{FSDownload.unpack()}} lower-cases filenames in the local locale before looking at extensions for, tar, zip, .. {code} String lowerDst = dst.getName().toLowerCase(); {code} it MUST use LOCALE_EN for the locale, else a file .ZIP won't be recognised as a zipfile in a turkish locale cluster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298513#comment-14298513 ] Harsh J commented on YARN-3021: --- Overall the patch looks fine to me, but please do hold off until [~vinodkv] or another active YARN committer has taken a look. Could you also add a test case for this, to catch future regressions in behaviour? For example, it could be done by submitting an app with an invalid token while this option is turned on. With the option turned off, such a submission will always fail and the app gets rejected, but with the fix in place it will at least pass through the submit procedure. Check out the test case modified in the earlier patch for a reusable reference. Also, could you document the added MR config in mapred-default.xml, describing its use and marking it as advanced, since it disables some features of a regular resilient application such as token reuse and renewals. YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one-way trusts in both cases), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails because B's realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously, and once the renewal attempt failed we simply ceased scheduling any further renewal attempts, rather than fail the job immediately. We should change the logic such that we attempt the renewal but tolerate the failure and merely skip the renewal scheduling, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
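A minimal sketch of the lenient policy described in the issue, with illustrative names (not the actual DelegationTokenRenewer code): attempt the initial renewal, and on failure skip scheduling further renewals for that token instead of failing the submission.
{code}
public class LenientTokenRenewSketch {
    interface Renewer { long renew() throws Exception; }

    // Returns true if the token should stay on the renewal schedule.
    // With skipOnFailure set, a failed renewal (e.g. an untrusted realm)
    // drops the token from scheduling but lets the submission proceed.
    static boolean tryScheduleRenewal(Renewer token, boolean skipOnFailure)
            throws Exception {
        try {
            token.renew();
            return true;      // renewable: keep renewing automatically
        } catch (Exception e) {
            if (skipOnFailure) {
                return false; // keep the token, just stop renewing it
            }
            throw e;          // old behaviour: error bubbles up, app rejected
        }
    }

    public static void main(String[] args) throws Exception {
        Renewer crossRealm = () -> { throw new Exception("renewer not trusted"); };
        System.out.println(tryScheduleRenewal(crossRealm, true)); // false, app proceeds
    }
}
{code}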
[jira] [Commented] (YARN-2428) LCE default banned user list should have yarn
[ https://issues.apache.org/jira/browse/YARN-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298482#comment-14298482 ] Hudson commented on YARN-2428: -- FAILURE: Integrated in Hadoop-Yarn-trunk #823 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/823/]) YARN-2428. LCE default banned user list should have yarn (Varun Saxena via aw) (aw: rev 9dd0b7a2ab6538d8f72b004eb97c2750ff3d98dd) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c * hadoop-yarn-project/CHANGES.txt LCE default banned user list should have yarn - Key: YARN-2428 URL: https://issues.apache.org/jira/browse/YARN-2428 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Allen Wittenauer Assignee: Varun Saxena Priority: Trivial Labels: newbie Fix For: 3.0.0 Attachments: YARN-2428.001.patch When task-controller was retrofitted to YARN, the default banned user list didn't add yarn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)