[jira] [Updated] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3517: Attachment: YARN-3517.001.patch Uploaded patch with fix. RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-3517.001.patch YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
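A minimal sketch of the kind of admin-only guard being discussed (hypothetical; the actual patch touches the RM web UI code, which is not reproduced here). It assumes only the standard {{yarn.admin.acl}} setting plus Hadoop's {{AccessControlList}} and {{UserGroupInformation}} classes, and "web-user" stands in for the authenticated caller:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AccessControlList;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AdminOnlyLogDumpSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Build the admin ACL from yarn.admin.acl (default "*", i.e. everyone).
    AccessControlList adminAcl = new AccessControlList(
        conf.get(YarnConfiguration.YARN_ADMIN_ACL,
            YarnConfiguration.DEFAULT_YARN_ADMIN_ACL));

    // "web-user" is an assumed stand-in for the authenticated web UI caller.
    UserGroupInformation callerUGI =
        UserGroupInformation.createRemoteUser("web-user");

    if (!adminAcl.isUserAllowed(callerUGI)) {
      throw new SecurityException("Only admins may dump scheduler logs");
    }
    // ... otherwise proceed to trigger the scheduler log dump ...
  }
}
{code}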
[jira] [Updated] (YARN-3514) Active directory usernames like domain\login cause YARN failures
[ https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated YARN-3514: Component/s: (was: yarn) nodemanager Target Version/s: 2.8.0 Assignee: Chris Nauroth Active directory usernames like domain\login cause YARN failures Key: YARN-3514 URL: https://issues.apache.org/jira/browse/YARN-3514 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Environment: CentOS6 Reporter: john lilley Assignee: Chris Nauroth Priority: Minor Attachments: YARN-3514.001.patch We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is Kerberos-enabled and uses an external AD domain controller for the KDC. We are able to authenticate, browse HDFS, etc. However, YARN fails during localization because it seems to get confused by the presence of a \ character in the local user name. Our AD authentication on the nodes goes through sssd and set configured to map AD users onto the form domain\username. For example, our test user has a Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user domain\hadoopuser. We have no problem validating that user with PAM, logging in as that user, su-ing to that user, etc. However, when we attempt to run a YARN application master, the localization step fails when setting up the local cache directory for the AM. The error that comes out of the RM logs: 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, diagnostics='Application application_1429295486450_0001 failed 1 times due to AM Container for appattempt_1429295486450_0001_01 exited with exitCode: -1000 due to: Application application_1429295486450_0001 initialization failed (exitCode=255) with output: main : command provided 0 main : user is DOMAIN\hadoopuser main : requested yarn user is domain\hadoopuser org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create directory: /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10 at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347) .Failing this attempt.. Failing the application.' However, when we look on the node launching the AM, we see this: [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache [root@rpb-cdh-kerb-2 usercache]# ls -l drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser There appears to be different treatment of the \ character in different places. Something creates the directory as domain\hadoopuser but something else later attempts to use it as domain%5Chadoopuser. I’m not sure where or why the URL escapement converts the \ to %5C or why this is not consistent. I should also mention, for the sake of completeness, our auth_to_local rule is set up to map u...@domain.com to domain\user: RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g -- This message was sent by Atlassian JIRA (v6.3.4#6332)
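The auth_to_local rule quoted above can be exercised in isolation with Hadoop's {{KerberosName}} utility. The snippet below is a sketch only: the principal {{hadoopuser@DOMAIN.COM}} is an assumed example (the reporter's value is obfuscated above), and the rule is the one from the description written as a Java string literal with doubled backslashes:
{code}
import org.apache.hadoop.security.authentication.util.KerberosName;

public class AuthToLocalSketch {
  public static void main(String[] args) throws Exception {
    // The rule from the description, plus DEFAULT as a catch-all.
    KerberosName.setRules(
        "RULE:[1:$1@$0](^.*@DOMAIN\\.COM$)s/^(.*)@DOMAIN\\.COM$/domain\\\\$1/g\n"
            + "DEFAULT");

    // Assumed example principal; expected to map to domain\hadoopuser,
    // i.e. a short name that contains a backslash.
    KerberosName name = new KerberosName("hadoopuser@DOMAIN.COM");
    System.out.println(name.getShortName());
  }
}
{code}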
[jira] [Updated] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3517: --- Component/s: security RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, security Affects Versions: 2.7.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Labels: security Attachments: YARN-3517.001.patch YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3517: --- Labels: security (was: ) RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, security Affects Versions: 2.7.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Labels: security Attachments: YARN-3517.001.patch YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3489) RMServerUtils.validateResourceRequests should only obtain queue info once
[ https://issues.apache.org/jira/browse/YARN-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504437#comment-14504437 ] Hadoop QA commented on YARN-3489: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726790/YARN-3489.02.patch against trunk revision d52de61. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7417//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7417//console This message is automatically generated. RMServerUtils.validateResourceRequests should only obtain queue info once - Key: YARN-3489 URL: https://issues.apache.org/jira/browse/YARN-3489 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Varun Saxena Attachments: YARN-3489.01.patch, YARN-3489.02.patch Since the label support was added we now get the queue info for each request being validated in SchedulerUtils.validateResourceRequest. If validateResourceRequests needs to validate a lot of requests at a time (e.g.: large cluster with lots of varied locality in the requests) then it will get the queue info for each request. Since we build the queue info this generates a lot of unnecessary garbage, as the queue isn't changing between requests. We should grab the queue info once and pass it down rather than building it again for each request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
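The improvement being proposed is essentially loop hoisting. The fragment below is a self-contained illustration with made-up types and method names, not the real {{RMServerUtils}}/{{SchedulerUtils}} signatures: the queue info is fetched once and reused for every request instead of being rebuilt per request.
{code}
import java.util.Arrays;
import java.util.List;

public class ValidateOnceSketch {
  // Stand-in for the scheduler-built queue snapshot.
  static class QueueInfo { }

  // Stand-in for the scheduler; each call builds a fresh object (garbage).
  static QueueInfo getQueueInfo(String queue) {
    return new QueueInfo();
  }

  static void validate(String request, QueueInfo queueInfo) {
    // validate the request against the already-fetched queue info
  }

  public static void main(String[] args) {
    List<String> requests = Arrays.asList("node-local", "rack-local", "any");

    // Fetch the queue info once, outside the loop, and pass it down.
    QueueInfo queueInfo = getQueueInfo("default");
    for (String request : requests) {
      validate(request, queueInfo);
    }
  }
}
{code}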
[jira] [Updated] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3517: Affects Version/s: 2.7.0 RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Vasudev Assignee: Varun Vasudev YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
Varun Vasudev created YARN-3517: --- Summary: RM web ui for dumping scheduler logs should be for admins only Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3514) Active directory usernames like domain\login cause YARN failures
[ https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated YARN-3514: Attachment: YARN-3514.001.patch I'm attaching a patch with the fix I described in my last comment. I added a test that passes a file name containing a '\' character through localization. With the existing code using {{URI#getRawPath}}, the test fails as shown below. (Note the incorrect URI-encoded path, similar to the reported symptom in the description.) After switching to {{URI#getPath}}, the test passes as expected. {code} Failed tests: TestContainerLocalizer.testLocalizerDiskCheckDoesNotUriEncodePath:265 Argument(s) are different! Wanted: containerLocalizer.checkDir(/my\File); - at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer.testLocalizerDiskCheckDoesNotUriEncodePath(TestContainerLocalizer.java:265) Actual invocation has different arguments: containerLocalizer.checkDir(/my%5CFile); - at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer.testLocalizerDiskCheckDoesNotUriEncodePath(TestContainerLocalizer.java:264) {code} Active directory usernames like domain\login cause YARN failures Key: YARN-3514 URL: https://issues.apache.org/jira/browse/YARN-3514 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.2.0 Environment: CentOS6 Reporter: john lilley Priority: Minor Attachments: YARN-3514.001.patch We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is Kerberos-enabled and uses an external AD domain controller for the KDC. We are able to authenticate, browse HDFS, etc. However, YARN fails during localization because it seems to get confused by the presence of a \ character in the local user name. Our AD authentication on the nodes goes through sssd and set configured to map AD users onto the form domain\username. For example, our test user has a Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user domain\hadoopuser. We have no problem validating that user with PAM, logging in as that user, su-ing to that user, etc. However, when we attempt to run a YARN application master, the localization step fails when setting up the local cache directory for the AM. 
The error that comes out of the RM logs: 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, diagnostics='Application application_1429295486450_0001 failed 1 times due to AM Container for appattempt_1429295486450_0001_01 exited with exitCode: -1000 due to: Application application_1429295486450_0001 initialization failed (exitCode=255) with output: main : command provided 0 main : user is DOMAIN\hadoopuser main : requested yarn user is domain\hadoopuser org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create directory: /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10 at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347) .Failing this attempt.. Failing the application.' However, when we look on the node launching the AM, we see this: [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache [root@rpb-cdh-kerb-2 usercache]# ls -l drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser There appears to be different treatment of the \ character in different places. Something creates the directory as domain\hadoopuser but something else later attempts to use it as domain%5Chadoopuser. I’m not sure where or why the URL escapement converts the \ to %5C or why this is not consistent. I should also mention, for the sake of completeness, our auth_to_local rule is set up to map u...@domain.com to domain\user: RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g -- This message was sent by Atlassian JIRA (v6.3.4#6332)
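The behavior behind the fix can be reproduced with a few lines of plain {{java.net.URI}} (a standalone sketch, independent of the ContainerLocalizer code): the multi-argument constructor percent-encodes the backslash, so {{getRawPath}} returns the {{%5C}} form while {{getPath}} returns the decoded path.
{code}
import java.net.URI;
import java.net.URISyntaxException;

public class BackslashPathSketch {
  public static void main(String[] args) throws URISyntaxException {
    // The multi-argument constructor quotes characters that are illegal in a
    // URI path, such as the backslash in an AD-style user name.
    URI uri = new URI("file", null,
        "/data/yarn/nm/usercache/domain\\hadoopuser/filecache/10", null, null);

    System.out.println(uri.getRawPath()); // .../domain%5Chadoopuser/... (encoded)
    System.out.println(uri.getPath());    // .../domain\hadoopuser/...  (decoded)
  }
}
{code}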
[jira] [Updated] (YARN-3516) killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status.
[ https://issues.apache.org/jira/browse/YARN-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3516: Attachment: (was: YARN-3516.000.patch) killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status. --- Key: YARN-3516 URL: https://issues.apache.org/jira/browse/YARN-3516 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status. This is a typo from YARN-3024: since YARN-3024, the ContainerLocalizer is killed only if the local {{action}} variable is set to {{LocalizerAction.DIE}}, so calling {{response.setLocalizerAction}} directly gets overwritten. This is also a regression from the old code. It also makes sense to kill the ContainerLocalizer when FETCH_FAILURE happens, because the container will send a CLEANUP_CONTAINER_RESOURCES event after the localization failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
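A simplified, self-contained sketch of the bug described above (the type and method names only loosely mirror the NodeManager code): the heartbeat handler finalizes the response from a local {{action}} variable, so setting the action directly on the response earlier in the method is silently overwritten.
{code}
public class LocalizerHeartbeatSketch {
  enum LocalizerAction { LIVE, DIE }

  static class LocalizerHeartbeatResponse {
    private LocalizerAction action = LocalizerAction.LIVE;
    void setLocalizerAction(LocalizerAction a) { this.action = a; }
    LocalizerAction getLocalizerAction() { return action; }
  }

  static LocalizerHeartbeatResponse onFetchFailure() {
    LocalizerHeartbeatResponse response = new LocalizerHeartbeatResponse();
    LocalizerAction action = LocalizerAction.LIVE;

    // Buggy variant: this call is lost because the response is finalized
    // from the local variable at the end of the method.
    // response.setLocalizerAction(LocalizerAction.DIE);

    // Fixed variant: set the local variable that actually drives the response.
    action = LocalizerAction.DIE;

    response.setLocalizerAction(action);
    return response;
  }

  public static void main(String[] args) {
    System.out.println(onFetchFailure().getLocalizerAction()); // prints DIE
  }
}
{code}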
[jira] [Commented] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505421#comment-14505421 ] Hadoop QA commented on YARN-3319: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726913/YARN-3319.73.patch against trunk revision 8ddbb8d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7426//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7426//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7426//console This message is automatically generated. Implement a FairOrderingPolicy -- Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, YARN-3319.58.patch, YARN-3319.70.patch, YARN-3319.71.patch, YARN-3319.72.patch, YARN-3319.73.patch Implement a FairOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. The Policy will offer allocations to applications in a queue in order of least resources used, and preempt applications in reverse order (from most resources used). This will include conditional support for sizeBasedWeight style adjustment Optionally, based on a conditional configuration to enable sizeBasedWeight (default false), an adjustment to boost larger applications (to offset the natural preference for smaller applications) will adjust the resource usage value based on demand, dividing it by the below value: Math.log1p(app memory demand) / Math.log(2); In cases where the above is indeterminate (two applications are equal after this comparison), behavior falls back to comparison based on the application id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
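The sizeBasedWeight adjustment described above amounts to dividing an application's usage by log2(1 + demand) before comparing. The comparator below is an illustrative sketch with made-up field names, not the actual SchedulerProcess/OrderingPolicy API:
{code}
import java.util.Comparator;

public class FairOrderingSketch {
  static class App {
    final String applicationId;
    final long usedMB;    // current resource usage
    final long demandMB;  // current demand

    App(String applicationId, long usedMB, long demandMB) {
      this.applicationId = applicationId;
      this.usedMB = usedMB;
      this.demandMB = demandMB;
    }

    // With sizeBasedWeight enabled, usage is divided by log2(1 + demand) to
    // offset the natural preference for smaller applications. Real code
    // would also guard against a zero demand.
    double weightedUsage(boolean sizeBasedWeight) {
      if (!sizeBasedWeight) {
        return usedMB;
      }
      return usedMB / (Math.log1p(demandMB) / Math.log(2));
    }
  }

  static Comparator<App> comparator(boolean sizeBasedWeight) {
    return Comparator
        .comparingDouble((App a) -> a.weightedUsage(sizeBasedWeight))
        // Tie-breaker: application id, which is generally lexically FIFO.
        .thenComparing(a -> a.applicationId);
  }

  public static void main(String[] args) {
    App big = new App("application_0001", 4096, 8192);
    App small = new App("application_0002", 2048, 1024);
    // The application with the smaller weighted usage is offered resources first.
    System.out.println(comparator(true).compare(big, small) < 0
        ? big.applicationId : small.applicationId);
  }
}
{code}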
[jira] [Commented] (YARN-3520) get rid of excessive stacktrace caused by expired cookie in timeline log
[ https://issues.apache.org/jira/browse/YARN-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505522#comment-14505522 ] Jonathan Eagles commented on YARN-3520: --- +1. While this jira is regarding a custom KerberosAuthenticationHandler, I agree with the excessive logging. Failing to login shouldn't cause an exception. Exceptions in the log should be confined to unexpected conditions that are unhandled. get rid of excessive stacktrace caused by expired cookie in timeline log Key: YARN-3520 URL: https://issues.apache.org/jira/browse/YARN-3520 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: YARN-3520.patch {code} WARN sso.CookieValidatorHelpers: Cookie has expired by 25364187 msec WARN server.AuthenticationFilter: Authentication exception: Invalid Cookie 166 org.apache.hadoop.security.authentication.client.AuthenticationException: Invalid Bouncer Cookie 167 at KerberosAuthenticationHandler.bouncerAuthenticate(KerberosAuthenticationHandler.java:94) 168 at AuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:82) 169 at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:507) 170 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 171 at org.apache.hadoop.yarn.server.timeline.webapp.CrossOriginFilter.doFilter(CrossOriginFilter.java:95) 172 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 173 at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:78) 174 at GzipFilter.doFilter(GzipFilter.java:188) 175 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 176 at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1224) 177 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 178 at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) 179 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 180 at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) 181 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 182 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) 183 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) 184 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) 185 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) 186 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) 187 at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) 188 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) 189 at org.mortbay.jetty.Server.handle(Server.java:326) 190 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) 191 at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) 192 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) 193 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) 194 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) 195 at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) 196 at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) WARN sso.CookieValidatorHelpers: Cookie has expired by 25373197 msec {code} 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
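The change under review is essentially about how the failure is logged. Below is a standalone sketch of the idea, using SLF4J directly; it is not the actual AuthenticationFilter code, and the class and method names are assumptions. An expected failure such as an expired cookie is logged as a one-line WARN instead of passing the exception to the logger, which prints the full stack trace shown above.
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class QuietAuthLoggingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(QuietAuthLoggingSketch.class);

  static class AuthenticationException extends Exception {
    AuthenticationException(String msg) { super(msg); }
  }

  static void authenticate(boolean cookieExpired) throws AuthenticationException {
    if (cookieExpired) {
      throw new AuthenticationException("Invalid Cookie");
    }
  }

  public static void main(String[] args) {
    try {
      authenticate(true);
    } catch (AuthenticationException ex) {
      // Noisy: LOG.warn("Authentication exception: " + ex.getMessage(), ex);
      // Quiet: message only, no stack trace for an expected condition.
      LOG.warn("Authentication exception: {}", ex.getMessage());
    }
  }
}
{code}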
[jira] [Commented] (YARN-3520) get rid of excessive stacktrace caused by expired cookie in timeline log
[ https://issues.apache.org/jira/browse/YARN-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505549#comment-14505549 ] Chang Li commented on YARN-3520: [~jlowe] could you please help do a final review of this patch and help commit it? Thanks get rid of excessive stacktrace caused by expired cookie in timeline log Key: YARN-3520 URL: https://issues.apache.org/jira/browse/YARN-3520 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: YARN-3520.patch {code} WARN sso.CookieValidatorHelpers: Cookie has expired by 25364187 msec WARN server.AuthenticationFilter: Authentication exception: Invalid Cookie 166 org.apache.hadoop.security.authentication.client.AuthenticationException: Invalid Bouncer Cookie 167 at KerberosAuthenticationHandler.bouncerAuthenticate(KerberosAuthenticationHandler.java:94) 168 at AuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:82) 169 at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:507) 170 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 171 at org.apache.hadoop.yarn.server.timeline.webapp.CrossOriginFilter.doFilter(CrossOriginFilter.java:95) 172 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 173 at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:78) 174 at GzipFilter.doFilter(GzipFilter.java:188) 175 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 176 at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1224) 177 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 178 at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) 179 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 180 at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) 181 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 182 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) 183 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) 184 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) 185 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) 186 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) 187 at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) 188 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) 189 at org.mortbay.jetty.Server.handle(Server.java:326) 190 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) 191 at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) 192 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) 193 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) 194 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) 195 at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) 196 at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) WARN sso.CookieValidatorHelpers: Cookie has expired by 25373197 msec {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3046) [Event producers] Implement MapReduce AM writing MR events to v2 ATS
[ https://issues.apache.org/jira/browse/YARN-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505580#comment-14505580 ] Junping Du commented on YARN-3046: -- bq. If that turns out to be the case, we could conceivably create subclasses in a standard place for MR, and have all use cases use those concrete subclasses. But I'm fine deferring that aspect a little bit. It's not a critical point. Another option is to make HierarchicalTimelineEntity non abstract? I also agree we should open a new JIRA to discuss this later. [Event producers] Implement MapReduce AM writing MR events to v2 ATS Key: YARN-3046 URL: https://issues.apache.org/jira/browse/YARN-3046 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: YARN-3046-no-test-v2.patch, YARN-3046-no-test.patch, YARN-3046-v1-rebase.patch, YARN-3046-v1.patch, YARN-3046-v2.patch, YARN-3046-v3.patch, YARN-3046-v4.patch, YARN-3046-v5.patch, YARN-3046-v6.patch Per design in YARN-2928, select a handful of MR metrics (e.g. HDFS bytes written) and have the MR AM write the framework-specific metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505593#comment-14505593 ] Junping Du commented on YARN-3437: -- I agree too, given we don't have a clear plan for YARN-2556 so far; there is no reason to block other ongoing efforts. One suggestion (optional only): can we adjust the name (or package path) slightly for the file duplicated with YARN-2556 (TimelineServerPerformance.java)? We could then have an additional patch that removes the duplicated file once YARN-2556 gets into trunk. I assume this would make rebasing YARN-2928 back onto trunk/branch-2 easier, with fewer conflicts. Thoughts? convert load test driver to timeline service v.2 Key: YARN-3437 URL: https://issues.apache.org/jira/browse/YARN-3437 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3437.001.patch, YARN-3437.002.patch This subtask covers the work for converting the proposed patch for the load test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505301#comment-14505301 ] Sunil G commented on YARN-3517: --- Thanks [~vvasudev] Patch looks good. RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, security Affects Versions: 2.7.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Labels: security Attachments: YARN-3517.001.patch, YARN-3517.002.patch, YARN-3517.003.patch YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3516) killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status.
[ https://issues.apache.org/jira/browse/YARN-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505326#comment-14505326 ] Xuan Gong commented on YARN-3516: - [~zxu] Thanks for working on this jira. I will take a look shortly. killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status. --- Key: YARN-3516 URL: https://issues.apache.org/jira/browse/YARN-3516 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3516.000.patch killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status. This is a typo from YARN-3024. With YARN-3024, ContainerLocalizer will be killed only if {{action}} is set to {{LocalizerAction.DIE}}, calling {{response.setLocalizerAction}} will be overwritten. This is also a regression from old code. Also it make sense to kill the ContainerLocalizer when FETCH_FAILURE happened, because the container will send CLEANUP_CONTAINER_RESOURCES event after localization failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3482) Report NM available resources in heartbeat
[ https://issues.apache.org/jira/browse/YARN-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505420#comment-14505420 ] Inigo Goiri commented on YARN-3482: --- I agree, 2 is more distributed and is a better fit for the model that we want to push. I'll implement it today. Report NM available resources in heartbeat -- Key: YARN-3482 URL: https://issues.apache.org/jira/browse/YARN-3482 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Original Estimate: 504h Remaining Estimate: 504h NMs are usually collocated with other processes like HDFS, Impala or HBase. To manage this scenario correctly, YARN should be aware of the actual available resources. The proposal is to have an interface to dynamically change the available resources and report this to the RM in every heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3482) Report NM available resources in heartbeat
[ https://issues.apache.org/jira/browse/YARN-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505440#comment-14505440 ] Lei Guo commented on YARN-3482: --- What's the relationship between this and 3332? They should be considered together. Report NM available resources in heartbeat -- Key: YARN-3482 URL: https://issues.apache.org/jira/browse/YARN-3482 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Original Estimate: 504h Remaining Estimate: 504h NMs are usually collocated with other processes like HDFS, Impala or HBase. To manage this scenario correctly, YARN should be aware of the actual available resources. The proposal is to have an interface to dynamically change the available resources and report this to the RM in every heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.
[ https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505334#comment-14505334 ] Sangjin Lee commented on YARN-3431: --- It looks good to me. One small suggestion (it's not critical but would be nicer): It would be a little more consistent and perform slightly better if the type check in getChildren() is consolidated into validateChildren(). In validateChildren() we iterate over the set anyway, and we could do the type check as part of validating it. What do you think? Sub resources of timeline entity needs to be passed to a separate endpoint. --- Key: YARN-3431 URL: https://issues.apache.org/jira/browse/YARN-3431 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3431.1.patch, YARN-3431.2.patch, YARN-3431.3.patch, YARN-3431.4.patch, YARN-3431.5.patch We have TimelineEntity and some other entities as subclass that inherit from it. However, we only have a single endpoint, which consume TimelineEntity rather than sub-classes and this endpoint will check the incoming request body contains exactly TimelineEntity object. However, the json data which is serialized from sub-class object seems not to be treated as an TimelineEntity object, and won't be deserialized into the corresponding sub-class object which cause deserialization failure as some discussions in YARN-3334 : https://issues.apache.org/jira/browse/YARN-3334?focusedCommentId=14391059page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391059. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
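The suggestion above is to fold the type check into the pass that validateChildren() already makes over the child set. A rough sketch with hypothetical names follows; the real TimelineEntity API differs.
{code}
import java.util.LinkedHashSet;
import java.util.Set;

public class ValidateChildrenSketch {
  static class Identifier {
    final String type;
    final String id;
    Identifier(String type, String id) { this.type = type; this.id = id; }
  }

  private final Set<Identifier> children = new LinkedHashSet<>();
  private final String expectedChildType;

  ValidateChildrenSketch(String expectedChildType) {
    this.expectedChildType = expectedChildType;
  }

  void validateChildren() {
    for (Identifier child : children) {
      if (child.id == null) {
        throw new IllegalStateException("child id cannot be null");
      }
      // Type check consolidated into the same iteration as the other checks.
      if (!expectedChildType.equals(child.type)) {
        throw new IllegalStateException("unexpected child type: " + child.type);
      }
    }
  }

  Set<Identifier> getChildren() {
    // The per-child type check that used to live here is now part of
    // validateChildren(), which iterates over the set anyway.
    return children;
  }

  public static void main(String[] args) {
    ValidateChildrenSketch entity = new ValidateChildrenSketch("YARN_CONTAINER");
    entity.children.add(new Identifier("YARN_CONTAINER", "container_1"));
    entity.validateChildren(); // passes; a wrong type would throw here
  }
}
{code}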
[jira] [Commented] (YARN-3482) Report NM available resources in heartbeat
[ https://issues.apache.org/jira/browse/YARN-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505396#comment-14505396 ] Karthik Kambatla commented on YARN-3482: I like 2 better. Report NM available resources in heartbeat -- Key: YARN-3482 URL: https://issues.apache.org/jira/browse/YARN-3482 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Original Estimate: 504h Remaining Estimate: 504h NMs are usually collocated with other processes like HDFS, Impala or HBase. To manage this scenario correctly, YARN should be aware of the actual available resources. The proposal is to have an interface to dynamically change the available resources and report this to the RM in every heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels
[ https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-3521: - Assignee: Sunil G Support return structured NodeLabel objects in REST API when call getClusterNodeLabels -- Key: YARN-3521 URL: https://issues.apache.org/jira/browse/YARN-3521 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G In YARN-3413, yarn cluster CLI returns NodeLabel instead of String, we should make the same change on the REST API side to keep them consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3468) NM should not blindly rename usercache/filecache/nmPrivate on restart
[ https://issues.apache.org/jira/browse/YARN-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505561#comment-14505561 ] Hadoop QA commented on YARN-3468: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726942/YARN-3468.v2.patch against trunk revision 424a00d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7430//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7430//console This message is automatically generated. NM should not blindly rename usercache/filecache/nmPrivate on restart - Key: YARN-3468 URL: https://issues.apache.org/jira/browse/YARN-3468 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-3468.v1.patch, YARN-3468.v2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId
[ https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505333#comment-14505333 ] Vinod Kumar Vavilapalli commented on YARN-3445: --- Better than before; I will comment once I see an updated patch. Cache runningApps in RMNode for getting running apps on given NodeId Key: YARN-3445 URL: https://issues.apache.org/jira/browse/YARN-3445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3445.patch Per discussion in YARN-3334, we need to filter out unnecessary collector info from the RM in the heartbeat response. Our proposal is to add a cache of runningApps in RMNode, so the RM only sends back collectors for locally running apps. This is also needed in YARN-914 (graceful decommission): if there are no running apps on an NM in the decommissioning stage, it will get decommissioned immediately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3516) killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status.
[ https://issues.apache.org/jira/browse/YARN-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505409#comment-14505409 ] Hadoop QA commented on YARN-3516: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726922/YARN-3516.000.patch against trunk revision 8ddbb8d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7428//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7428//console This message is automatically generated. killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status. --- Key: YARN-3516 URL: https://issues.apache.org/jira/browse/YARN-3516 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3516.000.patch killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status. This is a typo from YARN-3024. With YARN-3024, ContainerLocalizer will be killed only if {{action}} is set to {{LocalizerAction.DIE}}, calling {{response.setLocalizerAction}} will be overwritten. This is also a regression from old code. Also it make sense to kill the ContainerLocalizer when FETCH_FAILURE happened, because the container will send CLEANUP_CONTAINER_RESOURCES event after localization failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3514) Active directory usernames like domain\login cause YARN failures
[ https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated YARN-3514: Attachment: YARN-3514.002.patch In the first patch, the new test passed for me locally but failed on Jenkins. I think this is because I was using a hard-coded destination path for the localized resource, and this might have caused a permissions violation on the Jenkins host. Here is patch v002. I changed the test so that the localized resource is relative to the user's filecache, which is in the proper test working directory. I also added a second test to make sure that we don't accidentally URI-decode anything. bq. I am very impressed with the short time it took to patch. Thanks! Before we declare victory though, can you check that your local file system allows the '\' character in file and directory names? The patch here definitely fixes a bug, but testing the '\' character on your local file system will tell us whether or not the whole problem is resolved for your deployment. Even better would be if you have the capability to test with my patch applied. Active directory usernames like domain\login cause YARN failures Key: YARN-3514 URL: https://issues.apache.org/jira/browse/YARN-3514 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Environment: CentOS6 Reporter: john lilley Assignee: Chris Nauroth Priority: Minor Attachments: YARN-3514.001.patch, YARN-3514.002.patch We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is Kerberos-enabled and uses an external AD domain controller for the KDC. We are able to authenticate, browse HDFS, etc. However, YARN fails during localization because it seems to get confused by the presence of a \ character in the local user name. Our AD authentication on the nodes goes through sssd and set configured to map AD users onto the form domain\username. For example, our test user has a Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user domain\hadoopuser. We have no problem validating that user with PAM, logging in as that user, su-ing to that user, etc. However, when we attempt to run a YARN application master, the localization step fails when setting up the local cache directory for the AM. 
The error that comes out of the RM logs: 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, diagnostics='Application application_1429295486450_0001 failed 1 times due to AM Container for appattempt_1429295486450_0001_01 exited with exitCode: -1000 due to: Application application_1429295486450_0001 initialization failed (exitCode=255) with output: main : command provided 0 main : user is DOMAIN\hadoopuser main : requested yarn user is domain\hadoopuser org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create directory: /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10 at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347) .Failing this attempt.. Failing the application.' However, when we look on the node launching the AM, we see this: [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache [root@rpb-cdh-kerb-2 usercache]# ls -l drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser There appears to be different treatment of the \ character in different places. Something creates the directory as domain\hadoopuser but something else later attempts to use it as domain%5Chadoopuser. I’m not sure where or why the URL escapement converts the \ to %5C or why this is not consistent. I should also mention, for the sake of completeness, our auth_to_local rule is set up to map u...@domain.com to domain\user: RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3046) [Event producers] Implement MapReduce AM writing MR events/counters to v2 ATS
[ https://issues.apache.org/jira/browse/YARN-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3046: - Summary: [Event producers] Implement MapReduce AM writing MR events/counters to v2 ATS (was: [Event producers] Implement MapReduce AM writing MR events to v2 ATS) [Event producers] Implement MapReduce AM writing MR events/counters to v2 ATS - Key: YARN-3046 URL: https://issues.apache.org/jira/browse/YARN-3046 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: YARN-3046-no-test-v2.patch, YARN-3046-no-test.patch, YARN-3046-v1-rebase.patch, YARN-3046-v1.patch, YARN-3046-v2.patch, YARN-3046-v3.patch, YARN-3046-v4.patch, YARN-3046-v5.patch, YARN-3046-v6.patch Per design in YARN-2928, select a handful of MR metrics (e.g. HDFS bytes written) and have the MR AM write the framework-specific metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3520) get rid of excessive stacktrace caused by expired cookie in timeline log
[ https://issues.apache.org/jira/browse/YARN-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505425#comment-14505425 ] Mit Desai commented on YARN-3520: - lgtm +1 (non-binding) This change is related to logging so there is no need for tests. get rid of excessive stacktrace caused by expired cookie in timeline log Key: YARN-3520 URL: https://issues.apache.org/jira/browse/YARN-3520 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: YARN-3520.patch {code} WARN sso.CookieValidatorHelpers: Cookie has expired by 25364187 msec WARN server.AuthenticationFilter: Authentication exception: Invalid Cookie 166 org.apache.hadoop.security.authentication.client.AuthenticationException: Invalid Bouncer Cookie 167 at KerberosAuthenticationHandler.bouncerAuthenticate(KerberosAuthenticationHandler.java:94) 168 at AuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:82) 169 at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:507) 170 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 171 at org.apache.hadoop.yarn.server.timeline.webapp.CrossOriginFilter.doFilter(CrossOriginFilter.java:95) 172 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 173 at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:78) 174 at GzipFilter.doFilter(GzipFilter.java:188) 175 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 176 at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1224) 177 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 178 at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) 179 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 180 at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) 181 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 182 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) 183 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) 184 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) 185 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) 186 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) 187 at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) 188 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) 189 at org.mortbay.jetty.Server.handle(Server.java:326) 190 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) 191 at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) 192 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) 193 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) 194 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) 195 at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) 196 at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) WARN sso.CookieValidatorHelpers: Cookie has expired by 25373197 msec {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505602#comment-14505602 ] Hadoop QA commented on YARN-3434: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726966/YARN-3434.patch against trunk revision 997408e. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7432//console This message is automatically generated. Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch ULF was set to 1.0 User was able to consume 1.4X queue capacity. It looks like when this application launched, it reserved about 1000 containers, each 8G each, within about 5 seconds. I think this allowed the logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3520) get rid of excessive stacktrace caused by expired cookie in timeline log
[ https://issues.apache.org/jira/browse/YARN-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505311#comment-14505311 ] Hadoop QA commented on YARN-3520: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726911/YARN-3520.patch against trunk revision 8ddbb8d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-auth. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7425//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7425//console This message is automatically generated. get rid of excessive stacktrace caused by expired cookie in timeline log Key: YARN-3520 URL: https://issues.apache.org/jira/browse/YARN-3520 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: YARN-3520.patch {code} WARN sso.CookieValidatorHelpers: Cookie has expired by 25364187 msec WARN server.AuthenticationFilter: Authentication exception: Invalid Cookie 166 org.apache.hadoop.security.authentication.client.AuthenticationException: Invalid Bouncer Cookie 167 at KerberosAuthenticationHandler.bouncerAuthenticate(KerberosAuthenticationHandler.java:94) 168 at AuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:82) 169 at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:507) 170 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 171 at org.apache.hadoop.yarn.server.timeline.webapp.CrossOriginFilter.doFilter(CrossOriginFilter.java:95) 172 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 173 at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:78) 174 at GzipFilter.doFilter(GzipFilter.java:188) 175 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 176 at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1224) 177 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 178 at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) 179 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 180 at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) 181 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) 182 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) 183 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) 184 at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) 185 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) 186 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) 187 at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) 188 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) 189 at org.mortbay.jetty.Server.handle(Server.java:326) 190 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) 191 at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) 192 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) 193 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) 194 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) 195 at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) 196 at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) WARN sso.CookieValidatorHelpers: Cookie has expired by 25373197 msec {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels
[ https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505437#comment-14505437 ] Sunil G commented on YARN-3521: --- I have recently done some work on the REST APIs. I would like to take this over; please reassign otherwise. Support return structured NodeLabel objects in REST API when call getClusterNodeLabels -- Key: YARN-3521 URL: https://issues.apache.org/jira/browse/YARN-3521 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G In YARN-3413, the yarn cluster CLI returns NodeLabel instead of String; we should make the same change on the REST API side to keep them consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
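For illustration, one plausible shape for a structured NodeLabel in the REST response is a small JAXB DAO carrying the label name plus its attributes rather than a bare string; the class and field names here are assumptions, not the eventual API.
{code}
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

// Hedged sketch of a structured node-label DAO for the RM web services.
@XmlRootElement(name = "nodeLabel")
@XmlAccessorType(XmlAccessType.FIELD)
public class NodeLabelInfoSketch {
  private String name;
  private boolean exclusive = true;

  public NodeLabelInfoSketch() { }          // required by JAXB

  public NodeLabelInfoSketch(String name, boolean exclusive) {
    this.name = name;
    this.exclusive = exclusive;
  }

  public String getName() { return name; }
  public boolean isExclusive() { return exclusive; }
}
{code}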
[jira] [Updated] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-3434: Attachment: YARN-3434.patch Updated based on review comments. Interaction between reservations and userlimit can result in significant ULF violation -- Key: YARN-3434 URL: https://issues.apache.org/jira/browse/YARN-3434 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch ULF was set to 1.0. A user was able to consume 1.4X the queue capacity. It looks like when this application launched, it reserved about 1000 containers, 8G each, within about 5 seconds. I think this allowed the logic in assignToUser() to let the userlimit be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
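The guard being tightened can be illustrated with plain arithmetic (this is not the CapacityScheduler code): before a reservation or allocation is granted, the user's consumption plus the pending request should be checked against the user limit, so a burst of reservations cannot blow past it.
{code}
// Illustrative only: an assignToUser()-style guard, with resources reduced to MB.
public class UserLimitCheck {
  static boolean canAssign(long consumedMB, long requestedMB, long userLimitMB) {
    // Refuse the assignment if granting it would push the user past the limit.
    return consumedMB + requestedMB <= userLimitMB;
  }

  public static void main(String[] args) {
    // With a 100 GB user limit, a burst of 8 GB reservations must stop at the limit.
    System.out.println(canAssign(96_000, 8_000, 100_000)); // false
  }
}
{code}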
[jira] [Commented] (YARN-3410) YARN admin should be able to remove individual application records from RMStateStore
[ https://issues.apache.org/jira/browse/YARN-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505299#comment-14505299 ] Vinod Kumar Vavilapalli commented on YARN-3410: --- Seems like this is going in first. If not, this should also take care of YARN-2268. YARN admin should be able to remove individual application records from RMStateStore Key: YARN-3410 URL: https://issues.apache.org/jira/browse/YARN-3410 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, yarn Reporter: Wangda Tan Assignee: Rohith Priority: Critical Attachments: 0001-YARN-3410-v1.patch, 0001-YARN-3410.patch, 0001-YARN-3410.patch, 0002-YARN-3410.patch, 0003-YARN-3410.patch, 0004-YARN-3410-branch-2.patch, 0004-YARN-3410.patch When the RM state store enters an unexpected state (one example is YARN-2340, where an attempt is not in a final state but the app has already completed), the RM can never come up unless the RMStateStore is formatted. I think we should support removing individual application records from the RMStateStore to unblock the RM admin, who can then choose between waiting for a fix and formatting the state store. In addition, the RM should be able to report all fatal errors (which will shut down the RM) during app recovery; this can save the admin some time when removing apps in a bad state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
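A rough sketch of the operation the proposal asks for, with hypothetical names (the actual patch defines its own RM CLI entry point and store methods):
{code}
// Hypothetical sketch: an admin-facing hook that deletes the persisted record of a
// single application so the RM can recover even when that one record is corrupt.
public abstract class RemovableStateStore {
  /** Permanently drop the stored state of one application (name is illustrative). */
  public abstract void removeApplicationState(String applicationId) throws Exception;

  /** Example admin flow: remove the bad record instead of formatting the whole store. */
  public void recoverFromBadRecord(String badAppId) throws Exception {
    removeApplicationState(badAppId);
  }
}
{code}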
[jira] [Updated] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3413: -- Summary: Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime (was: Node label attributes (like exclusive or not) should be able to set when addToClusterNodeLabels and shouldn't be changed during runtime) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime -- Key: YARN-3413 URL: https://issues.apache.org/jira/browse/YARN-3413 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3413.1.patch, YARN-3413.2.patch As mentioned in : https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947. Changing node label exclusivity and/or other attributes may not be a real use case, and also we should support setting node label attributes whiling adding them to cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3519) registerApplicationMaster couldn't get all running containers if rm is rebuilding container info while am is relaunched
[ https://issues.apache.org/jira/browse/YARN-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505378#comment-14505378 ] Jian He commented on YARN-3519: --- [~sandflee], is this the issue that the AM could re-register with the RM before containers are actually recovered in the RM? This is a known issue which is tracked at YARN-2038. registerApplicationMaster couldn't get all running containers if rm is rebuilding container info while am is relaunched Key: YARN-3519 URL: https://issues.apache.org/jira/browse/YARN-3519 Project: Hadoop YARN Issue Type: Bug Reporter: sandflee 1. The RM fails over and has recovered all app info but not all container info. 2. The AM is relaunched and registers with the RM. 3. The NM with containers launched by the AM re-registers with the RM. The containers on the NM and the corresponding NMTokens cannot be passed to the AM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels
Wangda Tan created YARN-3521: Summary: Support return structured NodeLabel objects in REST API when call getClusterNodeLabels Key: YARN-3521 URL: https://issues.apache.org/jira/browse/YARN-3521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan In YARN-3413, yarn cluster CLI returns NodeLabel instead of String, we should make the same change in REST API side to make them consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3482) Report NM available resources in heartbeat
[ https://issues.apache.org/jira/browse/YARN-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505476#comment-14505476 ] Inigo Goiri commented on YARN-3482: --- [~grey], the ultimate target of this task is to provide an interface for external applications to change the amount of available resources in a node. A part of YARN-3332 targets a smarter way of calculating the amount of resources available to an NM; this can be somewhat related, but I think this effort is still needed. Anyway, thanks for the pointer, as I'm targeting some of the sub-tasks described in that task. Report NM available resources in heartbeat -- Key: YARN-3482 URL: https://issues.apache.org/jira/browse/YARN-3482 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Original Estimate: 504h Remaining Estimate: 504h NMs are usually collocated with other processes like HDFS, Impala or HBase. To manage this scenario correctly, YARN should be aware of the actual available resources. The proposal is to have an interface to dynamically change the available resources and report this to the RM in every heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
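A sketch of the interface the task describes, with assumed names (not the eventual YARN API): an external agent adjusts what the NM considers available, and the heartbeat path reports that value to the RM.
{code}
import org.apache.hadoop.yarn.api.records.Resource;

// Illustrative interface only; method names are assumptions.
public interface NodeResourceReporter {
  /** Called by an external monitor for co-located services (HDFS, HBase, Impala, ...). */
  void setAvailableResource(Resource available);

  /** Read when building the node status sent to the RM on every heartbeat. */
  Resource getAvailableResource();
}
{code}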
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505322#comment-14505322 ] Xuan Gong commented on YARN-2268: - bq. If an active RM creates an "I am using the state-store" lock-file, then the command can bail out. Similarly, the command can create an "I am blowing up the state-store while you were presumably away" lock-file, so that the RM can crash deterministically when a format is in progress. +1 for the proposal. This is probably the simplest way to fix the issue. Disallow formatting the RMStateStore when there is an RM running Key: YARN-2268 URL: https://issues.apache.org/jira/browse/YARN-2268 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Rohith Attachments: 0001-YARN-2268.patch YARN-2131 adds a way to format the RMStateStore. However, it can be a problem if we format the store while an RM is actively using it. It would be nice to fail the format if there is an RM running and using this store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
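A minimal sketch of the lock-file idea in the quoted proposal, assuming a filesystem-backed store purely for illustration (the real store may be ZooKeeper- or HDFS-based, and the marker names are made up):
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hedged sketch: a running RM drops an "in-use" marker while it holds the store;
// the format command refuses to run when that marker exists, and drops its own
// "formatting" marker so a concurrently running RM can fail fast.
public class StateStoreLocks {
  private final Path inUse;
  private final Path formatting;

  StateStoreLocks(String storeDir) {
    this.inUse = Paths.get(storeDir, "RM_IN_USE");
    this.formatting = Paths.get(storeDir, "FORMAT_IN_PROGRESS");
  }

  void beforeFormat() throws IOException {
    if (Files.exists(inUse)) {
      throw new IOException("An RM appears to be using this store; refusing to format");
    }
    Files.createFile(formatting);            // make a live RM crash deterministically
  }

  void onRmStart() throws IOException {
    if (Files.exists(formatting)) {
      throw new IOException("State store is being formatted; shutting down");
    }
    Files.createFile(inUse);
  }
}
{code}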
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505382#comment-14505382 ] Sangjin Lee commented on YARN-3437: --- I think we need to make progress on this as this is blocking other JIRAs and also it's tied to the schema evaluation. My vote is to get this committed, and adjust this once YARN-2556 lands and we rebase. Thoughts? convert load test driver to timeline service v.2 Key: YARN-3437 URL: https://issues.apache.org/jira/browse/YARN-3437 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3437.001.patch, YARN-3437.002.patch This subtask covers the work for converting the proposed patch for the load test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3468) NM should not blindly rename usercache/filecache/nmPrivate on restart
[ https://issues.apache.org/jira/browse/YARN-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-3468: -- Attachment: YARN-3468.v2.patch NM should not blindly rename usercache/filecache/nmPrivate on restart - Key: YARN-3468 URL: https://issues.apache.org/jira/browse/YARN-3468 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-3468.v1.patch, YARN-3468.v2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505296#comment-14505296 ] Vinod Kumar Vavilapalli commented on YARN-2268: --- Further, this should also take care of YARN-3410, whichever patch goes in first. Disallow formatting the RMStateStore when there is an RM running Key: YARN-2268 URL: https://issues.apache.org/jira/browse/YARN-2268 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Rohith Attachments: 0001-YARN-2268.patch YARN-2131 adds a way to format the RMStateStore. However, it can be a problem if we format the store while an RM is actively using it. It would be nice to fail the format if there is an RM running and using this store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3516) killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status.
[ https://issues.apache.org/jira/browse/YARN-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3516: Attachment: YARN-3516.000.patch killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status. --- Key: YARN-3516 URL: https://issues.apache.org/jira/browse/YARN-3516 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3516.000.patch The killing ContainerLocalizer action doesn't take effect when the private localizer receives a FETCH_FAILURE status. This is a typo introduced by YARN-3024: with YARN-3024, the ContainerLocalizer will be killed only if {{action}} is set to {{LocalizerAction.DIE}}, but the later call to {{response.setLocalizerAction}} overwrites that value. This is also a regression from the old code. It also makes sense to kill the ContainerLocalizer when a FETCH_FAILURE happens, because the container will send a CLEANUP_CONTAINER_RESOURCES event after the localization failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
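The bug pattern and its fix can be sketched as follows (illustrative only; the enum value mirrors the NM's LocalizerAction.DIE but the class is not the real heartbeat code): keep the kill decision in one place and never let a later status overwrite it.
{code}
// Hedged sketch of the overwrite bug described above.
enum LocalizerAction { LIVE, DIE }

class HeartbeatResponseSketch {
  private LocalizerAction action = LocalizerAction.LIVE;

  void onFetchFailure() {
    action = LocalizerAction.DIE;   // decide to kill the ContainerLocalizer
  }

  void onOtherStatus() {
    // Bug pattern: unconditionally resetting the action here would overwrite DIE.
    // Keep the existing decision unless we explicitly want to change it.
    if (action != LocalizerAction.DIE) {
      action = LocalizerAction.LIVE;
    }
  }

  LocalizerAction buildResponseAction() {
    return action;                  // applied to the response exactly once
  }
}
{code}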
[jira] [Updated] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3413: - Attachment: YARN-3413.3.patch Thanks for review, [~vinodkv]: bq. We should simply force the labelName to follow a ( ) block - i.e. anything next to a comma up to the left parenthesis is a label. Right? I think we need to support both; if we enforce this, it will be an incompatible behavior change. Other comments are all addressed. In addition: - Make GetClusterNodeLabelResponse return NodeLabel instead of String. - Filed YARN-3521 to track REST API changes. Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime -- Key: YARN-3413 URL: https://issues.apache.org/jira/browse/YARN-3413 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch As mentioned in: https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947, changing node label exclusivity and/or other attributes may not be a real use case, and we should also support setting node label attributes while adding them to the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
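To make the syntax under discussion concrete, here is an illustrative parser for label specs of the form "x(exclusive=false),y", where the attribute block is optional; this is a sketch of the grammar being debated, not the code in the patch.
{code}
import java.util.LinkedHashMap;
import java.util.Map;

// Hedged sketch: map each label name to its exclusivity flag, defaulting to exclusive.
public class NodeLabelSpecParser {
  static Map<String, Boolean> parse(String spec) {
    Map<String, Boolean> labels = new LinkedHashMap<>();
    for (String part : spec.split(",")) {
      part = part.trim();
      int open = part.indexOf('(');
      if (open < 0) {
        labels.put(part, Boolean.TRUE);               // no attributes: exclusive label
      } else {
        String name = part.substring(0, open);
        String attrs = part.substring(open + 1, part.lastIndexOf(')'));
        boolean exclusive = !attrs.contains("exclusive=false");
        labels.put(name, exclusive);
      }
    }
    return labels;
  }

  public static void main(String[] args) {
    System.out.println(parse("x(exclusive=false),y"));  // {x=false, y=true}
  }
}
{code}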
[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505501#comment-14505501 ] Hadoop QA commented on YARN-3517: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726919/YARN-3517.003.patch against trunk revision 8ddbb8d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7427//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7427//console This message is automatically generated. RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, security Affects Versions: 2.7.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Labels: security Attachments: YARN-3517.001.patch, YARN-3517.002.patch, YARN-3517.003.patch YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels
[ https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505503#comment-14505503 ] Wangda Tan commented on YARN-3521: -- [~sunilg], thanks for taking this, it's yours :) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels -- Key: YARN-3521 URL: https://issues.apache.org/jira/browse/YARN-3521 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G In YARN-3413, yarn cluster CLI returns NodeLabel instead of String, we should make the same change in REST API side to make them consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3519) registerApplicationMaster couldn't get all running containers if rm is rebuilding container info while am is relaunched
sandflee created YARN-3519: -- Summary: registerApplicationMaster couldn't get all running containers if rm is rebuilding container info while am is relaunched Key: YARN-3519 URL: https://issues.apache.org/jira/browse/YARN-3519 Project: Hadoop YARN Issue Type: Bug Reporter: sandflee 1. The RM fails over and has recovered all app info but not all container info. 2. The AM is relaunched and registers with the RM. 3. The NM with containers launched by the AM re-registers with the RM. The containers on the NM and the corresponding NMTokens cannot be passed to the AM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504813#comment-14504813 ] Hudson commented on YARN-3463: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #161 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/161/]) YARN-3463. Integrate OrderingPolicy Framework with CapacityScheduler. (Craig Welch via wangda) (wangda: rev 44872b76fcc0ddfbc7b0a4e54eef50fe8708e0f5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FifoOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/OrderingPolicy.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java Integrate OrderingPolicy Framework with CapacityScheduler - Key: YARN-3463 URL: https://issues.apache.org/jira/browse/YARN-3463 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.8.0 Attachments: YARN-3463.50.patch, YARN-3463.61.patch, YARN-3463.64.patch, YARN-3463.65.patch, YARN-3463.66.patch, YARN-3463.67.patch, YARN-3463.68.patch, YARN-3463.69.patch, YARN-3463.70.patch Integrate the OrderingPolicy Framework with the CapacityScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3497) ContainerManagementProtocolProxy modifies IPC timeout conf without making a copy
[ https://issues.apache.org/jira/browse/YARN-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504815#comment-14504815 ] Hudson commented on YARN-3497: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #161 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/161/]) YARN-3497. ContainerManagementProtocolProxy modifies IPC timeout conf without making a copy. Contributed by Jason Lowe (jianhe: rev f967fd2f21791c5c4a5a090cc14ee88d155d2e2b) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/ContainerManagementProtocolProxy.java ContainerManagementProtocolProxy modifies IPC timeout conf without making a copy Key: YARN-3497 URL: https://issues.apache.org/jira/browse/YARN-3497 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.7.1 Attachments: YARN-3497.001.patch, YARN-3497.002.patch yarn-client's ContainerManagementProtocolProxy is updating ipc.client.connection.maxidletime in the conf passed in without making a copy of it. That modification leaks into other systems using the same conf and can cause them to setup RPC connections with a timeout of zero as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
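The fix pattern described in the commit can be sketched as follows (illustrative, not the committed code): copy the Configuration before mutating the idle-time setting so the caller's shared conf is untouched.
{code}
import org.apache.hadoop.conf.Configuration;

// Hedged sketch: defensive copy before changing ipc.client.connection.maxidletime.
public class ProxyConfCopy {
  static Configuration confForCmProxy(Configuration callerConf) {
    Configuration copy = new Configuration(callerConf);   // copy, don't mutate the original
    copy.setInt("ipc.client.connection.maxidletime", 0);  // only affects the copy
    return copy;
  }
}
{code}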
[jira] [Updated] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId
[ https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3445: - Description: Per discussion in YARN-3334, we need filter out unnecessary collectors info from RM in heartbeat response. Our propose is to add cache for runningApps in RMNode, so RM only send collectors for local running apps back. This is also needed in YARN-914 (graceful decommission) that if no running apps in NM which is in decommissioning stage, it will get decommissioned immediately. (was: Per discussion in YARN-3334, we need filter out unnecessary collectors info from RM in heartbeat response. Our propose is to add additional field for running apps in NM heartbeat request, so RM only send collectors for local running apps back. This is also needed in YARN-914 (graceful decommission) that if no running apps in NM which is in decommissioning stage, it will get decommissioned immediately. ) Cache runningApps in RMNode for getting running apps on given NodeId Key: YARN-3445 URL: https://issues.apache.org/jira/browse/YARN-3445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3445.patch Per discussion in YARN-3334, we need filter out unnecessary collectors info from RM in heartbeat response. Our propose is to add cache for runningApps in RMNode, so RM only send collectors for local running apps back. This is also needed in YARN-914 (graceful decommission) that if no running apps in NM which is in decommissioning stage, it will get decommissioned immediately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
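A hedged sketch of the caching idea in the updated description (types and names are illustrative, not the RMNode code): track the apps currently running on each node so the RM can return only their collectors in the heartbeat response and can decommission an idle node immediately.
{code}
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative per-node cache of running applications.
public class RunningAppsCache {
  private final Map<String, Set<String>> runningAppsByNode = new ConcurrentHashMap<>();

  void containerStarted(String nodeId, String appId) {
    runningAppsByNode.computeIfAbsent(nodeId, n -> ConcurrentHashMap.newKeySet())
        .add(appId);
  }

  void appFinishedOnNode(String nodeId, String appId) {
    Set<String> apps = runningAppsByNode.get(nodeId);
    if (apps != null) {
      apps.remove(appId);
    }
  }

  boolean safeToDecommission(String nodeId) {
    Set<String> apps = runningAppsByNode.get(nodeId);
    return apps == null || apps.isEmpty();   // no running apps: decommission right away
  }
}
{code}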
[jira] [Commented] (YARN-3406) Add a Running Container for RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504928#comment-14504928 ] Hadoop QA commented on YARN-3406: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726857/YARN-3406.2.patch against trunk revision 8ddbb8d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7421//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7421//console This message is automatically generated. Add a Running Container for RM Web UI - Key: YARN-3406 URL: https://issues.apache.org/jira/browse/YARN-3406 Project: Hadoop YARN Issue Type: Improvement Reporter: Ryu Kobayashi Assignee: Ryu Kobayashi Priority: Minor Attachments: YARN-3406.1.patch, YARN-3406.2.patch, screenshot.png, screenshot2.png View the number of containers in the all application list. And, add REST API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3514) Active directory usernames like domain\login cause YARN failures
[ https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504521#comment-14504521 ] Hadoop QA commented on YARN-3514: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726815/YARN-3514.001.patch against trunk revision d52de61. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7419//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7419//console This message is automatically generated. Active directory usernames like domain\login cause YARN failures Key: YARN-3514 URL: https://issues.apache.org/jira/browse/YARN-3514 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Environment: CentOS6 Reporter: john lilley Assignee: Chris Nauroth Priority: Minor Attachments: YARN-3514.001.patch We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is Kerberos-enabled and uses an external AD domain controller for the KDC. We are able to authenticate, browse HDFS, etc. However, YARN fails during localization because it seems to get confused by the presence of a \ character in the local user name. Our AD authentication on the nodes goes through sssd and set configured to map AD users onto the form domain\username. For example, our test user has a Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user domain\hadoopuser. We have no problem validating that user with PAM, logging in as that user, su-ing to that user, etc. However, when we attempt to run a YARN application master, the localization step fails when setting up the local cache directory for the AM. 
The error that comes out of the RM logs: 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, diagnostics='Application application_1429295486450_0001 failed 1 times due to AM Container for appattempt_1429295486450_0001_01 exited with exitCode: -1000 due to: Application application_1429295486450_0001 initialization failed (exitCode=255) with output: main : command provided 0 main : user is DOMAIN\hadoopuser main : requested yarn user is domain\hadoopuser org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create directory: /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10 at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347) .Failing this attempt.. Failing the application.' However, when we look on the node launching the AM, we see this: [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache [root@rpb-cdh-kerb-2 usercache]# ls -l drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser There appears to be different treatment of the \ character in different places. Something creates the directory as domain\hadoopuser but something else later attempts to use it as domain%5Chadoopuser. I’m not sure where or why the URL escapement converts the \ to
[jira] [Assigned] (YARN-3484) Fix up yarn top shell code
[ https://issues.apache.org/jira/browse/YARN-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev reassigned YARN-3484: --- Assignee: Varun Vasudev Fix up yarn top shell code -- Key: YARN-3484 URL: https://issues.apache.org/jira/browse/YARN-3484 Project: Hadoop YARN Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Varun Vasudev We need to do some work on yarn top's shell code. a) Just checking for TERM isn't good enough. We really need to check the return on tput, especially since the output will not be a number but an error string which will likely blow up the java code in horrible ways. b) All the single bracket tests should be double brackets to force the bash built-in. c) I'd think I'd rather see the shell portion in a function since it's rather large. This will allow for args, etc, to get local'ized and clean up the case statement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3484) Fix up yarn top shell code
[ https://issues.apache.org/jira/browse/YARN-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3484: Attachment: YARN-3484.001.patch Allen, I've uploaded a patch to address your comments. Can you please review? If it looks good to you, I'll upload a version for branch-2. Fix up yarn top shell code -- Key: YARN-3484 URL: https://issues.apache.org/jira/browse/YARN-3484 Project: Hadoop YARN Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Varun Vasudev Attachments: YARN-3484.001.patch We need to do some work on yarn top's shell code. a) Just checking for TERM isn't good enough. We really need to check the return on tput, especially since the output will not be a number but an error string which will likely blow up the java code in horrible ways. b) All the single bracket tests should be double brackets to force the bash built-in. c) I'd think I'd rather see the shell portion in a function since it's rather large. This will allow for args, etc, to get local'ized and clean up the case statement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504613#comment-14504613 ] Varun Vasudev commented on YARN-3294: - [~tgraves] - thanks for pointing out the admin issue. My apologies for missing it. I've filed YARN-3517 and updated it with a patch, which allows only admins to use the functionality. Can you please review and leave comments there? Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.8.0 Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, apache-yarn-3294.0.patch, apache-yarn-3294.1.patch, apache-yarn-3294.2.patch, apache-yarn-3294.3.patch, apache-yarn-3294.4.patch It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time(1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3046) [Event producers] Implement MapReduce AM writing MR events to v2 ATS
[ https://issues.apache.org/jira/browse/YARN-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504712#comment-14504712 ] Junping Du commented on YARN-3046: -- Thanks [~zjshen] and [~sjlee0] for review and comments! bq. createEntity's comments need to be updated to reflect the latest code changes. Nice catch! Updated in v6 patch. bq. The following code does nothing, and can be removed. The only thing it does is get rid of falling into default (for unrecognized event) where it will be return directly. Let's keep it here. bq. OK, this is another existing bug I'll leave it up to you to decide whether we want to fix this in ATS v.1 in a separate JIRA. Let's fix in refactor patch (MAPREDUCE-6318) given we already roll back a prefix tiny bug on v1. bq. I.150: same issue Good catch! Fix it in v6 patch. [Event producers] Implement MapReduce AM writing MR events to v2 ATS Key: YARN-3046 URL: https://issues.apache.org/jira/browse/YARN-3046 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: YARN-3046-no-test-v2.patch, YARN-3046-no-test.patch, YARN-3046-v1-rebase.patch, YARN-3046-v1.patch, YARN-3046-v2.patch, YARN-3046-v3.patch, YARN-3046-v4.patch, YARN-3046-v5.patch Per design in YARN-2928, select a handful of MR metrics (e.g. HDFS bytes written) and have the MR AM write the framework-specific metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3046) [Event producers] Implement MapReduce AM writing MR events to v2 ATS
[ https://issues.apache.org/jira/browse/YARN-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3046: - Attachment: YARN-3046-v6.patch [Event producers] Implement MapReduce AM writing MR events to v2 ATS Key: YARN-3046 URL: https://issues.apache.org/jira/browse/YARN-3046 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: YARN-3046-no-test-v2.patch, YARN-3046-no-test.patch, YARN-3046-v1-rebase.patch, YARN-3046-v1.patch, YARN-3046-v2.patch, YARN-3046-v3.patch, YARN-3046-v4.patch, YARN-3046-v5.patch, YARN-3046-v6.patch Per design in YARN-2928, select a handful of MR metrics (e.g. HDFS bytes written) and have the MR AM write the framework-specific metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3518) default rm/am expire interval should less than default resourcemanager connect wait time
sandflee created YARN-3518: -- Summary: default rm/am expire interval should less than default resourcemanager connect wait time Key: YARN-3518 URL: https://issues.apache.org/jira/browse/YARN-3518 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Reporter: sandflee Take the AM for example: if the AM can't connect to the RM, then after the AM expiry interval (600s) the RM relaunches the AM, and there will be two AMs at the same time until the resourcemanager connect max wait time (900s) has passed. DEFAULT_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS = 15 * 60 * 1000; DEFAULT_RM_AM_EXPIRY_INTERVAL_MS = 60; DEFAULT_RM_NM_EXPIRY_INTERVAL_MS = 60; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
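The arithmetic behind the report, using the values quoted above (10 minutes for AM expiry versus 15 minutes of connect retries), is just a window comparison; the snippet below is illustrative only.
{code}
// Illustrative: the AM keeps retrying the RM for longer than the RM waits before
// declaring the AM expired and launching a replacement, so two AMs can overlap.
public class ExpiryWindow {
  public static void main(String[] args) {
    long rmAmExpiryMs = 10 * 60 * 1000;        // RM gives up on the AM after 10 min
    long amConnectMaxWaitMs = 15 * 60 * 1000;  // AM retries the RM for up to 15 min
    long overlapMs = amConnectMaxWaitMs - rmAmExpiryMs;
    System.out.println("Two AMs may coexist for up to " + (overlapMs / 1000) + "s");
  }
}
{code}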
[jira] [Commented] (YARN-3516) killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status.
[ https://issues.apache.org/jira/browse/YARN-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504513#comment-14504513 ] Hadoop QA commented on YARN-3516: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726776/YARN-3516.000.patch against trunk revision d52de61. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDataTransferProtocol The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestLeaseRecovery2 Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7415//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7415//console This message is automatically generated. killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status. --- Key: YARN-3516 URL: https://issues.apache.org/jira/browse/YARN-3516 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3516.000.patch killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status. This is a typo from YARN-3024. With YARN-3024, ContainerLocalizer will be killed only if {{action}} is set to {{LocalizerAction.DIE}}, calling {{response.setLocalizerAction}} will be overwritten. This is also a regression from old code. Also it make sense to kill the ContainerLocalizer when FETCH_FAILURE happened, because the container will send CLEANUP_CONTAINER_RESOURCES event after localization failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504554#comment-14504554 ] Hadoop QA commented on YARN-3517: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726810/YARN-3517.001.patch against trunk revision d52de61. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7418//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7418//console This message is automatically generated. RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, security Affects Versions: 2.7.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Labels: security Attachments: YARN-3517.001.patch YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504617#comment-14504617 ] Varun Vasudev commented on YARN-3517: - Test failure is unrelated. RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, security Affects Versions: 2.7.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Labels: security Attachments: YARN-3517.001.patch YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3484) Fix up yarn top shell code
[ https://issues.apache.org/jira/browse/YARN-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504660#comment-14504660 ] Hadoop QA commented on YARN-3484: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726827/YARN-3484.001.patch against trunk revision d52de61. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7420//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7420//console This message is automatically generated. Fix up yarn top shell code -- Key: YARN-3484 URL: https://issues.apache.org/jira/browse/YARN-3484 Project: Hadoop YARN Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Varun Vasudev Attachments: YARN-3484.001.patch We need to do some work on yarn top's shell code. a) Just checking for TERM isn't good enough. We really need to check the return on tput, especially since the output will not be a number but an error string which will likely blow up the java code in horrible ways. b) All the single bracket tests should be double brackets to force the bash built-in. c) I'd think I'd rather see the shell portion in a function since it's rather large. This will allow for args, etc, to get local'ized and clean up the case statement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3511) Add errors and warnings page to ATS
[ https://issues.apache.org/jira/browse/YARN-3511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504621#comment-14504621 ] Varun Vasudev commented on YARN-3511: - The eclipse target passes on my machine. I'm not sure if we can add any tests for this. Add errors and warnings page to ATS --- Key: YARN-3511 URL: https://issues.apache.org/jira/browse/YARN-3511 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-3511.001.patch YARN-2901 adds the capability to view errors and warnings on the web UI. The ATS was missed out. Add support for the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505077#comment-14505077 ] Hudson commented on YARN-3463: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #171 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/171/]) YARN-3463. Integrate OrderingPolicy Framework with CapacityScheduler. (Craig Welch via wangda) (wangda: rev 44872b76fcc0ddfbc7b0a4e54eef50fe8708e0f5) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FifoOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/OrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java Integrate OrderingPolicy Framework with CapacityScheduler - Key: YARN-3463 URL: https://issues.apache.org/jira/browse/YARN-3463 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.8.0 Attachments: YARN-3463.50.patch, YARN-3463.61.patch, YARN-3463.64.patch, YARN-3463.65.patch, YARN-3463.66.patch, YARN-3463.67.patch, YARN-3463.68.patch, YARN-3463.69.patch, YARN-3463.70.patch Integrate the OrderingPolicy Framework with the CapacityScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3497) ContainerManagementProtocolProxy modifies IPC timeout conf without making a copy
[ https://issues.apache.org/jira/browse/YARN-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505079#comment-14505079 ] Hudson commented on YARN-3497: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #171 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/171/]) YARN-3497. ContainerManagementProtocolProxy modifies IPC timeout conf without making a copy. Contributed by Jason Lowe (jianhe: rev f967fd2f21791c5c4a5a090cc14ee88d155d2e2b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/ContainerManagementProtocolProxy.java * hadoop-yarn-project/CHANGES.txt ContainerManagementProtocolProxy modifies IPC timeout conf without making a copy Key: YARN-3497 URL: https://issues.apache.org/jira/browse/YARN-3497 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.7.1 Attachments: YARN-3497.001.patch, YARN-3497.002.patch yarn-client's ContainerManagementProtocolProxy is updating ipc.client.connection.maxidletime in the conf passed in without making a copy of it. That modification leaks into other systems using the same conf and can cause them to setup RPC connections with a timeout of zero as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3494) Expose AM resource limit and user limit in QueueMetrics
[ https://issues.apache.org/jira/browse/YARN-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504958#comment-14504958 ] Hadoop QA commented on YARN-3494: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726861/0002-YARN-3494.patch against trunk revision 8ddbb8d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7423//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7423//console This message is automatically generated. Expose AM resource limit and user limit in QueueMetrics Key: YARN-3494 URL: https://issues.apache.org/jira/browse/YARN-3494 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3494.patch, 0002-YARN-3494.patch, 0002-YARN-3494.patch Now we have the AM resource limit and user limit shown on the web UI, it would be useful to expose them in the QueueMetrics as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504954#comment-14504954 ] Thomas Graves commented on YARN-3517: - Thanks for following up on this. Could you also change it to not show the button if you aren't an admin? I don't want to confuse users by having a button there that doesn't do anything. One other thing: could you add some CSS or something to make it look more like a button? Right now it just looks like text and I didn't know it was clickable at first. The placement of it seems a bit weird to me also, but as long as it's only showing up for admins that is less of an issue. I haven't looked at the patch in detail, but I see we are creating a new AdminACLsManager each time. It would be nice if we didn't have to do that. RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, security Affects Versions: 2.7.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Labels: security Attachments: YARN-3517.001.patch YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3517: Attachment: YARN-3517.002.patch Uploaded a new patch to address Thomas's comments. bq. Could you also change it to not show the button if you aren't an admin? Fixed. {quote} One other thing is could you add some css or something to make it look more like a button. Right now it just looks like text and I didn't know it was clickable at first. The placement of it seems a bit weird to me also but as along as its only showing up for admins that is less of an issue. {quote} I've added some style elements to make it look better. {quote} I haven't looked at the patch if details but I see we are creating a new AdminACLsManager each time. It would be nice if we didn't have to do that. {quote} Fixed. RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, security Affects Versions: 2.7.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Labels: security Attachments: YARN-3517.001.patch, YARN-3517.002.patch YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
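A minimal sketch of the admin gate being discussed, assuming a stand-in for the RM's ACL manager (the actual patch wires this through the existing web UI and ACL classes):
{code}
import org.apache.hadoop.security.UserGroupInformation;

// Hedged sketch: only render the dump-scheduler-logs control, and only honor the
// request, for admin callers.
public class SchedulerLogAccess {
  interface AdminChecker {             // stand-in for the RM's ACL manager
    boolean isAdmin(UserGroupInformation callerUGI);
  }

  static void dumpSchedulerLogs(UserGroupInformation caller, AdminChecker acls) {
    if (!acls.isAdmin(caller)) {
      // Non-admins get a clear refusal instead of a working dump button.
      throw new SecurityException("Only admins may dump scheduler logs");
    }
    // ... trigger the time-bounded debug-log capture here ...
  }
}
{code}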
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504842#comment-14504842 ] Junping Du commented on YARN-3437: -- Filing a new JIRA to track fine-grained performance data sounds good to me. Given the patch here has duplicated code with YARN-2556, I would like to understand our plan for YARN-2556. [~jeagles], can you share your vision on this? It looks like this JIRA blocks YARN-3390 (a refactoring JIRA), which blocks YARN-3044 (RM writing events to the v2 ATS service). I would like a clear path so all the patches can go in as a pipeline while getting rid of any potential deadlock. :) Maybe the first step is to get YARN-2556 committed, and then rebase the patch here? [~jeagles], [~sjlee0] and [~zjshen], what's your opinion on this? convert load test driver to timeline service v.2 Key: YARN-3437 URL: https://issues.apache.org/jira/browse/YARN-3437 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3437.001.patch, YARN-3437.002.patch This subtask covers the work for converting the proposed patch for the load test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3514) Active directory usernames like domain\login cause YARN failures
[ https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504869#comment-14504869 ] john lilley commented on YARN-3514: --- Thank you! I am very impressed with the short time it took to patch. Active directory usernames like domain\login cause YARN failures Key: YARN-3514 URL: https://issues.apache.org/jira/browse/YARN-3514 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Environment: CentOS6 Reporter: john lilley Assignee: Chris Nauroth Priority: Minor Attachments: YARN-3514.001.patch We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is Kerberos-enabled and uses an external AD domain controller for the KDC. We are able to authenticate, browse HDFS, etc. However, YARN fails during localization because it seems to get confused by the presence of a \ character in the local user name. Our AD authentication on the nodes goes through sssd and set configured to map AD users onto the form domain\username. For example, our test user has a Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user domain\hadoopuser. We have no problem validating that user with PAM, logging in as that user, su-ing to that user, etc. However, when we attempt to run a YARN application master, the localization step fails when setting up the local cache directory for the AM. The error that comes out of the RM logs: 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, diagnostics='Application application_1429295486450_0001 failed 1 times due to AM Container for appattempt_1429295486450_0001_01 exited with exitCode: -1000 due to: Application application_1429295486450_0001 initialization failed (exitCode=255) with output: main : command provided 0 main : user is DOMAIN\hadoopuser main : requested yarn user is domain\hadoopuser org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create directory: /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10 at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347) .Failing this attempt.. Failing the application.' However, when we look on the node launching the AM, we see this: [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache [root@rpb-cdh-kerb-2 usercache]# ls -l drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser There appears to be different treatment of the \ character in different places. Something creates the directory as domain\hadoopuser but something else later attempts to use it as domain%5Chadoopuser. I’m not sure where or why the URL escapement converts the \ to %5C or why this is not consistent. 
I should also mention, for the sake of completeness, our auth_to_local rule is set up to map u...@domain.com to domain\user: RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3497) ContainerManagementProtocolProxy modifies IPC timeout conf without making a copy
[ https://issues.apache.org/jira/browse/YARN-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504748#comment-14504748 ] Hudson commented on YARN-3497: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #170 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/170/]) YARN-3497. ContainerManagementProtocolProxy modifies IPC timeout conf without making a copy. Contributed by Jason Lowe (jianhe: rev f967fd2f21791c5c4a5a090cc14ee88d155d2e2b) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/ContainerManagementProtocolProxy.java ContainerManagementProtocolProxy modifies IPC timeout conf without making a copy Key: YARN-3497 URL: https://issues.apache.org/jira/browse/YARN-3497 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.7.1 Attachments: YARN-3497.001.patch, YARN-3497.002.patch yarn-client's ContainerManagementProtocolProxy is updating ipc.client.connection.maxidletime in the conf passed in without making a copy of it. That modification leaks into other systems using the same conf and can cause them to setup RPC connections with a timeout of zero as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
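For illustration, a minimal sketch of the defensive-copy pattern this fix describes (hypothetical class and method names, not the actual patch): copy the caller's Configuration before tuning the IPC idle timeout, so the change cannot leak back into other components that share the original object.
{code}
import org.apache.hadoop.conf.Configuration;

public class ProxyConfSketch {
  /**
   * Return a private copy of the caller's Configuration with the IPC idle
   * timeout tuned for container-management RPC. Because we copy first, the
   * caller's Configuration (possibly shared with other subsystems) stays
   * untouched.
   */
  static Configuration newProxyConf(Configuration callerConf) {
    Configuration conf = new Configuration(callerConf); // defensive copy
    conf.setInt("ipc.client.connection.maxidletime", 0);
    return conf;
  }
}
{code}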
[jira] [Commented] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504746#comment-14504746 ] Hudson commented on YARN-3463: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #170 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/170/]) YARN-3463. Integrate OrderingPolicy Framework with CapacityScheduler. (Craig Welch via wangda) (wangda: rev 44872b76fcc0ddfbc7b0a4e54eef50fe8708e0f5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FifoOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/OrderingPolicy.java Integrate OrderingPolicy Framework with CapacityScheduler - Key: YARN-3463 URL: https://issues.apache.org/jira/browse/YARN-3463 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.8.0 Attachments: YARN-3463.50.patch, YARN-3463.61.patch, YARN-3463.64.patch, YARN-3463.65.patch, YARN-3463.66.patch, YARN-3463.67.patch, YARN-3463.68.patch, YARN-3463.69.patch, YARN-3463.70.patch Integrate the OrderingPolicy Framework with the CapacityScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3445) Cache runningApps in RMNode for getting running apps on given NodeId
[ https://issues.apache.org/jira/browse/YARN-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3445: - Summary: Cache runningApps in RMNode for getting running apps on given NodeId (was: NM notify RM on running Apps in NM-RM heartbeat) Cache runningApps in RMNode for getting running apps on given NodeId Key: YARN-3445 URL: https://issues.apache.org/jira/browse/YARN-3445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3445.patch Per discussion in YARN-3334, we need to filter out unnecessary collector info from the RM in the heartbeat response. Our proposal is to add an additional field for running apps in the NM heartbeat request, so the RM only sends back collectors for locally running apps. This is also needed in YARN-914 (graceful decommission): if an NM in the decommissioning stage has no running apps, it can get decommissioned immediately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
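For illustration, a purely hypothetical sketch of the idea: caching each node's running apps from heartbeats lets the RM send back only the collectors that node needs. The real RMNode and collector bookkeeping in YARN is more involved; every type below is a simplified stand-in.
{code}
import java.util.*;

class NodeCollectorFilterExample {
  // appId -> collector address known to the RM (hypothetical bookkeeping)
  private final Map<String, String> collectorsByApp = new HashMap<>();
  // nodeId -> apps reported as running in that NM's last heartbeat
  private final Map<String, Set<String>> runningAppsByNode = new HashMap<>();

  void onHeartbeat(String nodeId, Set<String> runningApps) {
    runningAppsByNode.put(nodeId, runningApps); // cache per node
  }

  /** Collectors to include in the heartbeat response for this node only. */
  Map<String, String> collectorsFor(String nodeId) {
    Map<String, String> result = new HashMap<>();
    for (String appId :
        runningAppsByNode.getOrDefault(nodeId, Collections.emptySet())) {
      String addr = collectorsByApp.get(appId);
      if (addr != null) {
        result.put(appId, addr);
      }
    }
    return result;
  }
}
{code}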
[jira] [Updated] (YARN-3406) Add a Running Container for RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryu Kobayashi updated YARN-3406: Attachment: YARN-3406.2.patch [~ozawa] I fixed the conflict. I also changed the label name to Running Containers (same as the REST API). In addition, I fixed a bug in the sort order of the fair scheduler. Add a Running Container for RM Web UI - Key: YARN-3406 URL: https://issues.apache.org/jira/browse/YARN-3406 Project: Hadoop YARN Issue Type: Improvement Reporter: Ryu Kobayashi Assignee: Ryu Kobayashi Priority: Minor Attachments: YARN-3406.1.patch, YARN-3406.2.patch, screenshot.png, screenshot2.png View the number of containers in the all-applications list, and add a REST API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3494) Expose AM resource limit and user limit in QueueMetrics
[ https://issues.apache.org/jira/browse/YARN-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504830#comment-14504830 ] Hadoop QA commented on YARN-3494: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726858/0002-YARN-3494.patch against trunk revision 8ddbb8d. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7422//console This message is automatically generated. Expose AM resource limit and user limit in QueueMetrics Key: YARN-3494 URL: https://issues.apache.org/jira/browse/YARN-3494 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3494.patch, 0002-YARN-3494.patch Now we have the AM resource limit and user limit shown on the web UI, it would be useful to expose them in the QueueMetrics as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3494) Expose AM resource limit and user limit in QueueMetrics
[ https://issues.apache.org/jira/browse/YARN-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3494: - Attachment: 0002-YARN-3494.patch Expose AM resource limit and user limit in QueueMetrics Key: YARN-3494 URL: https://issues.apache.org/jira/browse/YARN-3494 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3494.patch, 0002-YARN-3494.patch, 0002-YARN-3494.patch Now we have the AM resource limit and user limit shown on the web UI, it would be useful to expose them in the QueueMetrics as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3517) RM web ui for dumping scheduler logs should be for admins only
[ https://issues.apache.org/jira/browse/YARN-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505185#comment-14505185 ] Hadoop QA commented on YARN-3517: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726890/YARN-3517.002.patch against trunk revision 8ddbb8d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7424//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7424//console This message is automatically generated. RM web ui for dumping scheduler logs should be for admins only -- Key: YARN-3517 URL: https://issues.apache.org/jira/browse/YARN-3517 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, security Affects Versions: 2.7.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Labels: security Attachments: YARN-3517.001.patch, YARN-3517.002.patch YARN-3294 allows users to dump scheduler logs from the web UI. This should be for admins only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505129#comment-14505129 ] Hudson commented on YARN-3463: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2120 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2120/]) YARN-3463. Integrate OrderingPolicy Framework with CapacityScheduler. (Craig Welch via wangda) (wangda: rev 44872b76fcc0ddfbc7b0a4e54eef50fe8708e0f5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/OrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FifoOrderingPolicy.java * hadoop-yarn-project/CHANGES.txt Integrate OrderingPolicy Framework with CapacityScheduler - Key: YARN-3463 URL: https://issues.apache.org/jira/browse/YARN-3463 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.8.0 Attachments: YARN-3463.50.patch, YARN-3463.61.patch, YARN-3463.64.patch, YARN-3463.65.patch, YARN-3463.66.patch, YARN-3463.67.patch, YARN-3463.68.patch, YARN-3463.69.patch, YARN-3463.70.patch Integrate the OrderingPolicy Framework with the CapacityScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3497) ContainerManagementProtocolProxy modifies IPC timeout conf without making a copy
[ https://issues.apache.org/jira/browse/YARN-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505131#comment-14505131 ] Hudson commented on YARN-3497: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2120 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2120/]) YARN-3497. ContainerManagementProtocolProxy modifies IPC timeout conf without making a copy. Contributed by Jason Lowe (jianhe: rev f967fd2f21791c5c4a5a090cc14ee88d155d2e2b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/ContainerManagementProtocolProxy.java * hadoop-yarn-project/CHANGES.txt ContainerManagementProtocolProxy modifies IPC timeout conf without making a copy Key: YARN-3497 URL: https://issues.apache.org/jira/browse/YARN-3497 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.7.1 Attachments: YARN-3497.001.patch, YARN-3497.002.patch yarn-client's ContainerManagementProtocolProxy is updating ipc.client.connection.maxidletime in the conf passed in without making a copy of it. That modification leaks into other systems using the same conf and can cause them to setup RPC connections with a timeout of zero as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3514) Active directory usernames like domain\login cause YARN failures
[ https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505644#comment-14505644 ] john lilley commented on YARN-3514: --- We did work around the issue by changing our username mapping in sssd and auth_to_local rules to use plain usernames, that seemed to be the path of least resistance. Active directory usernames like domain\login cause YARN failures Key: YARN-3514 URL: https://issues.apache.org/jira/browse/YARN-3514 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Environment: CentOS6 Reporter: john lilley Assignee: Chris Nauroth Priority: Minor Attachments: YARN-3514.001.patch, YARN-3514.002.patch We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is Kerberos-enabled and uses an external AD domain controller for the KDC. We are able to authenticate, browse HDFS, etc. However, YARN fails during localization because it seems to get confused by the presence of a \ character in the local user name. Our AD authentication on the nodes goes through sssd and set configured to map AD users onto the form domain\username. For example, our test user has a Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user domain\hadoopuser. We have no problem validating that user with PAM, logging in as that user, su-ing to that user, etc. However, when we attempt to run a YARN application master, the localization step fails when setting up the local cache directory for the AM. The error that comes out of the RM logs: 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, diagnostics='Application application_1429295486450_0001 failed 1 times due to AM Container for appattempt_1429295486450_0001_01 exited with exitCode: -1000 due to: Application application_1429295486450_0001 initialization failed (exitCode=255) with output: main : command provided 0 main : user is DOMAIN\hadoopuser main : requested yarn user is domain\hadoopuser org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create directory: /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10 at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347) .Failing this attempt.. Failing the application.' However, when we look on the node launching the AM, we see this: [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache [root@rpb-cdh-kerb-2 usercache]# ls -l drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser There appears to be different treatment of the \ character in different places. Something creates the directory as domain\hadoopuser but something else later attempts to use it as domain%5Chadoopuser. I’m not sure where or why the URL escapement converts the \ to %5C or why this is not consistent. 
I should also mention, for the sake of completeness, our auth_to_local rule is set up to map u...@domain.com to domain\user: RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3495) Confusing log generated by FairScheduler
[ https://issues.apache.org/jira/browse/YARN-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505754#comment-14505754 ] Hudson commented on YARN-3495: -- FAILURE: Integrated in Hadoop-trunk-Commit #7627 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7627/]) YARN-3495. Confusing log generated by FairScheduler. Contributed by Brahma Reddy Battula. (ozawa: rev 105afd54779852c518b978101f23526143e234a5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/CHANGES.txt Confusing log generated by FairScheduler Key: YARN-3495 URL: https://issues.apache.org/jira/browse/YARN-3495 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Fix For: 2.8.0 Attachments: YARN-3495.patch 2015-04-16 12:03:48,531 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505709#comment-14505709 ] Wangda Tan commented on YARN-2740: -- General LGTM, some minor comments. 1) Mark YarnConfiguration.isDistributedNodeLabelConfiguration as @Private. 2) It's better to cover the remove-label case, since removing a label = removing the label in the cluster + removing it from nodes; add a test to make sure it works in distributed mode, same as in TestRMAdminService/TestRMWebServicesNodeLabels. 3) RMWebServices.replaceLabelsOnNode(s) should be merged so we don't need to maintain both. ResourceManager side should properly handle node label modifications when distributed node label configuration enabled -- Key: YARN-2740 URL: https://issues.apache.org/jira/browse/YARN-2740 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Fix For: 2.8.0 Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch, YARN-2740.20150327-1.patch, YARN-2740.20150411-1.patch, YARN-2740.20150411-2.patch, YARN-2740.20150411-3.patch, YARN-2740.20150417-1.patch, YARN-2740.20150420-1.patch, YARN-2740.20150421-1.patch According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change-labels-on-node operations. - CommonNodeLabelsManager shouldn't persist labels on nodes when NMs heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505832#comment-14505832 ] Jonathan Eagles commented on YARN-2556: --- I'm very swamped currently. Even though it would take very little time to address this, I just can't find the time. Please let's just move on and I will get to it in time. Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Chang Li Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, YARN-2556.1.patch, YARN-2556.2.patch, YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-3225: Attachment: YARN-3225-5.patch New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-3225-1.patch, YARN-3225-2.patch, YARN-3225-3.patch, YARN-3225-4.patch, YARN-3225-5.patch, YARN-3225.patch, YARN-914.patch New CLI (or existing CLI with parameters) should put each node on decommission list to decommissioning status and track timeout to terminate the nodes that haven't get finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3514) Active directory usernames like domain\login cause YARN failures
[ https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505661#comment-14505661 ] Hadoop QA commented on YARN-3514: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726964/YARN-3514.002.patch against trunk revision 997408e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7431//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7431//console This message is automatically generated. Active directory usernames like domain\login cause YARN failures Key: YARN-3514 URL: https://issues.apache.org/jira/browse/YARN-3514 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Environment: CentOS6 Reporter: john lilley Assignee: Chris Nauroth Priority: Minor Attachments: YARN-3514.001.patch, YARN-3514.002.patch We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is Kerberos-enabled and uses an external AD domain controller for the KDC. We are able to authenticate, browse HDFS, etc. However, YARN fails during localization because it seems to get confused by the presence of a \ character in the local user name. Our AD authentication on the nodes goes through sssd and set configured to map AD users onto the form domain\username. For example, our test user has a Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user domain\hadoopuser. We have no problem validating that user with PAM, logging in as that user, su-ing to that user, etc. However, when we attempt to run a YARN application master, the localization step fails when setting up the local cache directory for the AM. 
The error that comes out of the RM logs: 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, diagnostics='Application application_1429295486450_0001 failed 1 times due to AM Container for appattempt_1429295486450_0001_01 exited with exitCode: -1000 due to: Application application_1429295486450_0001 initialization failed (exitCode=255) with output: main : command provided 0 main : user is DOMAIN\hadoopuser main : requested yarn user is domain\hadoopuser org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create directory: /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10 at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347) .Failing this attempt.. Failing the application.' However, when we look on the node launching the AM, we see this: [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache [root@rpb-cdh-kerb-2 usercache]# ls -l drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser There appears to be different treatment of the \ character in different places. Something creates the directory as domain\hadoopuser but something else later attempts to use it as domain%5Chadoopuser. I’m not sure where or why the URL escapement converts the \ to %5C or why this is not consistent. I should also mention, for the sake of completeness, our
[jira] [Commented] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505891#comment-14505891 ] Naganarasimha G R commented on YARN-2740: - Thanks for the comment [~wangda]. bq. 2) It's better to cover the remove-label case, since removing a label = removing the label in the cluster + removing it from nodes; add a test to make sure it works in distributed mode, same as in TestRMAdminService/TestRMWebServicesNodeLabels. IIUC you want to {{prevent removing a clusterNodeLabel while distributed configuration is enabled}} and to add test cases for it? bq. 3) RMWebServices.replaceLabelsOnNode(s) should be merged so we don't need to maintain both. I don't mind working on it here, but I have two queries: * Should we still support the existing two REST APIs for replaceLabelsOnNode (one for a single node and one for multiple nodes) and ensure the common part is extracted to a new method? * I understand that it makes it easier for the committer to limit the number of check-ins, but is it good in terms of maintainability to include code changes in the patch that are not related to this JIRA's description? ResourceManager side should properly handle node label modifications when distributed node label configuration enabled -- Key: YARN-2740 URL: https://issues.apache.org/jira/browse/YARN-2740 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Fix For: 2.8.0 Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch, YARN-2740.20150327-1.patch, YARN-2740.20150411-1.patch, YARN-2740.20150411-2.patch, YARN-2740.20150411-3.patch, YARN-2740.20150417-1.patch, YARN-2740.20150420-1.patch, YARN-2740.20150421-1.patch According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change-labels-on-node operations. - CommonNodeLabelsManager shouldn't persist labels on nodes when NMs heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3519) registerApplicationMaster couldn't get all running containers if rm is rebuilding container info while am is relaunched
[ https://issues.apache.org/jira/browse/YARN-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505894#comment-14505894 ] sandflee commented on YARN-3519: Yes, the same issue. registerApplicationMaster couldn't get all running containers if rm is rebuilding container info while am is relaunched Key: YARN-3519 URL: https://issues.apache.org/jira/browse/YARN-3519 Project: Hadoop YARN Issue Type: Bug Reporter: sandflee 1. The RM fails over and has recovered all app info but not yet all container info. 2. The AM is relaunched and registers with the RM. 3. The NM with containers launched by the AM re-registers with the RM. The containers on the NM and the corresponding NMTokens could not be passed to the AM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3363) add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container.
[ https://issues.apache.org/jira/browse/YARN-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505744#comment-14505744 ] Anubhav Dhoot commented on YARN-3363: - The durations do not seem to belong in a ProcessTree class. Instead of flowing the durations through the ProcessTree class, can we add the metrics directly in ContainersMonitorImpl#handle by reading startEvent.getLaunchDuration() directly? Nit: ContainerMetrics#recordTime could maybe be renamed to recordStateChangeDurations or something to that effect. sendContainerMonitorStartEvent calculates the durations in two different ways; maybe introduce two local variables for launchDuration and localizationDuration, and then we do not need the comment. add localization and container launch time to ContainerMetrics at NM to show these timing information for each active container. Key: YARN-3363 URL: https://issues.apache.org/jira/browse/YARN-3363 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu Labels: metrics, supportability Attachments: YARN-3363.000.patch Add localization and container launch time to ContainerMetrics at the NM to show this timing information for each active container. Currently ContainerMetrics has the container's actual memory usage (YARN-2984), actual CPU usage (YARN-3122), resource and pid (YARN-3022). It would be better to also have localization and container launch time in ContainerMetrics for each active container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
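For illustration, a rough sketch of the flow the reviewer suggests: record the durations when the start event is handled, rather than threading them through the process-tree class. Every type below is a simplified stand-in for the real NM classes, and the method names merely mirror those mentioned in the comment.
{code}
class ContainerStartMonitoringEvent {
  private final long launchDuration;       // ms spent launching the container
  private final long localizationDuration; // ms spent localizing resources

  ContainerStartMonitoringEvent(long launchDuration, long localizationDuration) {
    this.launchDuration = launchDuration;
    this.localizationDuration = localizationDuration;
  }
  long getLaunchDuration() { return launchDuration; }
  long getLocalizationDuration() { return localizationDuration; }
}

class ContainerMetricsSketch {
  // Hypothetical recorder mirroring the suggested recordStateChangeDurations()
  void recordStateChangeDurations(long launchMs, long localizationMs) {
    // real code would update NM metrics objects here
    System.out.printf("launch=%dms localization=%dms%n", launchMs, localizationMs);
  }
}

class ContainersMonitorSketch {
  private final ContainerMetricsSketch metrics = new ContainerMetricsSketch();

  // Analogue of ContainersMonitorImpl#handle for the container start event.
  void handle(ContainerStartMonitoringEvent startEvent) {
    metrics.recordStateChangeDurations(
        startEvent.getLaunchDuration(), startEvent.getLocalizationDuration());
  }
}
{code}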
[jira] [Updated] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3134: Attachment: YARN-3134-042115.patch In this patch I addressed [~djp]'s comments on code readability and on releasing PreparedStatements: PreparedStatements are now released when the try-with-resources statements complete. I've also addressed the concurrent modification exception pointed out by [~zjshen]. Now we're using per-thread Phoenix JDBC connections, which are mapped to the same heavy-weight HBase connection internally. [Storage implementation] Exploiting the option of using Phoenix to access HBase backend --- Key: YARN-3134 URL: https://issues.apache.org/jira/browse/YARN-3134 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Li Lu Attachments: YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134DataSchema.pdf Quote the introduction on Phoenix web page: {code} Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. {code} It may simplify how our implementation reads/writes data from/to HBase, and makes it easy to build indexes and compose complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
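For illustration, a minimal sketch of the two patterns mentioned in the comment: a per-thread Phoenix JDBC connection (Phoenix multiplexes these onto one HBase connection internally) plus try-with-resources so each PreparedStatement is always released. The JDBC URL and table/column names below are hypothetical, not the YARN-3134 schema.
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class PhoenixWriterSketch {
  private static final String JDBC_URL = "jdbc:phoenix:zk-host"; // hypothetical
  private static final ThreadLocal<Connection> CONN =
      ThreadLocal.withInitial(PhoenixWriterSketch::open);

  private static Connection open() {
    try {
      return DriverManager.getConnection(JDBC_URL);
    } catch (SQLException e) {
      throw new RuntimeException("Failed to open Phoenix connection", e);
    }
  }

  public void writeEntity(String entityId, String info) throws SQLException {
    String sql = "UPSERT INTO TIMELINE_ENTITY (ID, INFO) VALUES (?, ?)"; // hypothetical table
    // try-with-resources guarantees the PreparedStatement is closed (released)
    // even if execution throws.
    try (PreparedStatement stmt = CONN.get().prepareStatement(sql)) {
      stmt.setString(1, entityId);
      stmt.setString(2, info);
      stmt.executeUpdate();
    }
    CONN.get().commit();
  }
}
{code}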
[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505910#comment-14505910 ] Zhijie Shen commented on YARN-3287: --- It breaks the timeline access control of distributed shell. In distributed shell AM:
{code}
if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
    YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) {
  // Creating the Timeline Client
  timelineClient = TimelineClient.createTimelineClient();
  timelineClient.init(conf);
  timelineClient.start();
} else {
  timelineClient = null;
  LOG.warn("Timeline service is not enabled");
}
{code}
{code}
ugi.doAs(new PrivilegedExceptionAction<TimelinePutResponse>() {
  @Override
  public TimelinePutResponse run() throws Exception {
    return timelineClient.putEntities(entity);
  }
});
{code}
This Jira changes the timeline client to get the right ugi at serviceInit, but the DS AM still doesn't use the submitter ugi to init the timeline client; instead it uses the ugi for each put entity call. It results in the put request being made as the wrong user. TimelineClient kerberos authentication failure uses wrong login context. Key: YARN-3287 URL: https://issues.apache.org/jira/browse/YARN-3287 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Daryn Sharp Fix For: 2.7.0 Attachments: YARN-3287.1.patch, YARN-3287.2.patch, YARN-3287.3.patch, timeline.patch TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3413: - Attachment: YARN-3413.4.patch Attached ver.4, which addresses the findbugs warnings and test failures (the MR test failures seem unrelated to this patch). Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime -- Key: YARN-3413 URL: https://issues.apache.org/jira/browse/YARN-3413 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, YARN-3413.4.patch As mentioned in : https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947. Changing node label exclusivity and/or other attributes may not be a real use case, and also we should support setting node label attributes while adding them to the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3514) Active directory usernames like domain\login cause YARN failures
[ https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505711#comment-14505711 ] Chris Nauroth commented on YARN-3514: - [~john.lil...@redpoint.net], thank you for the confirmation. Active directory usernames like domain\login cause YARN failures Key: YARN-3514 URL: https://issues.apache.org/jira/browse/YARN-3514 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Environment: CentOS6 Reporter: john lilley Assignee: Chris Nauroth Priority: Minor Attachments: YARN-3514.001.patch, YARN-3514.002.patch We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is Kerberos-enabled and uses an external AD domain controller for the KDC. We are able to authenticate, browse HDFS, etc. However, YARN fails during localization because it seems to get confused by the presence of a \ character in the local user name. Our AD authentication on the nodes goes through sssd and set configured to map AD users onto the form domain\username. For example, our test user has a Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user domain\hadoopuser. We have no problem validating that user with PAM, logging in as that user, su-ing to that user, etc. However, when we attempt to run a YARN application master, the localization step fails when setting up the local cache directory for the AM. The error that comes out of the RM logs: 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, diagnostics='Application application_1429295486450_0001 failed 1 times due to AM Container for appattempt_1429295486450_0001_01 exited with exitCode: -1000 due to: Application application_1429295486450_0001 initialization failed (exitCode=255) with output: main : command provided 0 main : user is DOMAIN\hadoopuser main : requested yarn user is domain\hadoopuser org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create directory: /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10 at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347) .Failing this attempt.. Failing the application.' However, when we look on the node launching the AM, we see this: [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache [root@rpb-cdh-kerb-2 usercache]# ls -l drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser There appears to be different treatment of the \ character in different places. Something creates the directory as domain\hadoopuser but something else later attempts to use it as domain%5Chadoopuser. I’m not sure where or why the URL escapement converts the \ to %5C or why this is not consistent. 
I should also mention, for the sake of completeness, our auth_to_local rule is set up to map u...@domain.com to domain\user: RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3514) Active directory usernames like domain\login cause YARN failures
[ https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505640#comment-14505640 ] john lilley commented on YARN-3514: --- Sadly, we aren't equipped to upgrade and patch, we are mandated to go with the flow of the commercial distros we support. However I can assure you that our local FS definitely supports the \ in the filename, as I saw the usercache folder with the \ in it. Active directory usernames like domain\login cause YARN failures Key: YARN-3514 URL: https://issues.apache.org/jira/browse/YARN-3514 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Environment: CentOS6 Reporter: john lilley Assignee: Chris Nauroth Priority: Minor Attachments: YARN-3514.001.patch, YARN-3514.002.patch We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is Kerberos-enabled and uses an external AD domain controller for the KDC. We are able to authenticate, browse HDFS, etc. However, YARN fails during localization because it seems to get confused by the presence of a \ character in the local user name. Our AD authentication on the nodes goes through sssd and set configured to map AD users onto the form domain\username. For example, our test user has a Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user domain\hadoopuser. We have no problem validating that user with PAM, logging in as that user, su-ing to that user, etc. However, when we attempt to run a YARN application master, the localization step fails when setting up the local cache directory for the AM. The error that comes out of the RM logs: 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, diagnostics='Application application_1429295486450_0001 failed 1 times due to AM Container for appattempt_1429295486450_0001_01 exited with exitCode: -1000 due to: Application application_1429295486450_0001 initialization failed (exitCode=255) with output: main : command provided 0 main : user is DOMAIN\hadoopuser main : requested yarn user is domain\hadoopuser org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create directory: /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10 at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347) .Failing this attempt.. Failing the application.' However, when we look on the node launching the AM, we see this: [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache [root@rpb-cdh-kerb-2 usercache]# ls -l drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser There appears to be different treatment of the \ character in different places. Something creates the directory as domain\hadoopuser but something else later attempts to use it as domain%5Chadoopuser. I’m not sure where or why the URL escapement converts the \ to %5C or why this is not consistent. 
I should also mention, for the sake of completeness, our auth_to_local rule is set up to map u...@domain.com to domain\user: RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505815#comment-14505815 ] Hadoop QA commented on YARN-3413: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726935/YARN-3413.3.patch against trunk revision dfc1c4c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 19 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapred.TestReporter org.apache.hadoop.mapreduce.v2.TestUberAM org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath org.apache.hadoop.mapred.TestMRIntermediateDataEncryption org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7429//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7429//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7429//console This message is automatically generated. Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime -- Key: YARN-3413 URL: https://issues.apache.org/jira/browse/YARN-3413 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch As mentioned in : https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947. Changing node label exclusivity and/or other attributes may not be a real use case, and also we should support setting node label attributes whiling adding them to cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3522) DistributedShell uses the wrong user to put timeline data
Zhijie Shen created YARN-3522: - Summary: DistributedShell uses the wrong user to put timeline data Key: YARN-3522 URL: https://issues.apache.org/jira/browse/YARN-3522 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker YARN-3287 breaks the timeline access control of distributed shell. In distributed shell AM:
{code}
if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
    YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) {
  // Creating the Timeline Client
  timelineClient = TimelineClient.createTimelineClient();
  timelineClient.init(conf);
  timelineClient.start();
} else {
  timelineClient = null;
  LOG.warn("Timeline service is not enabled");
}
{code}
{code}
ugi.doAs(new PrivilegedExceptionAction<TimelinePutResponse>() {
  @Override
  public TimelinePutResponse run() throws Exception {
    return timelineClient.putEntities(entity);
  }
});
{code}
YARN-3287 changes the timeline client to get the right ugi at serviceInit, but the DS AM still doesn't use the submitter ugi to init the timeline client; instead it uses the ugi for each put entity call. It results in the put request being made as the wrong user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
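For illustration, a minimal sketch of the fix direction described above: create and initialize the timeline client as the submitter's UGI, so that identity is captured once instead of relying only on the UGI wrapped around each put call. This is simplified and not the actual YARN-3522 patch.
{code}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelinePutResponse;
import org.apache.hadoop.yarn.client.api.TimelineClient;

public class SubmitterTimelineClientSketch {
  // Create (and init) the client while running as the submitter's UGI, so the
  // client captures the right identity up front.
  public static TimelineClient createAsSubmitter(
      UserGroupInformation submitterUgi, Configuration conf) throws Exception {
    return submitterUgi.doAs(new PrivilegedExceptionAction<TimelineClient>() {
      @Override
      public TimelineClient run() {
        TimelineClient client = TimelineClient.createTimelineClient();
        client.init(conf);
        client.start();
        return client;
      }
    });
  }

  public static TimelinePutResponse publish(
      TimelineClient client, TimelineEntity entity) throws Exception {
    return client.putEntities(entity);
  }
}
{code}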
[jira] [Moved] (YARN-3523) Cleanup ResourceManagerAdministrationProtocol interface audience
[ https://issues.apache.org/jira/browse/YARN-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan moved MAPREDUCE-6326 to YARN-3523: - Component/s: (was: resourcemanager) (was: client) resourcemanager client Key: YARN-3523 (was: MAPREDUCE-6326) Project: Hadoop YARN (was: Hadoop Map/Reduce) Cleanup ResourceManagerAdministrationProtocol interface audience Key: YARN-3523 URL: https://issues.apache.org/jira/browse/YARN-3523 Project: Hadoop YARN Issue Type: Bug Components: client, resourcemanager Reporter: Wangda Tan I noticed ResourceManagerAdministrationProtocol has @Private audience for the class and @Public audience for methods. It doesn't make sense to me. We should make class audience and methods audience consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3523) Cleanup ResourceManagerAdministrationProtocol interface audience
[ https://issues.apache.org/jira/browse/YARN-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505971#comment-14505971 ] Naganarasimha G R commented on YARN-3523: - Hi [~wangda], IIUC we need to make all the methods have @Private audience, right, since all the methods here are for administrative purposes? If you already have a patch for this, please feel free to reassign :) Cleanup ResourceManagerAdministrationProtocol interface audience Key: YARN-3523 URL: https://issues.apache.org/jira/browse/YARN-3523 Project: Hadoop YARN Issue Type: Bug Components: client, resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R I noticed ResourceManagerAdministrationProtocol has @Private audience for the class and @Public audience for methods. It doesn't make sense to me. We should make the class audience and the method audience consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3495) Confusing log generated by FairScheduler
[ https://issues.apache.org/jira/browse/YARN-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506044#comment-14506044 ] Brahma Reddy Battula commented on YARN-3495: Thanks a lot [~ozawa] for reviewing and committing the patch!!! Confusing log generated by FairScheduler Key: YARN-3495 URL: https://issues.apache.org/jira/browse/YARN-3495 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Fix For: 2.8.0 Attachments: YARN-3495.patch 2015-04-16 12:03:48,531 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
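As an aside, one plausible shape of a less confusing message (a sketch only, not necessarily the change in YARN-3495.patch) is to name the container and event instead of logging the bare "Null container completed..." line; rmContainer, containerStatus, and event are assumed to be the parameters of the scheduler's completed-container handler.
{code}
// Sketch: report which container completed without a matching RMContainer.
if (rmContainer == null) {
  LOG.info("Container " + containerStatus.getContainerId()
      + " completed with event " + event
      + ", but corresponding RMContainer doesn't exist.");
  return;
}
{code}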
[jira] [Commented] (YARN-3301) Fix the format issue of the new RM web UI and AHS web UI
[ https://issues.apache.org/jira/browse/YARN-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506121#comment-14506121 ] Xuan Gong commented on YARN-3301: - bq. seems the outstanding Resource Requests table still has some format issue Attached a new patch and a screenshot of the web page. Fix the format issue of the new RM web UI and AHS web UI Key: YARN-3301 URL: https://issues.apache.org/jira/browse/YARN-3301 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: Screen Shot 2015-04-21 at 5.09.25 PM.png, Screen Shot 2015-04-21 at 5.38.39 PM.png, YARN-3301.1.patch, YARN-3301.2.patch, YARN-3301.3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3523) Cleanup ResourceManagerAdministrationProtocol interface audience
[ https://issues.apache.org/jira/browse/YARN-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506118#comment-14506118 ] Wangda Tan commented on YARN-3523: -- [~Naganarasimha], thanks for taking this. Actually I'm not sure what the correct audience setting for ResourceManagerAdministrationProtocol should be: as a public API, it should be @Public, but in practice it is only used by RMAdminCLI, and I don't know whether any 3rd-party projects write their own admin CLI and implement ResourceManagerAdministrationProtocol. To not break compatibility, I think the simple solution is to only change ResourceManagerAdministrationProtocol from @Private to @Public. Let me know if there are any thoughts on this. Cleanup ResourceManagerAdministrationProtocol interface audience Key: YARN-3523 URL: https://issues.apache.org/jira/browse/YARN-3523 Project: Hadoop YARN Issue Type: Bug Components: client, resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R I noticed ResourceManagerAdministrationProtocol has @Private audience for the class and @Public audience for methods. It doesn't make sense to me. We should make the class audience and the method audience consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
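To make the inconsistency concrete, a small sketch of the two shapes under discussion; refreshQueues is a real method of the protocol, but the signature and annotation style are simplified here rather than copied from trunk.
{code}
// Shape described in the issue: class-level @Private, method-level @Public.
@InterfaceAudience.Private
public interface ResourceManagerAdministrationProtocol {
  @InterfaceAudience.Public
  RefreshQueuesResponse refreshQueues(RefreshQueuesRequest request)
      throws YarnException, IOException;
}

// One consistent alternative along the lines of the comment above: class and
// methods both @Public. Whether to instead mark everything @Private (per the
// earlier comment) is the open question in this thread.
@InterfaceAudience.Public
public interface ResourceManagerAdministrationProtocol {
  @InterfaceAudience.Public
  RefreshQueuesResponse refreshQueues(RefreshQueuesRequest request)
      throws YarnException, IOException;
}
{code}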
[jira] [Updated] (YARN-3301) Fix the format issue of the new RM web UI and AHS web UI
[ https://issues.apache.org/jira/browse/YARN-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3301: Attachment: YARN-3301.3.patch Fix the format issue of the new RM web UI and AHS web UI Key: YARN-3301 URL: https://issues.apache.org/jira/browse/YARN-3301 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: Screen Shot 2015-04-21 at 5.09.25 PM.png, Screen Shot 2015-04-21 at 5.38.39 PM.png, YARN-3301.1.patch, YARN-3301.2.patch, YARN-3301.3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3301) Fix the format issue of the new RM web UI and AHS web UI
[ https://issues.apache.org/jira/browse/YARN-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3301: Attachment: Screen Shot 2015-04-21 at 5.38.39 PM.png Fix the format issue of the new RM web UI and AHS web UI Key: YARN-3301 URL: https://issues.apache.org/jira/browse/YARN-3301 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: Screen Shot 2015-04-21 at 5.09.25 PM.png, Screen Shot 2015-04-21 at 5.38.39 PM.png, YARN-3301.1.patch, YARN-3301.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)