[jira] [Created] (YARN-10394) RACK/NODE_LOCAL Request have same nodelabel as ANY Request
chan created YARN-10394: --- Summary: RACK/NODE_LOCAL Request have same nodelabel as ANY Request Key: YARN-10394 URL: https://issues.apache.org/jira/browse/YARN-10394 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.9.2 Environment: {code:java} // code placeholder private void updateNodeLabels(ResourceRequest request) { String resourceName = request.getResourceName(); if (resourceName.equals(ResourceRequest.ANY)) { ResourceRequest previousAnyRequest = getResourceRequest(resourceName); // When there is change in ANY request label expression, we should // update label for all resource requests already added of same // priority as ANY resource request. if ((null == previousAnyRequest) || hasRequestLabelChanged( previousAnyRequest, request)) { for (ResourceRequest r : resourceRequestMap.values()) { if (!r.getResourceName().equals(ResourceRequest.ANY)) { r.setNodeLabelExpression(request.getNodeLabelExpression()); } } } } else { // if resource Name is not ANY its nodeLabel will be same as ANY Request ResourceRequest anyRequest = getResourceRequest(ResourceRequest.ANY); if (anyRequest != null) { request.setNodeLabelExpression(anyRequest.getNodeLabelExpression()); } } } {code} Reporter: chan LocalitySchedulingPlacementSet.updateNodeLabels makes RACK/NODE_LOCAL Requests have the same nodelabel as the ANY Request instead of -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
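The quoted method can be condensed into a small self-contained model to show the reported behavior. This is a hypothetical sketch (plain classes standing in for LocalitySchedulingPlacementSet and the real ResourceRequest; labels are assumed non-null), not the actual YARN code:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the updateNodeLabels logic quoted above:
// any non-ANY (RACK/NODE_LOCAL) request is forced to carry the ANY request's
// node label, discarding whatever label the client asked for.
public class NodeLabelDemo {
    static final String ANY = "*";

    static class ResourceRequest {
        final String resourceName;
        String nodeLabelExpression;
        ResourceRequest(String name, String label) {
            this.resourceName = name;
            this.nodeLabelExpression = label;
        }
    }

    final Map<String, ResourceRequest> resourceRequestMap = new HashMap<>();

    void updateNodeLabels(ResourceRequest request) {
        if (request.resourceName.equals(ANY)) {
            ResourceRequest previousAny = resourceRequestMap.get(ANY);
            // A changed ANY label is pushed onto every non-ANY request.
            if (previousAny == null
                || !previousAny.nodeLabelExpression.equals(request.nodeLabelExpression)) {
                for (ResourceRequest r : resourceRequestMap.values()) {
                    if (!r.resourceName.equals(ANY)) {
                        r.nodeLabelExpression = request.nodeLabelExpression;
                    }
                }
            }
        } else {
            // Non-ANY requests always inherit the ANY request's label.
            ResourceRequest anyRequest = resourceRequestMap.get(ANY);
            if (anyRequest != null) {
                request.nodeLabelExpression = anyRequest.nodeLabelExpression;
            }
        }
        resourceRequestMap.put(request.resourceName, request);
    }

    public static void main(String[] args) {
        NodeLabelDemo set = new NodeLabelDemo();
        set.updateNodeLabels(new ResourceRequest(ANY, "gpu"));
        ResourceRequest rack = new ResourceRequest("/rack1", "cpu");
        set.updateNodeLabels(rack);
        System.out.println(rack.nodeLabelExpression); // prints "gpu"
    }
}
```

Here the client submitted a RACK_LOCAL request asking for label "cpu", but after updateNodeLabels it carries the ANY request's label "gpu", which is the behavior the issue reports.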
[jira] [Commented] (YARN-10384) Add FORBID_SUBMIT_APPLICATIONS acl type to administer queue
[ https://issues.apache.org/jira/browse/YARN-10384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175181#comment-17175181 ] Juanjuan Tian commented on YARN-10384: --- [~epayne] Thanks for the comments. If an abusive user only abuses queue resources, we can use User Weights to limit such a user. But if the user abuses other resources, like local disk (for example, in our system we found some users using large amounts of local disk, causing many NMs to become unhealthy), in such a case we should forbid the user instead of just limiting them. > Add FORBID_SUBMIT_APPLICATIONS acl type to administer queue > --- > > Key: YARN-10384 > URL: https://issues.apache.org/jira/browse/YARN-10384 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: Juanjuan Tian >Assignee: Juanjuan Tian >Priority: Major > Attachments: YARN-10384-001.patch > > > Currently CapacityScheduler supports acl_submit_applications and > acl_administer_queue to administer a queue, but it may be necessary to forbid > some users in the acl_submit_applications group from submitting applications > to one specified queue, since some users may abuse the queue and submit many > applications, while creating new groups just to exclude these users costs > effort and time. For this scenario, we can add another acl type - > FORBID_SUBMIT_APPLICATIONS - listing the users who abuse the queue, and > forbidding them from submitting applications
[jira] [Commented] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM
[ https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175148#comment-17175148 ] Yuanbo Liu commented on YARN-10393: --- Thanks for opening this issue, we happened to hit a similar situation on hadoop-2.7.0. The mapper lost its heartbeat and never finished. Currently we just use "mapred fail-task" to put those mappers into the failed state and re-execute them. Looking forward to your patch! > MR job live lock caused by completed state container leak in heartbeat > between node manager and RM > -- > > Key: YARN-10393 > URL: https://issues.apache.org/jira/browse/YARN-10393 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, yarn >Reporter: zhenzhao wang >Assignee: zhenzhao wang >Priority: Major > > This was a bug we had seen multiple times on Hadoop 2.6.2. And the following > analysis is based on the core dump, logs, and code in 2017 with Hadoop 2.6.2. > We hadn't seen it after 2.9 in our env. However, that was because of the RPC > retry policy change and other changes. There's still a possibility even with > the current code if I didn't miss anything. > *High-level description:* > We had seen a starving mapper issue several times. The MR job got stuck in a > live lock state and couldn't make any progress. The queue is full so the > pending mapper can’t get any resource to continue, and the application master > failed to preempt the reducer, thus causing the job to be stuck. The reason > why the application master didn’t preempt the reducer was that there was a > leaked container in assigned mappers. The node manager failed to report the > completed container to the resource manager. > *Detailed steps:* > > # Container_1501226097332_249991_01_000199 was assigned to > attempt_1501226097332_249991_m_95_0 on 2017-08-08 16:00:00,417. 
> {code:java} > appmaster.log:6464:2017-08-08 16:00:00,417 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned > container container_1501226097332_249991_01_000199 to > attempt_1501226097332_249991_m_95_0 > {code} > # The container finished on 2017-08-08 16:02:53,313. > {code:java} > yarn-mapred-nodemanager-.log.1:2017-08-08 16:02:53,313 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: > Container container_1501226097332_249991_01_000199 transitioned from RUNNING > to EXITED_WITH_SUCCESS > yarn-mapred-nodemanager-.log.1:2017-08-08 16:02:53,313 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: > Cleaning up container container_1501226097332_249991_01_000199 > {code} > # The NodeStatusUpdater got an exception in the heartbeat on 2017-08-08 > 16:07:04,238. In fact, the heartbeat request was actually handled by the > resource manager; however, the node manager failed to receive the response. Let’s > assume the heartBeatResponseId=$hid in the node manager. According to our current > configuration, the next heartbeat will be 10s later. 
> {code:java} > 2017-08-08 16:07:04,238 ERROR > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught > exception in status-updater > java.io.IOException: Failed on local exception: java.io.IOException: > Connection reset by peer; Host Details : local host is: ; destination host > is: XXX > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) > at org.apache.hadoop.ipc.Client.call(Client.java:1472) > at org.apache.hadoop.ipc.Client.call(Client.java:1399) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy33.nodeHeartbeat(Unknown Source) > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80) > at sun.reflect.GeneratedMethodAccessor61.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy34.nodeHeartbeat(Unknown Source) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:597) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(Sock
[jira] [Commented] (YARN-8459) Improve Capacity Scheduler logs to debug invalid states
[ https://issues.apache.org/jira/browse/YARN-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175064#comment-17175064 ] Jim Brennan commented on YARN-8459: --- Thanks [~epayne]! > Improve Capacity Scheduler logs to debug invalid states > --- > > Key: YARN-8459 > URL: https://issues.apache.org/jira/browse/YARN-8459 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.1.0 >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Fix For: 3.2.0, 3.1.1, 2.10.1 > > Attachments: YARN-8459-branch-2.10.001.patch, YARN-8459.001.patch, > YARN-8459.002.patch, YARN-8459.003.patch, YARN-8459.004.patch > > > Improve logs in CS to better debug invalid states
[jira] [Commented] (YARN-8459) Improve Capacity Scheduler logs to debug invalid states
[ https://issues.apache.org/jira/browse/YARN-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175047#comment-17175047 ] Eric Payne commented on YARN-8459: -- Thanks [~Jim_Brennan] for the updated patch. Yes, I think we should pull this back to 2.10. I will do so this afternoon. > Improve Capacity Scheduler logs to debug invalid states > --- > > Key: YARN-8459 > URL: https://issues.apache.org/jira/browse/YARN-8459 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.1.0 >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8459-branch-2.10.001.patch, YARN-8459.001.patch, > YARN-8459.002.patch, YARN-8459.003.patch, YARN-8459.004.patch > > > Improve logs in CS to better debug invalid states
[jira] [Commented] (YARN-10251) Show extended resources on legacy RM UI.
[ https://issues.apache.org/jira/browse/YARN-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175022#comment-17175022 ] Eric Payne commented on YARN-10251: --- Thank you very much, [~jhung] and [~Jim_Brennan]! > Show extended resources on legacy RM UI. > > > Key: YARN-10251 > URL: https://issues.apache.org/jira/browse/YARN-10251 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Fix For: 3.2.2, 2.10.1, 3.4.0, 3.3.1, 3.1.5 > > Attachments: Legacy RM UI With Not All Resources Shown.png, Updated > NodesPage UI With GPU columns.png, Updated RM UI With All Resources > Shown.png.png, YARN-10251.003.patch, YARN-10251.004.patch, > YARN-10251.005.patch, YARN-10251.006.patch, YARN-10251.007.patch, > YARN-10251.branch-2.10.001.patch, YARN-10251.branch-2.10.002.patch, > YARN-10251.branch-2.10.003.patch, YARN-10251.branch-2.10.005.patch, > YARN-10251.branch-2.10.006.patch, YARN-10251.branch-2.10.007.patch, > YARN-10251.branch-3.2.004.patch, YARN-10251.branch-3.2.005.patch, > YARN-10251.branch-3.2.006.patch, YARN-10251.branch-3.2.007.patch > > > It would be great to update the legacy RM UI to include GPU resources in the > overview and in the per-app sections.
[jira] [Comment Edited] (YARN-10389) Option to override RMWebServices with custom WebService class
[ https://issues.apache.org/jira/browse/YARN-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174941#comment-17174941 ] Prabhu Joseph edited comment on YARN-10389 at 8/10/20, 5:32 PM: [~tanu.ajmera] Thanks for the patch. The patch looks good. One minor issue which is not related to this patch. 1. The below null check is not required {code} bindExternalClasses(); if (rm != null) {code} rm gets accessed even before the null check from bindExternalClasses, so no use having the null check. {code} private void bindExternalClasses() { YarnConfiguration yarnConf = new YarnConfiguration(rm.getConfig()); {code} was (Author: prabhu joseph): [~tanu.ajmera] Thanks for the patch. The patch looks good. One minor issue which is not related to this patch. 1. The below null check is not required {code} bindExternalClasses(); if (rm != null) {code} rm gets accessed even before the null check from bindExternalClasses, so no use having the null check. {code} private void bindExternalClasses() { YarnConfiguration yarnConf = new YarnConfiguration(rm.getConfig()); {codE} > Option to override RMWebServices with custom WebService class > - > > Key: YARN-10389 > URL: https://issues.apache.org/jira/browse/YARN-10389 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.4.0 >Reporter: Prabhu Joseph >Assignee: Tanu Ajmera >Priority: Major > Attachments: YARN-10389-001.patch, YARN-10389-002.patch, > YARN-10389-003.patch, YARN-10389-004.patch, YARN-10389-005.patch, > YARN-10389-006.patch, YARN-10389-007.patch > > > YARN-8047 provides support to add custom WebServices as part of RMWebApp. > Since each WebService has to have a separate WebService Path, /ws/v1/cluster > root path cannot be used globally. > Another alternative is to provide an option to override the RMWebServices > with custom WebServices implementation which can extend the RMWebService, > this way /ws/v1/cluster path can be used globally. 
[jira] [Commented] (YARN-10389) Option to override RMWebServices with custom WebService class
[ https://issues.apache.org/jira/browse/YARN-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174941#comment-17174941 ] Prabhu Joseph commented on YARN-10389: -- [~tanu.ajmera] Thanks for the patch. The patch looks good. One minor issue which is not related to this patch. 1. The below null check is not required {code} bindExternalClasses(); if (rm != null) {code} rm gets accessed inside bindExternalClasses even before the null check, so there is no use having the null check. {code} private void bindExternalClasses() { YarnConfiguration yarnConf = new YarnConfiguration(rm.getConfig()); {code} > Option to override RMWebServices with custom WebService class > - > > Key: YARN-10389 > URL: https://issues.apache.org/jira/browse/YARN-10389 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.4.0 >Reporter: Prabhu Joseph >Assignee: Tanu Ajmera >Priority: Major > Attachments: YARN-10389-001.patch, YARN-10389-002.patch, > YARN-10389-003.patch, YARN-10389-004.patch, YARN-10389-005.patch, > YARN-10389-006.patch, YARN-10389-007.patch > > > YARN-8047 provides support to add custom WebServices as part of RMWebApp. > Since each WebService has to have a separate WebService Path, /ws/v1/cluster > root path cannot be used globally. > Another alternative is to provide an option to override the RMWebServices > with custom WebServices implementation which can extend the RMWebService, > this way /ws/v1/cluster path can be used globally.
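The ordering issue in the review comment above can be condensed into a runnable sketch. The names below mirror the snippet in the comment (bindExternalClasses, rm), but the class is a hypothetical stand-in, not the real RMWebApp:

```java
// Condensed, hypothetical view of the ordering issue: bindExternalClasses()
// dereferences rm unconditionally, so a null-check placed after the call can
// never be the first thing to notice a null rm.
public class RMWebAppDemo {
    private final Object rm;

    RMWebAppDemo(Object rm) {
        this.rm = rm;
    }

    String setup() {
        bindExternalClasses();   // throws NullPointerException here if rm is null...
        if (rm != null) {        // ...so this guard is effectively dead code:
            return "configured"; //    rm is already known to be non-null here.
        }
        return "skipped";        // unreachable in practice
    }

    private void bindExternalClasses() {
        // Stands in for: new YarnConfiguration(rm.getConfig());
        rm.toString();           // unconditional dereference of rm
    }

    public static void main(String[] args) {
        System.out.println(new RMWebAppDemo(new Object()).setup()); // prints "configured"
    }
}
```

Passing a null rm throws from bindExternalClasses before the guard ever runs, which is why the comment says the null check has no use.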
[jira] [Commented] (YARN-10384) Add FORBID_SUBMIT_APPLICATIONS acl type to administer queue
[ https://issues.apache.org/jira/browse/YARN-10384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174430#comment-17174430 ] Eric Payne commented on YARN-10384: --- Also, FYI, we don't enter anything in the {{Fix Version}} field until the JIRA is resolved. > Add FORBID_SUBMIT_APPLICATIONS acl type to administer queue > --- > > Key: YARN-10384 > URL: https://issues.apache.org/jira/browse/YARN-10384 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: Juanjuan Tian >Assignee: Juanjuan Tian >Priority: Major > Attachments: YARN-10384-001.patch > > > Currently CapacityScheduler supports acl_submit_applications and > acl_administer_queue to administer a queue, but it may be necessary to forbid > some users in the acl_submit_applications group from submitting applications > to one specified queue, since some users may abuse the queue and submit many > applications, while creating new groups just to exclude these users costs > effort and time. For this scenario, we can add another acl type - > FORBID_SUBMIT_APPLICATIONS - listing the users who abuse the queue, and > forbidding them from submitting applications
[jira] [Updated] (YARN-10384) Add FORBID_SUBMIT_APPLICATIONS acl type to administer queue
[ https://issues.apache.org/jira/browse/YARN-10384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-10384: -- Fix Version/s: (was: 3.2.0) [~jutia], Thank you for suggesting this improvement. While this may be a reasonable improvement, you may want to consider using the User Weights feature to limit abusive users. The configuration property that controls a user's weight is this: {{yarn.scheduler.capacity.<queue-path>.user-settings.<user>.weight}}. By default, all users are given equal opportunity to receive queue resources. That is, their user weight is 1.0. However, an abusive user could be limited by making their user weight 0.1 (or even smaller). That way, they would only be able to take up a small fraction of what the other users can utilize, and they would always be considered last when assigning resources. > Add FORBID_SUBMIT_APPLICATIONS acl type to administer queue > --- > > Key: YARN-10384 > URL: https://issues.apache.org/jira/browse/YARN-10384 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: Juanjuan Tian >Assignee: Juanjuan Tian >Priority: Major > Attachments: YARN-10384-001.patch > > > Currently CapacityScheduler supports acl_submit_applications and > acl_administer_queue to administer a queue, but it may be necessary to forbid > some users in the acl_submit_applications group from submitting applications > to one specified queue, since some users may abuse the queue and submit many > applications, while creating new groups just to exclude these users costs > effort and time. For this scenario, we can add another acl type - > FORBID_SUBMIT_APPLICATIONS - listing the users who abuse the queue, and > forbidding them from submitting applications
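The user-weight setting described above can be illustrated with a capacity-scheduler.xml fragment. This is a hypothetical sketch: the queue path (root.default), user name (baduser), and weight value (0.1) are made-up placeholders, not values from the thread.

```xml
<!-- Hypothetical example: give user "baduser" a tenth of the default
     scheduling weight on queue root.default. The property name follows
     yarn.scheduler.capacity.<queue-path>.user-settings.<user>.weight;
     the queue path and user name here are placeholders. -->
<property>
  <name>yarn.scheduler.capacity.root.default.user-settings.baduser.weight</name>
  <value>0.1</value>
</property>
```

With a weight of 0.1, the user receives roughly a tenth of the share an ordinary (weight 1.0) user would get, and is considered last when resources are assigned, as the comment describes.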
[jira] [Commented] (YARN-10389) Option to override RMWebServices with custom WebService class
[ https://issues.apache.org/jira/browse/YARN-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174358#comment-17174358 ] Hadoop QA commented on YARN-10389: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 33s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 
{color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 28s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 17s{color} | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 48s{color} | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 24s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 2s{color} | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 8s{color} | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 1m 49s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 36s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 44s{color} | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 44s{color} | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 14s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 0s{color} | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 5s{color} | {color:green
[jira] [Commented] (YARN-10380) Import logic of multi-node allocation in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-10380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174287#comment-17174287 ] Yuanbo Liu commented on YARN-10380: --- [~wangda] Thanks for opening this issue. I'm not sure whether you're working on it; I'd be glad to help with it. > Import logic of multi-node allocation in CapacityScheduler > -- > > Key: YARN-10380 > URL: https://issues.apache.org/jira/browse/YARN-10380 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Priority: Critical > > *1) Entry point:* > When we do multi-node allocation, we're using the same logic as async > scheduling: > {code:java} > // Allocate containers of node [start, end) > for (FiCaSchedulerNode node : nodes) { > if (current++ >= start) { > if (shouldSkipNodeSchedule(node, cs, printSkipedNodeLogging)) { > continue; > } > cs.allocateContainersToNode(node.getNodeID(), false); > } > } {code} > Is it the most effective way to do multi-node scheduling? Should we allocate > based on partitions? In the above logic, if we have thousands of nodes in one > partition, we will repeatedly access all nodes of the partition thousands of > times. > I would suggest looking at making the entry points for node-heartbeat, > async-scheduling (single node), and async-scheduling (multi-node) different. > Node-heartbeat and async-scheduling (single node) can still be similar and > share most of the code. > async-scheduling (multi-node): should iterate partitions first, using pseudo > code like: > {code:java} > for (partition : all partitions) { > allocateContainersOnMultiNodes(getCandidate(partition)) > } {code} >
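The partition-first pseudo code above can be sketched as runnable Java. All types here (Node, the allocator callback) are hypothetical stand-ins for FiCaSchedulerNode and allocateContainersOnMultiNodes, under the assumption that the allocator consumes one candidate set per partition:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Minimal sketch of the partition-first entry point suggested in the pseudo
// code: visit each node once per pass by grouping candidates under partitions,
// instead of re-walking every node of the partition for each scheduled node.
public class PartitionFirstScheduling {

    static class Node {
        final String id;
        final String partition;
        Node(String id, String partition) {
            this.id = id;
            this.partition = partition;
        }
    }

    // Build each partition's candidate set exactly once.
    static Map<String, List<Node>> groupByPartition(List<Node> nodes) {
        Map<String, List<Node>> byPartition = new HashMap<>();
        for (Node n : nodes) {
            byPartition.computeIfAbsent(n.partition, p -> new ArrayList<>()).add(n);
        }
        return byPartition;
    }

    // One multi-node scheduling pass: hand each partition's candidates to the
    // allocator once, mirroring allocateContainersOnMultiNodes(getCandidate(p)).
    static void schedulePass(List<Node> nodes, Consumer<List<Node>> allocateOnMultiNodes) {
        for (List<Node> candidates : groupByPartition(nodes).values()) {
            allocateOnMultiNodes.accept(candidates);
        }
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node("n1", "gpu"));
        nodes.add(new Node("n2", "gpu"));
        nodes.add(new Node("n3", ""));  // default (empty) partition
        schedulePass(nodes, candidates ->
            System.out.println("allocate over " + candidates.size() + " candidate node(s)"));
    }
}
```

The design point is that the per-partition grouping happens once per pass, so a partition with thousands of nodes is scanned once rather than thousands of times.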
[jira] [Commented] (YARN-10382) Non-secure yarn access secure hdfs
[ https://issues.apache.org/jira/browse/YARN-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174253#comment-17174253 ] Steve Loughran commented on YARN-10382: --- The problem there is that the code wants to know who the YARN principal of the resource manager is so that it can send messages to HDFS saying "renew these delegation tokens". Your insecure YARN RM doesn't have a kerberos principal, so secure HDFS will not issue delegation tokens to it. You could somehow cheat the configs to name some kerberos principal (yourself?) as the RM principal - no idea what happens then. I would personally like YARN to collect tokens from services even when Kerberos is disabled, though not for your use case - I want to be able to collect tokens for the object stores. But I've avoided going near the code as (a) I'm scared and (b) applications like Spark do their own checks against UserGroupInformation.isSecurityEnabled() which still wouldn't work > Non-secure yarn access secure hdfs > -- > > Key: YARN-10382 > URL: https://issues.apache.org/jira/browse/YARN-10382 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn >Reporter: bianqi >Priority: Minor > > In our production environment, yarn cannot enable kerberos due to yarn > environment problems, but our hdfs does have kerberos enabled, and now we need > non-secure yarn to access secure hdfs. > Normally both yarn and hdfs are secured once security is turned on. > I hope that after enabling hdfs security, you can use non-secure yarn to > access secure hdfs, or use secure yarn to access non-secure hdfs.
[jira] [Commented] (YARN-10389) Option to override RMWebServices with custom WebService class
[ https://issues.apache.org/jira/browse/YARN-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174227#comment-17174227 ] Hadoop QA commented on YARN-10389: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 32s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 
|| Vote || Subsystem || Runtime || Comment ||
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 1s | Maven dependency ordering for branch |
| +1 | mvninstall | 23m 54s | trunk passed |
| +1 | compile | 10m 42s | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | compile | 8m 48s | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 | checkstyle | 1m 32s | trunk passed |
| +1 | mvnsite | 2m 59s | trunk passed |
| +1 | shadedclient | 20m 48s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 2m 11s | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javadoc | 2m 11s | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| 0 | spotbugs | 2m 10s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 6m 30s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 25s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 16s | the patch passed |
| +1 | compile | 10m 3s | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javac | 10m 3s | the patch passed |
| +1 | compile | 9m 10s | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 | javac | 9m 10s | the patch passed |
| +1 | checkstyle | 1m 31s | the patch passed |
| +1 | mvnsite | 2m 57s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 16m 20s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 2m 4s | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 | javadoc | 2m 10s | |
[jira] [Updated] (YARN-3159) DOCKER_IMAGE_PATTERN should support multilayered path of docker images
[ https://issues.apache.org/jira/browse/YARN-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-3159: - Description: Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match docker images with a path like "sequenceiq/hadoop-docker:2.6.0", which contains only one "/". {code:java} public static final String DOCKER_IMAGE_PATTERN = "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$"; {code} In our cluster, image names have multiple layers, such as "docker-registry:8080/cloud/hadoop-docker:2.6.0", which works with "docker pull IMAGE_NAME" but cannot pass the image-name check in saneDockerImage(). was: Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match docker images with a path like "sequenceiq/hadoop-docker:2.6.0", which contains only one "/". {code} public static final String DOCKER_IMAGE_PATTERN = "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$"; {code} In our cluster, image names have multiple layers, such as "docker-registry.qiyi.virtual:8080/cloud/hadoop-docker:2.6.0", which works with "docker pull IMAGE_NAME" but cannot pass the image-name check in saneDockerImage(). > DOCKER_IMAGE_PATTERN should support multilayered path of docker images > -- > > Key: YARN-3159 > URL: https://issues.apache.org/jira/browse/YARN-3159 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Leitao Guo >Assignee: Leitao Guo >Priority: Major > Labels: BB2015-05-TBR > Attachments: YARN-3159.patch > > > Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match > docker images with a path like "sequenceiq/hadoop-docker:2.6.0", which > contains only one "/". 
> {code:java} > public static final String DOCKER_IMAGE_PATTERN = > "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$"; > {code} > In our cluster, image names have multiple layers, such as > "docker-registry:8080/cloud/hadoop-docker:2.6.0", which works with > "docker pull IMAGE_NAME" but cannot pass the image-name check in > saneDockerImage(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
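A minimal, self-contained sketch of the mismatch described above. The CURRENT constant is the pattern quoted from DockerContainerExecutor; the MULTILAYER pattern is only a hypothetical illustration of how the regex could be relaxed to accept extra path segments, not the pattern from the attached YARN-3159.patch:

```java
import java.util.regex.Pattern;

public class DockerImagePatternCheck {
    // Pattern quoted in the issue: at most one "/" (an optional registry prefix).
    static final String CURRENT =
        "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$";

    // Hypothetical extension: after the optional "registry[:port]/" prefix,
    // allow any number of extra "segment/" layers before the final "image:tag".
    static final String MULTILAYER =
        "^(([\\w\\.-]+)(:\\d+)?\\/)?([\\w\\.-]+\\/)*[\\w\\.:-]+$";

    // Mirrors the anchored image-name check performed in saneDockerImage().
    static boolean sane(String pattern, String image) {
        return Pattern.matches(pattern, image);
    }

    public static void main(String[] args) {
        String single = "sequenceiq/hadoop-docker:2.6.0";
        String multi  = "docker-registry:8080/cloud/hadoop-docker:2.6.0";

        System.out.println(sane(CURRENT, single));     // true
        System.out.println(sane(CURRENT, multi));      // false: second "/" is rejected
        System.out.println(sane(MULTILAYER, single));  // true
        System.out.println(sane(MULTILAYER, multi));   // true
    }
}
```

The second check shows the reported failure: "cloud/hadoop-docker:2.6.0" cannot match the trailing `[\w.:-]+` because that character class excludes "/", even though `docker pull` accepts the name.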
[jira] [Commented] (YARN-4783) Log aggregation failure for application when Nodemanager is restarted
[ https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174123#comment-17174123 ] Adam Antal commented on YARN-4783: -- Thanks for the patch [~gandras]. I am not entirely convinced that this approach resolves the original problem. Since the RM cancels the token, renewing that token would fail. Can you test this patch on a cluster using the steps above? > Log aggregation failure for application when Nodemanager is restarted > -- > > Key: YARN-4783 > URL: https://issues.apache.org/jira/browse/YARN-4783 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Surendra Singh Lilhore >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-4783.001.patch, YARN-4783.002.patch, > YARN-4783.003.patch > > > Scenario : > = > 1. Start NM with user dsperf:hadoop > 2. Configure linux-execute user as dsperf > 3. Submit application with yarn user > 4. Once a few containers are allocated to NM 1 > 5. Nodemanager 1 is stopped (wait for expiry) > 6. Start node manager after application is completed > 7. Check that log aggregation happens for the container logs in the NM local > directory > Expected Output : > === > Log aggregation should be successful > Actual Output : > === > Log aggregation not successful -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org