[jira] [Commented] (YARN-10384) Add FORBID_SUBMIT_APPLICATIONS acl type to administer queue

2020-08-10 Thread Juanjuan Tian (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175181#comment-17175181
 ] 

Juanjuan Tian  commented on YARN-10384:
---

[~epayne] Thanks for the comments. If an abusive user only abuses queue 
resources, we can use User Weights to limit that user. But the user may abuse 
other resources, such as local disk: in our system, for example, we found some 
users using large amounts of local disk and causing many NMs to become 
unhealthy. In such cases we should forbid the user, instead of just limiting 
them.

> Add FORBID_SUBMIT_APPLICATIONS acl type to administer queue
> ---
>
> Key: YARN-10384
> URL: https://issues.apache.org/jira/browse/YARN-10384
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: Juanjuan Tian 
>Assignee: Juanjuan Tian 
>Priority: Major
> Attachments: YARN-10384-001.patch
>
>
> Currently CapacityScheduler supports the acl_submit_applications and 
> acl_administer_queue ACLs to administer a queue, but it may be necessary to 
> forbid some members of the acl_submit_applications group from submitting 
> applications to a specific queue, since some users may abuse the queue and 
> submit many applications. Meanwhile, creating another group just to exclude 
> these users costs effort and time. For this scenario, we can add another acl 
> type, FORBID_SUBMIT_APPLICATIONS, add the users who abuse the queue to it, and 
> forbid them from submitting applications.
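> 
> For reference, a rough sketch of the two existing ACLs mentioned above (the 
> queue name and the user/group lists are made-up examples, and in practice 
> these entries live in capacity-scheduler.xml rather than in code). The 
> proposed FORBID_SUBMIT_APPLICATIONS type would be a deny-list counterpart to 
> acl_submit_applications:
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> 
> public class QueueAclExample {
>   public static void main(String[] args) {
>     Configuration conf = new Configuration();
>     // Users/groups allowed to submit to root.default ("users groups" format).
>     conf.set("yarn.scheduler.capacity.root.default.acl_submit_applications",
>         "alice,bob devteam");
>     // Users/groups allowed to administer the queue.
>     conf.set("yarn.scheduler.capacity.root.default.acl_administer_queue",
>         "opsadmin opsgroup");
>     System.out.println(conf.get(
>         "yarn.scheduler.capacity.root.default.acl_submit_applications"));
>   }
> }
> {code}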



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-08-10 Thread Yuanbo Liu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175148#comment-17175148
 ] 

Yuanbo Liu commented on YARN-10393:
---

Thanks for opening this issue. We ran into a similar situation on 
hadoop-2.7.0: the mapper lost its heartbeat and never finished. Currently we 
just use "mapred fail-task" to put those mappers into the failed state and then 
re-execute them. Looking forward to your patch!

 

> MR job live lock caused by completed state container leak in heartbeat 
> between node manager and RM
> --
>
> Key: YARN-10393
> URL: https://issues.apache.org/jira/browse/YARN-10393
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Reporter: zhenzhao wang
>Assignee: zhenzhao wang
>Priority: Major
>
> This is a bug we saw multiple times on Hadoop 2.6.2. The following analysis 
> is based on the core dump, logs, and code from 2017 with Hadoop 2.6.2. We 
> haven't seen it after 2.9 in our environment, but that is because of the RPC 
> retry policy change and other changes; there is still a possibility even with 
> the current code, unless I missed something.
> *High-level description:*
>  We have seen a starving-mapper issue several times. The MR job gets stuck in 
> a livelock and cannot make any progress. The queue is full, so the pending 
> mapper cannot get any resources to continue, and the application master fails 
> to preempt the reducer, which leaves the job stuck. The reason the application 
> master did not preempt the reducer was a leaked container among its assigned 
> mappers: the node manager failed to report the completed container to the 
> resource manager.
> *Detailed steps:*
>  
>  # Container_1501226097332_249991_01_000199 was assigned to 
> attempt_1501226097332_249991_m_95_0 on 2017-08-08 16:00:00,417.
> {code:java}
> appmaster.log:6464:2017-08-08 16:00:00,417 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned 
> container container_1501226097332_249991_01_000199 to 
> attempt_1501226097332_249991_m_95_0
> {code}
>  # The container finished on 2017-08-08 16:02:53,313.
> {code:java}
> yarn-mapred-nodemanager-.log.1:2017-08-08 16:02:53,313 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1501226097332_249991_01_000199 transitioned from RUNNING 
> to EXITED_WITH_SUCCESS
> yarn-mapred-nodemanager-.log.1:2017-08-08 16:02:53,313 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_1501226097332_249991_01_000199
> {code}
>  # The NodeStatusUpdater got an exception in the heartbeat on 2017-08-08 
> 16:07:04,238. The heartbeat request was actually handled by the resource 
> manager, but the node manager failed to receive the response. Let's assume 
> heartBeatResponseId=$hid in the node manager. With our current configuration, 
> the next heartbeat will come 10s later.
> {code:java}
> 2017-08-08 16:07:04,238 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught 
> exception in status-updater
> java.io.IOException: Failed on local exception: java.io.IOException: 
> Connection reset by peer; Host Details : local host is: ; destination host 
> is: XXX
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
> at org.apache.hadoop.ipc.Client.call(Client.java:1472)
> at org.apache.hadoop.ipc.Client.call(Client.java:1399)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy33.nodeHeartbeat(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
> at sun.reflect.GeneratedMethodAccessor61.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy34.nodeHeartbeat(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:597)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at 

[jira] [Commented] (YARN-8459) Improve Capacity Scheduler logs to debug invalid states

2020-08-10 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175064#comment-17175064
 ] 

Jim Brennan commented on YARN-8459:
---

Thanks [~epayne]!

> Improve Capacity Scheduler logs to debug invalid states
> ---
>
> Key: YARN-8459
> URL: https://issues.apache.org/jira/browse/YARN-8459
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.1.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 2.10.1
>
> Attachments: YARN-8459-branch-2.10.001.patch, YARN-8459.001.patch, 
> YARN-8459.002.patch, YARN-8459.003.patch, YARN-8459.004.patch
>
>
> Improve logs in CS to better debug invalid states






[jira] [Commented] (YARN-8459) Improve Capacity Scheduler logs to debug invalid states

2020-08-10 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175047#comment-17175047
 ] 

Eric Payne commented on YARN-8459:
--

Thanks [~Jim_Brennan] for the updated patch. Yes, I think we should pull this 
back to 2.10. I will do so this afternoon.

> Improve Capacity Scheduler logs to debug invalid states
> ---
>
> Key: YARN-8459
> URL: https://issues.apache.org/jira/browse/YARN-8459
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.1.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8459-branch-2.10.001.patch, YARN-8459.001.patch, 
> YARN-8459.002.patch, YARN-8459.003.patch, YARN-8459.004.patch
>
>
> Improve logs in CS to better debug invalid states






[jira] [Commented] (YARN-10251) Show extended resources on legacy RM UI.

2020-08-10 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175022#comment-17175022
 ] 

Eric Payne commented on YARN-10251:
---

Thank you very much, [~jhung] and [~Jim_Brennan]!

> Show extended resources on legacy RM UI.
> 
>
> Key: YARN-10251
> URL: https://issues.apache.org/jira/browse/YARN-10251
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Fix For: 3.2.2, 2.10.1, 3.4.0, 3.3.1, 3.1.5
>
> Attachments: Legacy RM UI With Not All Resources Shown.png, Updated 
> NodesPage UI With GPU columns.png, Updated RM UI With All Resources 
> Shown.png.png, YARN-10251.003.patch, YARN-10251.004.patch, 
> YARN-10251.005.patch, YARN-10251.006.patch, YARN-10251.007.patch, 
> YARN-10251.branch-2.10.001.patch, YARN-10251.branch-2.10.002.patch, 
> YARN-10251.branch-2.10.003.patch, YARN-10251.branch-2.10.005.patch, 
> YARN-10251.branch-2.10.006.patch, YARN-10251.branch-2.10.007.patch, 
> YARN-10251.branch-3.2.004.patch, YARN-10251.branch-3.2.005.patch, 
> YARN-10251.branch-3.2.006.patch, YARN-10251.branch-3.2.007.patch
>
>
> It would be great to update the legacy RM UI to include GPU resources in the 
> overview and in the per-app sections.






[jira] [Comment Edited] (YARN-10389) Option to override RMWebServices with custom WebService class

2020-08-10 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174941#comment-17174941
 ] 

Prabhu Joseph edited comment on YARN-10389 at 8/10/20, 5:32 PM:


[~tanu.ajmera] Thanks for the patch. The patch looks good. One minor issue, 
which is not related to this patch:

1. The null check below is not required:

{code}
 bindExternalClasses();
 if (rm != null)
{code}

rm is already accessed inside bindExternalClasses(), before the null check is 
reached, so having the check is of no use:

{code}
  private void bindExternalClasses() {
YarnConfiguration yarnConf = new YarnConfiguration(rm.getConfig());
{code}




was (Author: prabhu joseph):
[~tanu.ajmera] Thanks for the patch. The patch looks good. One minor issue, 
which is not related to this patch:

1. The null check below is not required:

{code}
 bindExternalClasses();
 if (rm != null)
{code}

rm is already accessed inside bindExternalClasses(), before the null check is 
reached, so having the check is of no use:

{code}
  private void bindExternalClasses() {
YarnConfiguration yarnConf = new YarnConfiguration(rm.getConfig());
{code}



> Option to override RMWebServices with custom WebService class
> -
>
> Key: YARN-10389
> URL: https://issues.apache.org/jira/browse/YARN-10389
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: Prabhu Joseph
>Assignee: Tanu Ajmera
>Priority: Major
> Attachments: YARN-10389-001.patch, YARN-10389-002.patch, 
> YARN-10389-003.patch, YARN-10389-004.patch, YARN-10389-005.patch, 
> YARN-10389-006.patch, YARN-10389-007.patch
>
>
> YARN-8047 provides support for adding custom WebServices as part of RMWebApp. 
> Since each WebService has to have a separate WebService path, the 
> /ws/v1/cluster root path cannot be used globally.
> An alternative is to provide an option to override RMWebServices with a custom 
> WebServices implementation that extends RMWebServices; this way the 
> /ws/v1/cluster path can be used globally.






[jira] [Commented] (YARN-10389) Option to override RMWebServices with custom WebService class

2020-08-10 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174941#comment-17174941
 ] 

Prabhu Joseph commented on YARN-10389:
--

[~tanu.ajmera] Thanks for the patch. The patch looks good. One minor issue, 
which is not related to this patch:

1. The null check below is not required:

{code}
 bindExternalClasses();
 if (rm != null)
{code}

rm is already accessed inside bindExternalClasses(), before the null check is 
reached, so having the check is of no use:

{code}
  private void bindExternalClasses() {
YarnConfiguration yarnConf = new YarnConfiguration(rm.getConfig());
{code}
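
A tiny standalone illustration of why that guard is moot (the class and field 
below are stand-ins, not the real RMWebApp code): the NPE is thrown inside 
bindExternalClasses() before the check ever runs.

{code:java}
// Stand-in demo, not the actual RMWebApp/ResourceManager classes.
public class NullCheckDemo {
  private static Object rm = null;   // pretend the injected RM reference is null

  private static void bindExternalClasses() {
    // rm is dereferenced here first, so a null rm already fails at this point...
    System.out.println(rm.toString());
  }

  public static void main(String[] args) {
    bindExternalClasses();   // throws NullPointerException when rm is null
    if (rm != null) {        // ...which means this later guard can never help
      System.out.println("register RM-specific bindings");
    }
  }
}
{code}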



> Option to override RMWebServices with custom WebService class
> -
>
> Key: YARN-10389
> URL: https://issues.apache.org/jira/browse/YARN-10389
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: Prabhu Joseph
>Assignee: Tanu Ajmera
>Priority: Major
> Attachments: YARN-10389-001.patch, YARN-10389-002.patch, 
> YARN-10389-003.patch, YARN-10389-004.patch, YARN-10389-005.patch, 
> YARN-10389-006.patch, YARN-10389-007.patch
>
>
> YARN-8047 provides support for adding custom WebServices as part of RMWebApp. 
> Since each WebService has to have a separate WebService path, the 
> /ws/v1/cluster root path cannot be used globally.
> An alternative is to provide an option to override RMWebServices with a custom 
> WebServices implementation that extends RMWebServices; this way the 
> /ws/v1/cluster path can be used globally.






[jira] [Commented] (YARN-10384) Add FORBID_SUBMIT_APPLICATIONS acl type to administer queue

2020-08-10 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174430#comment-17174430
 ] 

Eric Payne commented on YARN-10384:
---

Also, FYI, we don't enter anything in the {{Fix Version}} field until the JIRA 
is resolved.

> Add FORBID_SUBMIT_APPLICATIONS acl type to administer queue
> ---
>
> Key: YARN-10384
> URL: https://issues.apache.org/jira/browse/YARN-10384
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: Juanjuan Tian 
>Assignee: Juanjuan Tian 
>Priority: Major
> Attachments: YARN-10384-001.patch
>
>
> Currently CapacityScheduler supports the acl_submit_applications and 
> acl_administer_queue ACLs to administer a queue, but it may be necessary to 
> forbid some members of the acl_submit_applications group from submitting 
> applications to a specific queue, since some users may abuse the queue and 
> submit many applications. Meanwhile, creating another group just to exclude 
> these users costs effort and time. For this scenario, we can add another acl 
> type, FORBID_SUBMIT_APPLICATIONS, add the users who abuse the queue to it, and 
> forbid them from submitting applications.






[jira] [Updated] (YARN-10384) Add FORBID_SUBMIT_APPLICATIONS acl type to administer queue

2020-08-10 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-10384:
--
Fix Version/s: (was: 3.2.0)

[~jutia], Thank you for suggesting this improvement.

While this may be a reasonable improvement, you may want to consider using the 
User Weights feature to limit abusive users.

The configuration property that controls a user's weight is 
{{yarn.scheduler.capacity.<queue-path>.user-settings.<user>.weight}}. By 
default, all users are given an equal opportunity to receive queue resources; 
that is, their user weight is 1.0. However, an abusive user can be limited by 
setting their user weight to 0.1 (or even smaller). That way, they can only 
take up a small fraction of what the other users can utilize, and they are 
always considered last when assigning resources.
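
As a rough illustration (the queue path {{root.default}} and the user 
{{baduser}} are made-up names, and in a real cluster the property would be set 
in capacity-scheduler.xml rather than programmatically), the weight could be 
lowered like this:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class UserWeightExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Default user weight is 1.0; 0.1 puts "baduser" last in line for resources
    // and caps them at a small fraction of what other users can use.
    conf.setFloat(
        "yarn.scheduler.capacity.root.default.user-settings.baduser.weight",
        0.1f);
    System.out.println(conf.get(
        "yarn.scheduler.capacity.root.default.user-settings.baduser.weight"));
  }
}
{code}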

> Add FORBID_SUBMIT_APPLICATIONS acl type to administer queue
> ---
>
> Key: YARN-10384
> URL: https://issues.apache.org/jira/browse/YARN-10384
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: Juanjuan Tian 
>Assignee: Juanjuan Tian 
>Priority: Major
> Attachments: YARN-10384-001.patch
>
>
> Currently CapacityScheduler supports the acl_submit_applications and 
> acl_administer_queue ACLs to administer a queue, but it may be necessary to 
> forbid some members of the acl_submit_applications group from submitting 
> applications to a specific queue, since some users may abuse the queue and 
> submit many applications. Meanwhile, creating another group just to exclude 
> these users costs effort and time. For this scenario, we can add another acl 
> type, FORBID_SUBMIT_APPLICATIONS, add the users who abuse the queue to it, and 
> forbid them from submitting applications.






[jira] [Commented] (YARN-10389) Option to override RMWebServices with custom WebService class

2020-08-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174358#comment-17174358
 ] 

Hadoop QA commented on YARN-10389:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
17s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
48s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 24s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
2s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
8s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
49s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
36s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
44s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
44s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 14s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
5s{color} | {color:green} the patch 

[jira] [Commented] (YARN-10380) Import logic of multi-node allocation in CapacityScheduler

2020-08-10 Thread Yuanbo Liu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174287#comment-17174287
 ] 

Yuanbo Liu commented on YARN-10380:
---

[~wangda]

Thanks for opening this issue.

Not sure whether you're working on it; I'd be glad to help with it.

> Import logic of multi-node allocation in CapacityScheduler
> --
>
> Key: YARN-10380
> URL: https://issues.apache.org/jira/browse/YARN-10380
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Priority: Critical
>
> *1) Entry point:* 
> When we do multi-node allocation, we use the same logic as async scheduling:
> {code:java}
> // Allocate containers of node [start, end)
>  for (FiCaSchedulerNode node : nodes) {
>   if (current++ >= start) {
>      if (shouldSkipNodeSchedule(node, cs, printSkipedNodeLogging)) {
>         continue;
>      }
>      cs.allocateContainersToNode(node.getNodeID(), false);
>   }
>  } {code}
> Is this the most effective way to do multi-node scheduling? Should we 
> allocate based on partitions? With the above logic, if we have thousands of 
> nodes in one partition, we will repeatedly access all nodes of the partition 
> thousands of times.
> I would suggest making the entry points for node-heartbeat, async-scheduling 
> (single node), and async-scheduling (multi-node) different.
> Node-heartbeat and async-scheduling (single node) can still be similar and 
> share most of the code.
> async-scheduling (multi-node) should iterate over partitions first, using 
> pseudocode like:
> {code:java}
> for (partition : all partitions) {
>   allocateContainersOnMultiNodes(getCandidate(partition))
> } {code}
>  






[jira] [Commented] (YARN-10382) Non-secure yarn access secure hdfs

2020-08-10 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174253#comment-17174253
 ] 

Steve Loughran commented on YARN-10382:
---

The problem there is that the code wants to know who the YARN principal of the 
resource manager is, so that it can send messages to HDFS saying "renew these 
delegation tokens". Your insecure YARN RM doesn't have a Kerberos principal, so 
secure HDFS will not issue delegation tokens to it. You could somehow cheat the 
configs to name some Kerberos principal (yourself?) as the RM principal; no 
idea what happens then.

I would personally like YARN to collect tokens from services even when Kerberos 
is disabled, though not for your use case: I want to be able to collect tokens 
for the object stores. But I've avoided going near the code because (a) I'm 
scared and (b) applications like Spark do their own checks against 
UserGroupInformation.isSecurityEnabled(), which still wouldn't work.

> Non-secure yarn access secure hdfs
> --
>
> Key: YARN-10382
> URL: https://issues.apache.org/jira/browse/YARN-10382
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn
>Reporter: bianqi
>Priority: Minor
>
> In our production environment, YARN cannot enable Kerberos due to environment 
> problems, but our HDFS has Kerberos enabled, and now we need non-secure YARN 
> to access secure HDFS.
> Normally both YARN and HDFS are secure once security is turned on.
> I hope that after enabling HDFS security, it is possible to use non-secure 
> YARN to access secure HDFS, or secure YARN to access non-secure HDFS.






[jira] [Commented] (YARN-10389) Option to override RMWebServices with custom WebService class

2020-08-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174227#comment-17174227
 ] 

Hadoop QA commented on YARN-10389:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
1s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
42s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
48s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
20m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
11s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
11s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 
10s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m  
3s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
10s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  9m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 20s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
10s{color} | {color:green} the patch 

[jira] [Updated] (YARN-3159) DOCKER_IMAGE_PATTERN should support multilayered path of docker images

2020-08-10 Thread Leitao Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-3159:
-
Description: 
Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match 
docker image names with a path like "sequenceiq/hadoop-docker:2.6.0", which 
contains only one "/".
{code:java}
public static final String DOCKER_IMAGE_PATTERN = 
"^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$";
{code}
In our cluster, image names have multiple path layers, such as 
"docker-registry:8080/cloud/hadoop-docker:2.6.0", which works with "docker pull 
IMAGE_NAME" but cannot pass the image-name check in saneDockerImage().
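
A quick standalone check of the pattern against the two image names above shows 
the mismatch (plain java.util.regex, no YARN dependencies needed):

{code:java}
import java.util.regex.Pattern;

public class DockerImagePatternCheck {
  // Same pattern as the DockerContainerExecutor snippet above.
  static final String DOCKER_IMAGE_PATTERN =
      "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$";

  public static void main(String[] args) {
    Pattern p = Pattern.compile(DOCKER_IMAGE_PATTERN);
    // One "/" in the path: accepted.
    System.out.println(
        p.matcher("sequenceiq/hadoop-docker:2.6.0").matches());                 // true
    // Registry host:port plus a namespace (two "/"): rejected, even though
    // "docker pull" accepts it.
    System.out.println(
        p.matcher("docker-registry:8080/cloud/hadoop-docker:2.6.0").matches()); // false
  }
}
{code}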

  was:
Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match 
docker image names with a path like "sequenceiq/hadoop-docker:2.6.0", which 
contains only one "/".

{code}
public static final String DOCKER_IMAGE_PATTERN = 
"^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$";
{code}

In our cluster, image names have multiple path layers, such as 
"docker-registry.qiyi.virtual:8080/cloud/hadoop-docker:2.6.0", which works with 
"docker pull IMAGE_NAME" but cannot pass the image-name check in 
saneDockerImage().


> DOCKER_IMAGE_PATTERN should support multilayered path of docker images
> --
>
> Key: YARN-3159
> URL: https://issues.apache.org/jira/browse/YARN-3159
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Leitao Guo
>Assignee: Leitao Guo
>Priority: Major
>  Labels: BB2015-05-TBR
> Attachments: YARN-3159.patch
>
>
> Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match 
> docker image names with a path like "sequenceiq/hadoop-docker:2.6.0", which 
> contains only one "/".
> {code:java}
> public static final String DOCKER_IMAGE_PATTERN = 
> "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$";
> {code}
> In our cluster, image names have multiple path layers, such as 
> "docker-registry:8080/cloud/hadoop-docker:2.6.0", which works with "docker 
> pull IMAGE_NAME" but cannot pass the image-name check in saneDockerImage().






[jira] [Commented] (YARN-4783) Log aggregation failure for application when Nodemanager is restarted

2020-08-10 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174123#comment-17174123
 ] 

Adam Antal commented on YARN-4783:
--

Thanks for the patch [~gandras].

I am not entirely convinced that this approach resolves the original problem. 
Since the RM cancels the token, renewing that token would fail. Can you test 
this patch on a cluster using the steps above?

> Log aggregation failure for application when Nodemanager is restarted 
> --
>
> Key: YARN-4783
> URL: https://issues.apache.org/jira/browse/YARN-4783
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-4783.001.patch, YARN-4783.002.patch, 
> YARN-4783.003.patch
>
>
> Scenario :
> =
> 1. Start NM with user dsperf:hadoop
> 2. Configure the linux-execute user as dsperf
> 3. Submit an application with the yarn user
> 4. Once a few containers are allocated to NM 1
> 5. Nodemanager 1 is stopped (wait for expiry)
> 6. Start the node manager after the application is completed
> 7. Check that log aggregation happens for the container logs in the NM local 
> directory
> Expected Output :
> ===
> Log aggregation should be successful
> Actual Output :
> ===
> Log aggregation is not successful






[jira] [Commented] (YARN-10389) Option to override RMWebServices with custom WebService class

2020-08-10 Thread Tanu Ajmera (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174106#comment-17174106
 ] 

Tanu Ajmera commented on YARN-10389:


Thanks [~sunilg].
1. Changes have been made.
2. Right now only ResourceManager refers to RMWebApp and passes the conf 
object, so the null check is not required.

> Option to override RMWebServices with custom WebService class
> -
>
> Key: YARN-10389
> URL: https://issues.apache.org/jira/browse/YARN-10389
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: Prabhu Joseph
>Assignee: Tanu Ajmera
>Priority: Major
> Attachments: YARN-10389-001.patch, YARN-10389-002.patch, 
> YARN-10389-003.patch, YARN-10389-004.patch, YARN-10389-005.patch, 
> YARN-10389-006.patch
>
>
> YARN-8047 provides support for adding custom WebServices as part of RMWebApp. 
> Since each WebService has to have a separate WebService path, the 
> /ws/v1/cluster root path cannot be used globally.
> An alternative is to provide an option to override RMWebServices with a custom 
> WebServices implementation that extends RMWebServices; this way the 
> /ws/v1/cluster path can be used globally.


