[jira] [Commented] (YARN-3610) FairScheduler: Add steady-fair-shares to the REST API documentation

2018-05-07 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465472#comment-16465472
 ] 

Wilfred Spiegelenburg commented on YARN-3610:
-

This looks OK, no more comments, +1

> FairScheduler: Add steady-fair-shares to the REST API documentation
> ---
>
> Key: YARN-3610
> URL: https://issues.apache.org/jira/browse/YARN-3610
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation, fairscheduler
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Ray Chiang
>Priority: Major
> Attachments: YARN-3610.001.patch, YARN-3610.002.patch, 
> YARN-3610.003.patch
>
>
> YARN-1050 adds documentation for the FairScheduler REST API, but it is missing 
> the steady-fair-share.






[jira] [Updated] (YARN-8025) UsersManangers#getComputedResourceLimitForActiveUsers throws NPE due to preComputedActiveUserLimit is empty

2018-05-07 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8025:
--
Fix Version/s: 3.0.3

> UsersManangers#getComputedResourceLimitForActiveUsers throws NPE due to 
> preComputedActiveUserLimit is empty
> ---
>
> Key: YARN-8025
> URL: https://issues.apache.org/jira/browse/YARN-8025
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: Jiandan Yang 
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.3
>
> Attachments: YARN-8025.001.patch, YARN-8025.002.patch
>
>
> UsersManangers#getComputedResourceLimitForActiveUsers throws an NPE when I run 
> SLS.
>  *preComputedActiveUserLimit* is never populated anywhere in the code.
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager.getComputedResourceLimitForActiveUsers(UsersManager.java:511)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getResourceLimitForActiveUsers(LeafQueue.java:1576)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.computeUserLimitAndSetHeadroom(LeafQueue.java:1517)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:1190)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:824)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:630)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1834)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1802)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersOnMultiNodes(CapacityScheduler.java:1925)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1946)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.scheduleBasedOnNodeLabels(CapacityScheduler.java:732)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:774)
> {code}
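For illustration, a minimal sketch of the kind of guard that avoids this NPE, 
assuming a nested per-partition map like *preComputedActiveUserLimit* (types and 
names below are simplified and hypothetical; this is not the actual patch):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PreComputedLimitCache {
  // partition -> scheduling mode -> pre-computed limit (illustrative types only)
  private final Map<String, Map<String, Long>> preComputedLimit =
      new ConcurrentHashMap<>();

  public long getLimit(String partition, String schedulingMode) {
    // Without computeIfAbsent, get(partition) can return null for a partition
    // that was never populated, and the chained get(schedulingMode) throws the
    // NullPointerException seen in the stack trace above.
    return preComputedLimit
        .computeIfAbsent(partition, p -> new ConcurrentHashMap<>())
        .getOrDefault(schedulingMode, 0L);
  }
}
{code}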






[jira] [Commented] (YARN-8025) UsersManangers#getComputedResourceLimitForActiveUsers throws NPE due to preComputedActiveUserLimit is empty

2018-05-07 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465467#comment-16465467
 ] 

Weiwei Yang commented on YARN-8025:
---

I've committed this to trunk and cherry-picked it to branch-3.0 and branch-3.1. 
Thanks [~Tao Yang] for the contribution!

> UsersManangers#getComputedResourceLimitForActiveUsers throws NPE due to 
> preComputedActiveUserLimit is empty
> ---
>
> Key: YARN-8025
> URL: https://issues.apache.org/jira/browse/YARN-8025
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: Jiandan Yang 
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.3
>
> Attachments: YARN-8025.001.patch, YARN-8025.002.patch
>
>
> UsersManangers#getComputedResourceLimitForActiveUsers throws an NPE when I run 
> SLS.
>  *preComputedActiveUserLimit* is never populated anywhere in the code.
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager.getComputedResourceLimitForActiveUsers(UsersManager.java:511)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getResourceLimitForActiveUsers(LeafQueue.java:1576)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.computeUserLimitAndSetHeadroom(LeafQueue.java:1517)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:1190)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:824)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:630)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1834)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1802)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersOnMultiNodes(CapacityScheduler.java:1925)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1946)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.scheduleBasedOnNodeLabels(CapacityScheduler.java:732)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:774)
> {code}






[jira] [Updated] (YARN-7003) DRAINING state of queues can't be recovered after RM restart

2018-05-07 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-7003:
---
Attachment: YARN-7003.002.patch

> DRAINING state of queues can't be recovered after RM restart
> 
>
> Key: YARN-7003
> URL: https://issues.apache.org/jira/browse/YARN-7003
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.9.0, 3.0.0-alpha4
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-7003.001.patch, YARN-7003.002.patch
>
>
> DRAINING is a temporary state kept in RM memory: when a queue's state is set to 
> STOPPED but there are still some pending or active apps in it, the queue state 
> will be changed to DRAINING instead of STOPPED after refreshing queues. 
> We've encountered the problem that the state of such a queue will always be 
> STOPPED after the RM restarts, so it can be removed at any time and leave 
> some apps in a non-existent queue.
> To fix this problem, we could recover the DRAINING state in the recovery process 
> of pending/active apps. I will upload a patch with a test case later for review.
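For illustration, a self-contained sketch of the recovery idea described above 
(simplified types; this is not the actual CapacityScheduler code):

{code:java}
public class QueueStateRecoveryExample {
  enum QueueState { RUNNING, DRAINING, STOPPED }

  static class Queue {
    QueueState state = QueueState.STOPPED; // state seen right after RM restart
    int pendingApps;
    int activeApps;
  }

  // Called while recovering each pending/active application into its queue.
  static void recoverAppIntoQueue(Queue queue) {
    queue.pendingApps++;
    // A STOPPED queue that still owns applications must be treated as DRAINING,
    // so it cannot be removed while apps are still attached to it.
    if (queue.state == QueueState.STOPPED
        && queue.pendingApps + queue.activeApps > 0) {
      queue.state = QueueState.DRAINING;
    }
  }

  public static void main(String[] args) {
    Queue q = new Queue();
    recoverAppIntoQueue(q);
    System.out.println(q.state); // DRAINING
  }
}
{code}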






[jira] [Updated] (YARN-6578) Return container resource utilization from NM ContainerStatus call

2018-05-07 Thread Yang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated YARN-6578:

Attachment: YARN-6578.003.patch

> Return container resource utilization from NM ContainerStatus call
> --
>
> Key: YARN-6578
> URL: https://issues.apache.org/jira/browse/YARN-6578
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Major
> Attachments: YARN-6578.001.patch, YARN-6578.002.patch, 
> YARN-6578.003.patch
>
>
> When the ApplicationMaster wants to change (increase/decrease) the resources of 
> an allocated container, resource utilization is an important reference indicator 
> for decision making. So, when the AM calls NMClient.getContainerStatus, resource 
> utilization needs to be returned.
> Container resource utilization also needs to be reported to the RM to enable 
> better scheduling.
> So, put resource utilization in ContainerStatus.
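For illustration, a rough sketch of how an AM might consume this through the 
existing NMClient#getContainerStatus call; the utilization accessor itself is 
what this JIRA proposes, so it appears only as a hypothetical commented-out line:

{code:java}
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class ContainerUtilizationProbe {
  static void probe(NMClient nmClient, ContainerId id, NodeId node)
      throws YarnException, IOException {
    ContainerStatus status = nmClient.getContainerStatus(id, node);
    System.out.println("state=" + status.getState());
    // The accessor proposed by this JIRA would be read here, e.g. (hypothetical):
    // ResourceUtilization util = status.getContainerUtilization();
    // ...and compared against the allocated resources to decide on a resize.
  }
}
{code}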






[jira] [Commented] (YARN-7003) DRAINING state of queues can't be recovered after RM restart

2018-05-07 Thread Tao Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465728#comment-16465728
 ] 

Tao Yang commented on YARN-7003:


Updated the v2 patch for trunk. [~cheersyang], could you help review this patch?

> DRAINING state of queues can't be recovered after RM restart
> 
>
> Key: YARN-7003
> URL: https://issues.apache.org/jira/browse/YARN-7003
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.9.0, 3.0.0-alpha4
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-7003.001.patch, YARN-7003.002.patch
>
>
> DRAINING is a temporary state kept in RM memory: when a queue's state is set to 
> STOPPED but there are still some pending or active apps in it, the queue state 
> will be changed to DRAINING instead of STOPPED after refreshing queues. 
> We've encountered the problem that the state of such a queue will always be 
> STOPPED after the RM restarts, so it can be removed at any time and leave 
> some apps in a non-existent queue.
> To fix this problem, we could recover the DRAINING state in the recovery process 
> of pending/active apps. I will upload a patch with a test case later for review.






[jira] [Commented] (YARN-6578) Return container resource utilization from NM ContainerStatus call

2018-05-07 Thread Yang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465620#comment-16465620
 ] 

Yang Wang commented on YARN-6578:
-

[~cheersyang], thanks for your comment.

I have fixed the findbugs issues.

The failed UT seems to be another issue, 
[YARN-8244|https://issues.apache.org/jira/browse/YARN-8244].

There is no need to fix the checkstyle issues. Just like other metric variables, 
vMemMBsStat and vMemMBQuantiles can be public in ContainerMetrics.java.

 

> Return container resource utilization from NM ContainerStatus call
> --
>
> Key: YARN-6578
> URL: https://issues.apache.org/jira/browse/YARN-6578
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Major
> Attachments: YARN-6578.001.patch, YARN-6578.002.patch
>
>
> When the ApplicationMaster wants to change (increase/decrease) the resources of 
> an allocated container, resource utilization is an important reference indicator 
> for decision making. So, when the AM calls NMClient.getContainerStatus, resource 
> utilization needs to be returned.
> Container resource utilization also needs to be reported to the RM to enable 
> better scheduling.
> So, put resource utilization in ContainerStatus.






[jira] [Commented] (YARN-6578) Return container resource utilization from NM ContainerStatus call

2018-05-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465757#comment-16465757
 ] 

genericqa commented on YARN-6578:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
2s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 50s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
28s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 21s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 2 new + 183 unchanged - 0 fixed = 185 total (was 183) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 15s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
8s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
44s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
11s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
12s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
20s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}116m 35s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-6578 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12922239/YARN-6578.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux dc5398ce1bf2 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 

[jira] [Updated] (YARN-6645) Bug fix in ContainerImpl when calling the symLink of LinuxContainerExecutor

2018-05-07 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated YARN-6645:

Fix Version/s: (was: 2.9.1)

> Bug fix in ContainerImpl when calling the symLink of LinuxContainerExecutor
> ---
>
> Key: YARN-6645
> URL: https://issues.apache.org/jira/browse/YARN-6645
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Bingxue Qiu
>Priority: Major
> Attachments: error when creating symlink.png
>
>
> When creating a symlink after the resource is localized in our clusters, an 
> IOException is thrown because the nmPrivateDir doesn't exist. We add a 
> patch to fix it.
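For illustration, a generic sketch of the fix idea using plain java.nio (this is 
not the actual ContainerImpl/LinuxContainerExecutor patch): make sure the parent 
directory exists before creating the symlink.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SymlinkWithParentDir {
  static void createSymlink(String linkPath, String targetPath) throws IOException {
    Path link = Paths.get(linkPath);
    Path target = Paths.get(targetPath);
    // Creating the parent directory first avoids the IOException that is thrown
    // when the nm-private directory does not exist yet.
    Files.createDirectories(link.getParent());
    Files.createSymbolicLink(link, target);
  }
}
{code}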






[jira] [Updated] (YARN-6661) Too much CLEANUP event hang ApplicationMasterLauncher thread pool

2018-05-07 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated YARN-6661:

Fix Version/s: (was: 2.9.1)

> Too much CLEANUP event hang ApplicationMasterLauncher thread pool
> -
>
> Key: YARN-6661
> URL: https://issues.apache.org/jira/browse/YARN-6661
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
> Environment: hadoop 2.7.2 
>Reporter: JackZhou
>Priority: Major
>
> Someone else has already come up with a similar problem and fixed it;
> see https://issues.apache.org/jira/browse/YARN-3809 for the details.
> But I think that fix does not solve the problem completely. Below is the 
> problem I encountered:
> There are about 1000 nodes in my Hadoop cluster, and I submitted about 1800 apps.
> I failed over my active RM, and the RM recovered all of those 1800 apps.
> When an application fails over, the RM waits for the AM container to register 
> itself. But there is a bug in my AM (introduced intentionally), so it never 
> registers itself.
> The RM therefore waits about 10 minutes for the AM expiration, and then sends a 
> CLEANUP event to the ApplicationMasterLauncher thread pool. Because there are 
> about 1800 apps, this hangs the ApplicationMasterLauncher thread pool for a 
> long time. I have already applied the patch 
> (https://issues.apache.org/jira/secure/attachment/12740804/YARN-3809.03.patch), 
> so a CLEANUP event hangs a thread for 10 * 20 = 200s. But I have 1800 apps, so 
> each of my threads hangs for 1800 / 50 * 200s = 7200s = 2 hours.
> Because the AM has not registered itself within 10 minutes, the RM retries and 
> creates a new application attempt. 
> The application attempt accepts a container from the RM and sends a LAUNCH 
> event to the ApplicationMasterLauncher thread pool.
> Because the 1800 CLEANUP events hang the 50-thread pool for about 2 hours, the 
> application attempt cannot start the AM container within 10 minutes. 
> So it expires as well and also sends a CLEANUP event to the 
> ApplicationMasterLauncher thread pool.
> As you can see, none of my applications can actually run. 
> Each of them has 5 application attempts as follows, and each of them keeps 
> retrying.
> appattempt_1495786030132_4000_05
> appattempt_1495786030132_4000_04
> appattempt_1495786030132_4000_03
> appattempt_1495786030132_4000_02  
> appattempt_1495786030132_4000_01
> So all of my apps hang for several hours, and none of them can actually run. 
> I think this is a bug! We could treat CLEANUP and LAUNCH as different events 
> and use a separate thread pool (or some other mechanism) to handle LAUNCH 
> events; see the sketch below.
> Sorry, my English is poor; I hope I have described the problem clearly.
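For illustration, a minimal sketch of the proposal above (hypothetical names; 
this is not the actual ApplicationMasterLauncher code): give LAUNCH and CLEANUP 
events separate thread pools so that a backlog of slow CLEANUP calls cannot 
starve AM launches.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SplitLauncherPools {
  enum EventType { LAUNCH, CLEANUP }

  private final ExecutorService launchPool = Executors.newFixedThreadPool(50);
  private final ExecutorService cleanupPool = Executors.newFixedThreadPool(50);

  void handle(EventType type, Runnable work) {
    // Route by event type instead of sharing a single pool for both kinds of
    // work, so CLEANUP backlogs no longer delay LAUNCH events.
    if (type == EventType.LAUNCH) {
      launchPool.submit(work);
    } else {
      cleanupPool.submit(work);
    }
  }
}
{code}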






[jira] [Commented] (YARN-7003) DRAINING state of queues can't be recovered after RM restart

2018-05-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465833#comment-16465833
 ] 

genericqa commented on YARN-7003:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
47s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 47s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 29s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 2 new + 123 unchanged - 0 fixed = 125 total (was 123) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 69m 
21s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}121m 56s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-7003 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12922252/YARN-7003.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0bed421a038d 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 67f239c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/20612/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20612/testReport/ |
| Max. process+thread count | 863 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Updated] (YARN-6606) The implementation of LocalizationStatus in ContainerStatusProto

2018-05-07 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated YARN-6606:

Fix Version/s: (was: 2.9.1)

> The implementation of LocalizationStatus in ContainerStatusProto
> 
>
> Key: YARN-6606
> URL: https://issues.apache.org/jira/browse/YARN-6606
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Bingxue Qiu
>Priority: Major
> Attachments: YARN-6606.1.patch, YARN-6606.2.patch
>
>
> We have a use case where the full implementation of localization status in 
> ContainerStatusProto 
> ([Continuous-resource-localization|https://issues.apache.org/jira/secure/attachment/12825041/Continuous-resource-localization.pdf])
> needs to be done, so we have implemented it.






[jira] [Created] (YARN-8252) Fix ServiceMaster main not found

2018-05-07 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created YARN-8252:
--

 Summary: Fix ServiceMaster main not found
 Key: YARN-8252
 URL: https://issues.apache.org/jira/browse/YARN-8252
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.1.0
Reporter: Zoltan Haindrich


I was looking into using YARN services; however, it seems that for some reason it 
is not possible to run the {{ServiceMaster}} class from the jar... I might be 
missing something fundamental... so I've put together a shell script to make it 
easy for anyone to check. I would be happy with any exception beyond "main not found".

[ServiceMaster.main 
method|https://github.com/apache/hadoop/blob/67f239c42f676237290d18ddbbc9aec369267692/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ServiceMaster.java#L305]
{code:java}
#!/bin/bash
set -e

wget -O core.jar  -nv 
http://central.maven.org/maven2/org/apache/hadoop/hadoop-yarn-services-core/3.1.0/hadoop-yarn-services-core-3.1.0.jar

unzip -qn core.jar
cat > org/apache/hadoop/yarn/service/ServiceMaster2.java << EOF
package org.apache.hadoop.yarn.service;
public class ServiceMaster2 {
  public static void main(String[] args) throws Exception {
System.out.println("asd!");
  }
}
EOF

javac org/apache/hadoop/yarn/service/ServiceMaster2.java

jar -cf a1.jar org

find org -name ServiceMaster*
# this will print "asd!"
java -cp a1.jar org.apache.hadoop.yarn.service.ServiceMaster2
#the following invocations result in:
# Error: Could not find or load main class 
org.apache.hadoop.yarn.service.ServiceMaster
#
set +e
java -cp a1.jar org.apache.hadoop.yarn.service.ServiceMaster
java -cp core.jar org.apache.hadoop.yarn.service.ServiceMaster
{code}






[jira] [Commented] (YARN-8252) Fix ServiceMaster main not found

2018-05-07 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466001#comment-16466001
 ] 

Zoltan Haindrich commented on YARN-8252:


The problem is that {{ServiceMaster}} extends {{CompositeService}}, which is 
in {{hadoop-common.jar}} - hence the class containing the main method could not 
be loaded... it's unfortunate that the JVM is unable to provide more information 
about the problem.

I think it would be better to move the main method into a separate class file, 
as sketched below.

And this brings me to the original problem, which is how I ran into this: it 
seems "hadoop-common" is not on the classpath when ServiceMaster is starting up 
(invoked via the API to bring up the LLAP service).
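For illustration, a sketch of that suggestion (the launcher class below is 
hypothetical and not part of the Hadoop source): keep main() in a class with no 
superclass outside this jar, so a missing hadoop-common surfaces as a clear 
NoClassDefFoundError instead of "Could not find or load main class".

{code:java}
package org.apache.hadoop.yarn.service;

public class ServiceMasterLauncher {
  public static void main(String[] args) throws Exception {
    // Loading ServiceMaster reflectively defers resolution of its superclass
    // (CompositeService, which lives in hadoop-common) until this point, so the
    // JVM can report which class is actually missing.
    Class<?> master =
        Class.forName("org.apache.hadoop.yarn.service.ServiceMaster");
    master.getMethod("main", String[].class).invoke(null, (Object) args);
  }
}
{code}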

> Fix ServiceMaster main not found
> 
>
> Key: YARN-8252
> URL: https://issues.apache.org/jira/browse/YARN-8252
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Zoltan Haindrich
>Priority: Major
>
> I was looking into using YARN services; however, it seems that for some reason 
> it is not possible to run the {{ServiceMaster}} class from the jar... I might be 
> missing something fundamental... so I've put together a shell script to make it 
> easy for anyone to check. I would be happy with any exception beyond "main not 
> found".
> [ServiceMaster.main 
> method|https://github.com/apache/hadoop/blob/67f239c42f676237290d18ddbbc9aec369267692/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ServiceMaster.java#L305]
> {code:java}
> #!/bin/bash
> set -e
> wget -O core.jar  -nv 
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-yarn-services-core/3.1.0/hadoop-yarn-services-core-3.1.0.jar
> unzip -qn core.jar
> cat > org/apache/hadoop/yarn/service/ServiceMaster2.java << EOF
> package org.apache.hadoop.yarn.service;
> public class ServiceMaster2 {
>   public static void main(String[] args) throws Exception {
> System.out.println("asd!");
>   }
> }
> EOF
> javac org/apache/hadoop/yarn/service/ServiceMaster2.java
> jar -cf a1.jar org
> find org -name ServiceMaster*
> # this will print "asd!"
> java -cp a1.jar org.apache.hadoop.yarn.service.ServiceMaster2
> #the following invocations result in:
> # Error: Could not find or load main class 
> org.apache.hadoop.yarn.service.ServiceMaster
> #
> set +e
> java -cp a1.jar org.apache.hadoop.yarn.service.ServiceMaster
> java -cp core.jar org.apache.hadoop.yarn.service.ServiceMaster
> {code}






[jira] [Updated] (YARN-8207) Docker container launch use popen have risk of shell expansion

2018-05-07 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8207:

Attachment: YARN-8207.007.patch

> Docker container launch use popen have risk of shell expansion
> --
>
> Key: YARN-8207
> URL: https://issues.apache.org/jira/browse/YARN-8207
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.0.0, 3.1.0, 3.0.1, 3.0.2
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8207.001.patch, YARN-8207.002.patch, 
> YARN-8207.003.patch, YARN-8207.004.patch, YARN-8207.005.patch, 
> YARN-8207.006.patch, YARN-8207.007.patch
>
>
> The container-executor code utilizes a string buffer to construct the docker run 
> command and passes the string buffer to popen for execution.  Popen spawns a 
> shell to run the command.  Some arguments for docker run are still vulnerable 
> to shell expansion.  The possible solution is to convert from a char * buffer 
> to a string array for execv to avoid shell expansion.
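The same argv-vector idea expressed in Java, purely as an illustration (the real 
fix is in the native container-executor, moving from popen(3) to execv(3)): when 
arguments are passed as a vector and no shell is involved, shell metacharacters 
in the values are never interpreted.

{code:java}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class NoShellExpansion {
  public static void main(String[] args) throws IOException, InterruptedException {
    String untrusted = "$(touch /tmp/pwned)"; // would be expanded by a shell
    // ProcessBuilder exec's the program directly with this argument vector;
    // no shell is spawned, so the command substitution above is printed
    // literally instead of being executed.
    List<String> argv = Arrays.asList("echo", untrusted);
    Process p = new ProcessBuilder(argv).inheritIO().start();
    p.waitFor();
  }
}
{code}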






[jira] [Commented] (YARN-8252) Fix ServiceMaster main not found

2018-05-07 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466059#comment-16466059
 ] 

Zoltan Haindrich commented on YARN-8252:


Currently the only jar provided at startup is a vanilla 
{{yarn-service-core.jar}}; I think the idea was to provide a fat jar at launch 
time... (or add all dependencies).
I don't see a suitable fat jar in the build... but I'm not that familiar with 
the Hadoop build, so there might be one :D

> Fix ServiceMaster main not found
> 
>
> Key: YARN-8252
> URL: https://issues.apache.org/jira/browse/YARN-8252
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Zoltan Haindrich
>Priority: Major
>
> I was looking into using YARN services; however, it seems that for some reason 
> it is not possible to run the {{ServiceMaster}} class from the jar... I might be 
> missing something fundamental... so I've put together a shell script to make it 
> easy for anyone to check. I would be happy with any exception beyond "main not 
> found".
> [ServiceMaster.main 
> method|https://github.com/apache/hadoop/blob/67f239c42f676237290d18ddbbc9aec369267692/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ServiceMaster.java#L305]
> {code:java}
> #!/bin/bash
> set -e
> wget -O core.jar  -nv 
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-yarn-services-core/3.1.0/hadoop-yarn-services-core-3.1.0.jar
> unzip -qn core.jar
> cat > org/apache/hadoop/yarn/service/ServiceMaster2.java << EOF
> package org.apache.hadoop.yarn.service;
> public class ServiceMaster2 {
>   public static void main(String[] args) throws Exception {
> System.out.println("asd!");
>   }
> }
> EOF
> javac org/apache/hadoop/yarn/service/ServiceMaster2.java
> jar -cf a1.jar org
> find org -name ServiceMaster*
> # this will print "asd!"
> java -cp a1.jar org.apache.hadoop.yarn.service.ServiceMaster2
> #the following invocations result in:
> # Error: Could not find or load main class 
> org.apache.hadoop.yarn.service.ServiceMaster
> #
> set +e
> java -cp a1.jar org.apache.hadoop.yarn.service.ServiceMaster
> java -cp core.jar org.apache.hadoop.yarn.service.ServiceMaster
> {code}






[jira] [Commented] (YARN-8242) YARN NM: OOM error while reading back the state store on recovery

2018-05-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466422#comment-16466422
 ] 

genericqa commented on YARN-8242:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 31s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 22s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 27 new + 147 unchanged - 2 fixed = 174 total (was 149) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 48s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m 35s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 77m  2s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery |
|   | 
hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8242 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12922212/YARN-8242.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0791651ed8bc 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 5b11b9f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/20614/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| unit | 

[jira] [Commented] (YARN-8255) Allow option to disable flex for a service component

2018-05-07 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466423#comment-16466423
 ] 

Billie Rinaldi commented on YARN-8255:
--

If people disagree with me and we do allow this to be configured, it should be 
through a configuration property read with YarnServiceConf.getBoolean rather 
than a new field. I'd prefer it to default to true (allowing flexing) for all 
types, but am flexible on its default for the NEVER type.

> Allow option to disable flex for a service component 
> -
>
> Key: YARN-8255
> URL: https://issues.apache.org/jira/browse/YARN-8255
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
>
> YARN-8080 implements restart capabilities for service component instances. 
> YARN service components should add an option to disallow flexing, to support 
> workloads that are essentially batch/iterative jobs and terminate with 
> restart_policy=NEVER/ON_FAILURE. Flexing could be disabled by default for 
> components where restart_policy=NEVER/ON_FAILURE and enabled by default when 
> restart_policy=ALWAYS (which is the default restart_policy), unless explicitly 
> set in the service spec.
> The option could be exposed as part of the component spec as "allow_flexing". 
> cc [~billie.rinaldi] [~gsaha] [~eyang] [~csingh] [~wangda]
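For illustration, a sketch of the default rule described above, using a plain 
Hadoop Configuration lookup (the property name is hypothetical, and a real 
implementation would go through YarnServiceConf as suggested in the comment above):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class FlexPolicyExample {
  enum RestartPolicy { ALWAYS, ON_FAILURE, NEVER }

  // Hypothetical property name, for illustration only.
  static final String ALLOW_FLEXING = "yarn.service.component.allow-flexing";

  static boolean isFlexAllowed(Configuration conf, RestartPolicy policy) {
    // Default: allow flexing for ALWAYS, disallow it for NEVER/ON_FAILURE,
    // unless the property is set explicitly.
    boolean defaultValue = (policy == RestartPolicy.ALWAYS);
    return conf.getBoolean(ALLOW_FLEXING, defaultValue);
  }
}
{code}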






[jira] [Updated] (YARN-7894) Improve ATS response for DS_CONTAINER when container launch fails

2018-05-07 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-7894:

Attachment: YARN-7894.004.patch

> Improve ATS response for DS_CONTAINER when container launch fails
> -
>
> Key: YARN-7894
> URL: https://issues.apache.org/jira/browse/YARN-7894
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Charan Hebri
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-7894.001.patch, YARN-7894.002.patch, 
> YARN-7894.003.patch, YARN-7894.004.patch
>
>
> When a distributed shell application starts running and a container launch 
> fails, the web service call to the API,
> {noformat}
> http://<timeline server address>/ws/v1/timeline/DS_CONTAINER/{noformat}
> returns a "Not Found". The message returned in this case should be improved to 
> signify that the container launch failed.
>  






[jira] [Commented] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null

2018-05-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466589#comment-16466589
 ] 

genericqa commented on YARN-8233:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 15s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  2s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m  1s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}128m 56s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8233 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12921116/YARN-8233.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a5e2acdac83d 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 
19:09:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a3a1552 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/20615/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20615/testReport/ |
| Max. process+thread count | 799 (vs. ulimit of 1) |
| modules | C: 

[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers

2018-05-07 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466616#comment-16466616
 ] 

Eric Yang commented on YARN-8206:
-

[~ebadger] +1 for proposal 2.  This is the safer option in my opinion.

> Sending a kill does not immediately kill docker containers
> --
>
> Key: YARN-8206
> URL: https://issues.apache.org/jira/browse/YARN-8206
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8206.001.patch, YARN-8206.002.patch, 
> YARN-8206.003.patch, YARN-8206.004.patch
>
>
> {noformat}
> if (ContainerExecutor.Signal.KILL.equals(signal)
> || ContainerExecutor.Signal.TERM.equals(signal)) {
>   handleContainerStop(containerId, env);
> {noformat}
> Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent 
> for docker containers. However, they should actually be separate. When YARN 
> sends a SIGKILL to a process, it means for it to die immediately and not sit 
> around waiting for anything. This ensures an immediate reclamation of 
> resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task 
> might not handle the signal correctly, and will then end up as a failed task 
> instead of a killed task. This is especially bad for preemption. 






[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers

2018-05-07 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466529#comment-16466529
 ] 

Eric Badger commented on YARN-8206:
---

[~eyang], [~shaneku...@gmail.com], [~jlowe], [~Jim_Brennan], I've come up with 
2 different ways to solve the privileged container issue and I'd like your 
input on which route to go (though I have a slight preference). In both 
proposals, non-privileged containers will be signaled using {{kill}} instead of 
{{docker kill}}.

Proposal 1:
The container-executor will not give up being root when it goes to signal the 
container, and will thus signal the container as root. This would only be for 
privileged containers, but it is something that is not currently possible (right 
now, signaling has to call {{set_user()}} and you cannot set "root" as the 
user). 

Proposal 2:
Use the docker API for privileged containers, just like the code does today. 
This way, we won't be killing arbitrary processes as root, just wielding the 
docker daemon as root as we do today. The downside here is that we have to go 
through a docker API call, which is slower than just sending the signal 
straight to the process. 

My preference would be for Proposal 2, as I'm not super comfortable allowing the 
container-executor to kill arbitrary processes as root if you were somehow 
able to compromise the NM user. Currently, you would only be able to kill 
arbitrary non-root processes if you compromised the NM user.
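For illustration, a rough pseudo-implementation of the routing described in 
Proposal 2 (all names below are hypothetical; this is not the actual 
DockerLinuxContainerRuntime code): privileged containers go through the docker 
daemon, non-privileged containers get the signal sent directly to the process.

{code:java}
public class SignalRouting {
  enum Signal { TERM, KILL }

  interface ContainerHandle {
    boolean isPrivileged();
    String dockerContainerName();
    int pid();
  }

  interface Executor {
    void dockerKill(String containerName, Signal signal); // wraps `docker kill`
    void killProcess(int pid, Signal signal);              // wraps kill(2)
  }

  static void signalContainer(Executor exec, ContainerHandle c, Signal s) {
    if (c.isPrivileged()) {
      // Keep using the docker daemon (already running as root) rather than
      // letting the container-executor signal arbitrary processes as root.
      exec.dockerKill(c.dockerContainerName(), s);
    } else {
      // Non-privileged containers: signal the container process directly.
      exec.killProcess(c.pid(), s);
    }
  }
}
{code}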

> Sending a kill does not immediately kill docker containers
> --
>
> Key: YARN-8206
> URL: https://issues.apache.org/jira/browse/YARN-8206
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8206.001.patch, YARN-8206.002.patch, 
> YARN-8206.003.patch, YARN-8206.004.patch
>
>
> {noformat}
> if (ContainerExecutor.Signal.KILL.equals(signal)
> || ContainerExecutor.Signal.TERM.equals(signal)) {
>   handleContainerStop(containerId, env);
> {noformat}
> Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent 
> for docker containers. However, they should actually be separate. When YARN 
> sends a SIGKILL to a process, it means for it to die immediately and not sit 
> around waiting for anything. This ensures an immediate reclamation of 
> resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task 
> might not handle the signal correctly, and will then end up as a failed task 
> instead of a killed task. This is especially bad for preemption. 






[jira] [Updated] (YARN-8207) Docker container launch use popen have risk of shell expansion

2018-05-07 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8207:

Priority: Blocker  (was: Major)

> Docker container launch use popen have risk of shell expansion
> --
>
> Key: YARN-8207
> URL: https://issues.apache.org/jira/browse/YARN-8207
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.0.0, 3.1.0, 3.0.1, 3.0.2
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
>  Labels: Docker
> Attachments: YARN-8207.001.patch, YARN-8207.002.patch, 
> YARN-8207.003.patch, YARN-8207.004.patch, YARN-8207.005.patch, 
> YARN-8207.006.patch, YARN-8207.007.patch
>
>
> The container-executor code utilizes a string buffer to construct the docker run 
> command and passes the string buffer to popen for execution.  Popen spawns a 
> shell to run the command.  Some arguments for docker run are still vulnerable 
> to shell expansion.  The possible solution is to convert from a char * buffer 
> to a string array for execv to avoid shell expansion.






[jira] [Commented] (YARN-8207) Docker container launch use popen have risk of shell expansion

2018-05-07 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466578#comment-16466578
 ] 

Jason Lowe commented on YARN-8207:
--

Thanks for updating the patch!

Nit: Needing to initialize the args struct manually with '\{ 0 \}' each time is 
fragile to maintain (e.g.: it could easily break as soon as someone adds a new 
field).  It would be good to add an init_args function to encapsulate the 
initialization code or at least hide the init value in a macro.

There's an off-by-one error and heap corruption when construct_docker_command 
places the terminating NULL.  It allocates an array with buffer.length+1 
elements then proceeds to write to index buffer.length+1 which is after the end 
of the allocated array.

Rather than make an expensive deep copy of the arguments, 
construct_docker_command only needs to copy the args vector then set the number 
of arguments to zero.  At that point we'd be effectively transferring ownership 
of the already allocated arg strings to the caller without requiring full 
copies.
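
In other words, something like the following (again assuming the hypothetical 
struct fields sketched above):
{code}
#include <stdlib.h>

/* Sketch: shallow-copy the pointer vector and zero the count, transferring
 * ownership of the argument strings to the caller without deep copies. */
static char **extract_execv_args_sketch(struct args *a) {
  char **out = calloc(a->length + 1, sizeof(char *));  /* +1 for NULL terminator */
  if (out == NULL) {
    return NULL;
  }
  for (int i = 0; i < a->length; i++) {
    out[i] = a->data[i];   /* move the pointer, don't strdup it */
    a->data[i] = NULL;
  }
  out[a->length] = NULL;
  a->length = 0;           /* the struct no longer owns anything to free */
  return out;
}
{code}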

Speaking of copying, given the args struct is always intended to be passed to 
some flavor of execv, I think it would be easier to wield if the args struct 
heap-allocated its argument vector and tracked the terminating NULL directly.  
That way users don't have to perform a cumbersome and expensive copy of the 
arguments or remember to terminate it before calling execv.  It would also 
preclude the need for free_char_arr since the existing free_values could be 
used instead.  I'm not sure the array in the structure is buying us much.

Speaking of free_char_arr, most uses should use free_values since the calls are 
immediately followed by freeing the array pointer which free_values does.

launch_container_as_user does not free docker_command.

Nit: launch_container_as_user calls flatten but it's only necessary in the case 
where the fork fails.

flatten adds 1 to the strlen length in the loop, but there is only a need for 
one NUL terminator which is already accounted for in the {{total}} initial 
value.

flatten is using stpcpy incorrectly as it ignores the return values from the 
function.  stpcpy returns a pointer to the terminating NUL of the resulting 
string which is exactly what we need for appending, so each invocation of 
stpcpy should be like: {{to = stpcpy(to, ...)}}

flatten terminates the string by writing to buffer[total] but that is past the 
end of the allocated array since it is only {{total}} bytes in size.  It should 
be simply
{code}
  *to = '\0';
{code}
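
Putting the stpcpy and termination fixes together, a corrected flatten could 
look roughly like this (simplified sketch, assuming the arguments are joined 
with single spaces):
{code}
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>

/* Sketch: join argc strings with spaces into one heap-allocated buffer. */
static char *flatten_sketch(char **argv, int argc) {
  size_t total = 1;                       /* one byte for the final '\0' */
  for (int i = 0; i < argc; i++) {
    total += strlen(argv[i]) + 1;         /* +1 for the separating space */
  }
  char *buffer = malloc(total);
  if (buffer == NULL) {
    return NULL;
  }
  char *to = buffer;
  for (int i = 0; i < argc; i++) {
    to = stpcpy(to, argv[i]);             /* use the returned end-of-string pointer */
    *to++ = ' ';
  }
  if (to != buffer) {
    to--;                                 /* overwrite the trailing space... */
  }
  *to = '\0';                             /* ...and terminate inside the allocation */
  return buffer;
}
{code}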

reset_args and free_args are a NULL-check away from being the same thing, and 
arguably free_args should check for NULL if reset_args does.  That indicates we 
only need one of these.

Nit: The args length != 0 check in free_args is unnecessary as the {{for}} loop 
will essentially do the same.
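
For example, one NULL-safe routine could cover both uses (sketch, reusing the 
hypothetical struct above):
{code}
/* Sketch: frees every argument string and leaves the struct ready for reuse,
 * so it can serve as both free_args and reset_args. */
static void free_args_sketch(struct args *a) {
  if (a == NULL) {
    return;
  }
  for (int i = 0; i < a->length; i++) {
    free(a->data[i]);
    a->data[i] = NULL;
  }
  a->length = 0;
}
{code}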

check_trusted_image should be calling free_values instead of free_char_arr() 
and free() on the get_configuration_values_delimiter result.

add_param_to_command_if_allowed (and many other places) doesn't check for 
make_string failure, and add_to_args will segfault when it tries to dereference 
the NULL argument.  Does it make sense to have add_to_args return failure if 
the caller tried to add a NULL argument?
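
One possible shape for that, as a sketch only (the return codes and the limit 
check are assumptions, not the patch):
{code}
#include <string.h>

/* Sketch: reject NULL up front so a failed make_string() surfaces as an error
 * instead of a segfault inside add_to_args. */
static int add_to_args_sketch(struct args *a, const char *value) {
  if (a == NULL || value == NULL || a->length >= DOCKER_ARG_MAX - 1) {
    return -1;               /* caller treats this as a command-construction failure */
  }
  a->data[a->length] = strdup(value);
  if (a->data[a->length] == NULL) {
    return -1;
  }
  a->length++;
  return 0;
}
{code}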

This change doesn't look related to the execv changes?  Also looks like a case 
that could be simplified quite a bit with strndup and strdup.
{noformat}
@@ -195,11 +218,11 @@ static int add_param_to_command_if_allowed(const struct 
configuration *command_c
   } else {
 // If permitted-Values[j] is a REGEX, use REGEX to compare
 if (is_regex(permitted_values[j])) {
-  size_t offset = tmp_ptr - values[i];
+  size_t offset = tmp_ptr - values[i] + 1;
   dst = (char *) alloc_and_clear_memory(offset, sizeof(char));
   strncpy(dst, values[i], offset);
   dst[tmp_ptr - values[i]] = '\0';
-  pattern = (char *) 
alloc_and_clear_memory((size_t)(strlen(permitted_values[j]) - 6), sizeof(char));
+  pattern = (char *) 
alloc_and_clear_memory((size_t)(strlen(permitted_values[j]) - 5), sizeof(char));
   strcpy(pattern, permitted_values[j] + 6);
   ret = execute_regex_match(pattern, dst);
 } else {
{noformat}
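
The strndup/strdup simplification could look roughly like this (sketch; 
execute_regex_match is the existing helper with an assumed signature, and the 
6-character prefix is assumed to be "regex:"):
{code}
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>

int execute_regex_match(const char *pattern, const char *input); /* existing helper, assumed signature */

/* Sketch: compare the value prefix (up to value_end) against a "regex:..."
 * permitted entry without manual length arithmetic. */
static int matches_permitted_regex(const char *value, const char *value_end,
                                   const char *permitted) {
  int ret = -1;
  char *dst = strndup(value, (size_t)(value_end - value)); /* copies and NUL-terminates */
  char *pattern = strdup(permitted + 6);                   /* skip the "regex:" prefix */
  if (dst != NULL && pattern != NULL) {
    ret = execute_regex_match(pattern, dst);
  }
  free(dst);
  free(pattern);
  return ret;
}
{code}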

set_pid_namespace and set_privileged allocate an unused 1024-byte array.

Nit: The last {{goto free_and_exit}} in get_docker_kill_command is unnecessary.

In get_docker_start_command:
{code}
  ret = add_to_args(args, container_name);
  if (ret != 0) {
    goto free_and_exit;
  }
free_and_exit:
{code}
should be simplified to:
{code}
  ret = add_to_args(args, container_name);
free_and_exit:
{code}

set_group_add should be calling free_values instead of free_char_arr or the 
pointer array is leaked.

get_docker_stop_command does not check for add_to_args failure after trying to 
add the 

[jira] [Commented] (YARN-7892) Revisit NodeAttribute class structure

2018-05-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466618#comment-16466618
 ] 

Wangda Tan commented on YARN-7892:
--

Thanks [~Naganarasimha]. For id (identifier) and key, I think they're 
interchangeable in many scenarios such as entity.id / entity.key.

However, for map-like data (a 1 => 1 mapping), for example a map or 
environment variables, it should be named "key" instead of "id"; you can check 
{{org.apache.hadoop.yarn.api.resource.PlacementConstraint.TargetExpression}} as 
an example.

> Revisit NodeAttribute class structure
> -
>
> Key: YARN-7892
> URL: https://issues.apache.org/jira/browse/YARN-7892
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Major
> Attachments: YARN-7892-YARN-3409.001.patch, 
> YARN-7892-YARN-3409.002.patch, YARN-7892-YARN-3409.003.WIP.patch, 
> YARN-7892-YARN-3409.003.patch, YARN-7892-YARN-3409.004.patch, 
> YARN-7892-YARN-3409.005.patch, YARN-7892-YARN-3409.006.patch
>
>
> In the existing structure, we kept the type and value along with the 
> attribute, which creates confusion for users trying to understand the APIs: 
> it is not clear what needs to be sent for type and value while fetching the 
> mappings for node(s).
> In addition, equals will not make sense when we compare only prefix and 
> name, whereas the values for them might be different.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8242) YARN NM: OOM error while reading back the state store on recovery

2018-05-07 Thread Kanwaljeet Sachdev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kanwaljeet Sachdev updated YARN-8242:
-
Attachment: YARN-8242.003.patch

> YARN NM: OOM error while reading back the state store on recovery
> -
>
> Key: YARN-8242
> URL: https://issues.apache.org/jira/browse/YARN-8242
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: Kanwaljeet Sachdev
>Assignee: Kanwaljeet Sachdev
>Priority: Blocker
> Attachments: YARN-8242.001.patch, YARN-8242.002.patch, 
> YARN-8242.003.patch
>
>
> On startup the NM reads its state store and builds a list of applications in 
> the state store to process. If the number of applications in the state store 
> is large and they have a lot of "state" attached, the NM can run out of memory 
> and never get to the point where it can start processing the recovery.
> Since it never starts the recovery, there is no way for the NM to ever pass 
> this point. It requires a change in heap size to get the NM started.
>  
> Following is the stack trace
> {code:java}
> at java.lang.OutOfMemoryError. (OutOfMemoryError.java:48) at 
> com.google.protobuf.ByteString.copyFrom (ByteString.java:192) at 
> com.google.protobuf.CodedInputStream.readBytes (CodedInputStream.java:324) at 
> org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto. 
> (YarnProtos.java:47069) at 
> org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto. 
> (YarnProtos.java:47014) at 
> org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom
>  (YarnProtos.java:47102) at 
> org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom
>  (YarnProtos.java:47097) at com.google.protobuf.CodedInputStream.readMessage 
> (CodedInputStream.java:309) at 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto. 
> (YarnProtos.java:41016) at 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto. 
> (YarnProtos.java:40942) at 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom
>  (YarnProtos.java:41080) at 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom
>  (YarnProtos.java:41075) at com.google.protobuf.CodedInputStream.readMessage 
> (CodedInputStream.java:309) at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.
>  (YarnServiceProtos.java:24517) at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.
>  (YarnServiceProtos.java:24464) at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom
>  (YarnServiceProtos.java:24568) at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom
>  (YarnServiceProtos.java:24563) at 
> com.google.protobuf.AbstractParser.parsePartialFrom (AbstractParser.java:141) 
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:176) at 
> com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:188) at 
> com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:193) at 
> com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:49) at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.parseFrom
>  (YarnServiceProtos.java:24739) at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState
>  (NMLeveldbStateStoreService.java:217) at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState
>  (NMLeveldbStateStoreService.java:170) at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover
>  (ContainerManagerImpl.java:253) at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit
>  (ContainerManagerImpl.java:237) at 
> org.apache.hadoop.service.AbstractService.init (AbstractService.java:163) at 
> org.apache.hadoop.service.CompositeService.serviceInit 
> (CompositeService.java:107) at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit 
> (NodeManager.java:255) at org.apache.hadoop.service.AbstractService.init 
> (AbstractService.java:163) at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager 
> (NodeManager.java:474) at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main 
> (NodeManager.java:521){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8191) Fair scheduler: queue deletion without RM restart

2018-05-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466613#comment-16466613
 ] 

genericqa commented on YARN-8191:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 41s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
35s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
34s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 34s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 26s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 88 unchanged - 0 fixed = 89 total (was 88) {color} 
|
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
33s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  3m 
17s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
20s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 35s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 43m 39s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8191 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12922283/YARN-8191.006.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4d7eae3404e6 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 
21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 696a4be |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| mvninstall | 
https://builds.apache.org/job/PreCommit-YARN-Build/20621/artifact/out/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| compile | 

[jira] [Commented] (YARN-8255) Allow option to disable flex for a service component

2018-05-07 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466448#comment-16466448
 ] 

Eric Yang commented on YARN-8255:
-

Instead of introducing another field to enable or disable flex, we can identify 
whether the workload can perform a flex operation based on restart_policy.

When restart_policy=ON_FAILURE or ALWAYS, the data can be recomputed, or the 
process can resume from failure.  Flex operations can be enabled.

When restart_policy=NEVER, the data is stateful and cannot be reprocessed 
(e.g. mapreduce writing to HBase without transactional properties).  This type 
of container is not allowed to perform flex operations.

By this reasoning, it is possible to reduce the combinations that need to be 
supported.  This also implies that restart_policy=NEVER doesn't have to support 
upgrade.

> Allow option to disable flex for a service component 
> -
>
> Key: YARN-8255
> URL: https://issues.apache.org/jira/browse/YARN-8255
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
>
> YARN-8080 implements restart capabilities for service component instances. 
> YARN service components should add an option to disallow flexing, to support 
> workloads which are essentially batch/iterative jobs that terminate with 
> restart_policy=NEVER/ON_FAILURE. This could be disabled by default for 
> components where restart_policy=NEVER/ON_FAILURE and enabled by default when 
> restart_policy=ALWAYS (which is the default restart_policy) unless explicitly 
> set in the service spec.
> The option could be exposed as part of the component spec as "allow_flexing". 
> cc [~billie.rinaldi] [~gsaha] [~eyang] [~csingh] [~wangda]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8256) Pluggable provider for node membership management

2018-05-07 Thread Dagang Wei (JIRA)
Dagang Wei created YARN-8256:


 Summary: Pluggable provider for node membership management
 Key: YARN-8256
 URL: https://issues.apache.org/jira/browse/YARN-8256
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 3.0.2, 2.8.3
Reporter: Dagang Wei


h1. Background

[HDFS-7541|https://issues.apache.org/jira/browse/HDFS-7541] introduced a 
pluggable provider framework for node membership management, which gives HDFS 
the flexibility to have different ways to manage node membership for different 
needs.

[org.apache.hadoop.hdfs.server.blockmanagement.HostConfigManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostConfigManager.java]
 is the class which provides the abstraction. Currently, there are 2 
implementations in the HDFS codebase:

1) 
[org.apache.hadoop.hdfs.server.blockmanagement.HostFileManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostFileManager.java]
 which uses 2 config files which are defined by the properties dfs.hosts and 
dfs.hosts.exclude.

2) 
[org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CombinedHostFileManager.java]
 which uses a single JSON file defined by the property dfs.hosts.

dfs.namenode.hosts.provider.classname is the property determining which 
implementation is used.

h1. Problem

YARN should be consistent with HDFS in terms of a pluggable provider for node 
membership management. The absence of one makes it impossible for YARN to have 
other config sources, e.g., ZooKeeper, a database, etc.

h1. Proposed solution

[org.apache.hadoop.yarn.server.resourcemanager.NodesListManager|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java]
 is the class for managing YARN node membership today. It uses 
[HostsFileReader|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/HostsFileReader.java]
 to read config files specified by the property 
yarn.resourcemanager.nodes.include-path for nodes to include and 
yarn.resourcemanager.nodes.exclude-path for nodes to exclude.

The proposed solution is to

1) introduce a new interface {color:green}HostsConfigManager{color} which 
provides the abstraction for node membership management. Update 
{color:green}NodeListManager{color} to depend on 
{color:green}HostsConfigManager{color} instead of 
{color:green}HostsFileReader{color}. Then create a wrapper class for 
{color:green}HostsFileReader{color} which implements the interface.

2) introduce a new config property 
{color:green}yarn.resourcemanager.hosts.provider.classname{color} for 
specifying the implementation class. Set the default value to the wrapper class 
of {color:green}HostsFileReader{color} for backward compatibility between new 
code and old config.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7894) Improve ATS response for DS_CONTAINER when container launch fails

2018-05-07 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466483#comment-16466483
 ] 

Billie Rinaldi commented on YARN-7894:
--

That output looks good, thanks [~csingh]. I noticed one more thing, which is 
that publishContainerStartFailedEvent doesn't check whether timeline service V2 
is enabled. It seems like this is checked for all the other publish methods.

> Improve ATS response for DS_CONTAINER when container launch fails
> -
>
> Key: YARN-7894
> URL: https://issues.apache.org/jira/browse/YARN-7894
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Charan Hebri
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-7894.001.patch, YARN-7894.002.patch, 
> YARN-7894.003.patch
>
>
> When a distributed shell application starts running and a container launch 
> fails, the web service call to the API,
> {noformat}
> http://<address>/ws/v1/timeline/DS_CONTAINER/{noformat}
> returns a "Not Found". The message returned in this case should be improved to 
> signify that a container launch failed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8207) Docker container launch use popen have risk of shell expansion

2018-05-07 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466512#comment-16466512
 ] 

Eric Yang commented on YARN-8207:
-

[~jlowe] Hadoop 3.1.1 release date was proposed for May 7th.  This is a 
blocking issue for YARN-7654.  I think this JIRA is very close to completion, 
and I like to make sure that we can catch the release train.  Are you 
comfortable to the last iteration of this patch?

> Docker container launch use popen have risk of shell expansion
> --
>
> Key: YARN-8207
> URL: https://issues.apache.org/jira/browse/YARN-8207
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.0.0, 3.1.0, 3.0.1, 3.0.2
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
>  Labels: Docker
> Attachments: YARN-8207.001.patch, YARN-8207.002.patch, 
> YARN-8207.003.patch, YARN-8207.004.patch, YARN-8207.005.patch, 
> YARN-8207.006.patch, YARN-8207.007.patch
>
>
> Container-executor code utilizes a string buffer to construct the docker run 
> command, and passes the string buffer to popen for execution.  popen spawns a 
> shell to run the command.  Some arguments for docker run are still vulnerable 
> to shell expansion.  The possible solution is to convert from a char * buffer 
> to a string array for execv to avoid shell expansion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7892) Revisit NodeAttribute class structure

2018-05-07 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466577#comment-16466577
 ] 

Naganarasimha G R commented on YARN-7892:
-

[~leftnoteasy], hope you could share your view on my above comment?

 

> Revisit NodeAttribute class structure
> -
>
> Key: YARN-7892
> URL: https://issues.apache.org/jira/browse/YARN-7892
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Major
> Attachments: YARN-7892-YARN-3409.001.patch, 
> YARN-7892-YARN-3409.002.patch, YARN-7892-YARN-3409.003.WIP.patch, 
> YARN-7892-YARN-3409.003.patch, YARN-7892-YARN-3409.004.patch, 
> YARN-7892-YARN-3409.005.patch, YARN-7892-YARN-3409.006.patch
>
>
> In the existing structure, we kept the type and value along with the 
> attribute, which creates confusion for users trying to understand the APIs: 
> it is not clear what needs to be sent for type and value while fetching the 
> mappings for node(s).
> In addition, equals will not make sense when we compare only prefix and 
> name, whereas the values for them might be different.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466576#comment-16466576
 ] 

Wangda Tan commented on YARN-8141:
--

Thanks [~shaneku...@gmail.com]. I think we should consolidate the two, and the 
backward compatibility is not an issue because a. 3.1.0 is an unstable release, 
b. the variable itself is marked as {{@private}}.

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
>
> The existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether the 
> user specified it in the service spec or not. It is important to allow the 
> user to mount local folders like /etc/passwd, etc.
> The following logic overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
> sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", 
> sb.toString());{code}
> Inside AbstractLauncher.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4599) Set OOM control for memory cgroups

2018-05-07 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-4599:
-
Attachment: YARN-4599.001.patch

> Set OOM control for memory cgroups
> --
>
> Key: YARN-4599
> URL: https://issues.apache.org/jira/browse/YARN-4599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Miklos Szegedi
>Priority: Major
>  Labels: oct16-medium
> Attachments: YARN-4599.000.patch, YARN-4599.001.patch, 
> YARN-4599.sandflee.patch, yarn-4599-not-so-useful.patch
>
>
> YARN-1856 adds memory cgroups enforcing support. We should also explicitly 
> set OOM control so that containers are not killed as soon as they go over 
> their usage. Today, one could set the swappiness to control this, but 
> clusters with swap turned off exist.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8255) Allow option to disable flex for a service component

2018-05-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466462#comment-16466462
 ] 

Wangda Tan commented on YARN-8255:
--

Thanks [~suma.shivaprasad] for filing the JIRA and suggestions from [~eyang] / 
[~billie.rinaldi], 

I think service flexing is different from restart policy: as mentioned by 
[~eyang], restart policy = on_failure / always means some part of the job can 
be *recomputed*. *Recomputable* is different from *expandable*. An example is 
map-reduce: the number of mappers and reducers is determined by the InputFormat 
before the job gets launched, so allocating more mappers or reducers than 
pre-calculated while the job is running doesn't help. Many computation 
frameworks follow this pattern, such as Tensorflow/OpenMPI, etc.; adding tasks 
while the job is running isn't helpful.

Considering this, I would prefer what Suma suggested: allow the user to specify 
allow_flexing, since sometimes adding a new instance to a component could lead 
to task or even master failure because it is unexpected. I tend to agree with 
making allow_flexing=false by default, but I'm also fine with the opposite.

> Allow option to disable flex for a service component 
> -
>
> Key: YARN-8255
> URL: https://issues.apache.org/jira/browse/YARN-8255
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
>
> YARN-8080 implements restart capabilities for service component instances. 
> YARN service components should add an option to disallow flexing, to support 
> workloads which are essentially batch/iterative jobs that terminate with 
> restart_policy=NEVER/ON_FAILURE. This could be disabled by default for 
> components where restart_policy=NEVER/ON_FAILURE and enabled by default when 
> restart_policy=ALWAYS (which is the default restart_policy) unless explicitly 
> set in the service spec.
> The option could be exposed as part of the component spec as "allow_flexing". 
> cc [~billie.rinaldi] [~gsaha] [~eyang] [~csingh] [~wangda]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8207) Docker container launch use popen have risk of shell expansion

2018-05-07 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466512#comment-16466512
 ] 

Eric Yang edited comment on YARN-8207 at 5/7/18 9:47 PM:
-

[~jlowe] Hadoop 3.1.1 release date was proposed for May 7th.  This is a 
blocking issue for YARN-7654.  I think this JIRA is very close to completion, 
and I like to make sure that we can catch the release train.  Are you 
comfortable with the latest iteration of this patch?


was (Author: eyang):
[~jlowe] Hadoop 3.1.1 release date was proposed for May 7th.  This is a 
blocking issue for YARN-7654.  I think this JIRA is very close to completion, 
and I like to make sure that we can catch the release train.  Are you 
comfortable to the last iteration of this patch?

> Docker container launch use popen have risk of shell expansion
> --
>
> Key: YARN-8207
> URL: https://issues.apache.org/jira/browse/YARN-8207
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.0.0, 3.1.0, 3.0.1, 3.0.2
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
>  Labels: Docker
> Attachments: YARN-8207.001.patch, YARN-8207.002.patch, 
> YARN-8207.003.patch, YARN-8207.004.patch, YARN-8207.005.patch, 
> YARN-8207.006.patch, YARN-8207.007.patch
>
>
> Container-executor code utilizes a string buffer to construct the docker run 
> command, and passes the string buffer to popen for execution.  popen spawns a 
> shell to run the command.  Some arguments for docker run are still vulnerable 
> to shell expansion.  The possible solution is to convert from a char * buffer 
> to a string array for execv to avoid shell expansion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8256) Pluggable provider for node membership management

2018-05-07 Thread Dagang Wei (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dagang Wei updated YARN-8256:
-
Description: 
h1. Background

HDFS-7541 introduced a pluggable provider framework for node membership 
management, which gives HDFS the flexibility to have different ways to manage 
node membership for different needs.

[org.apache.hadoop.hdfs.server.blockmanagement.HostConfigManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostConfigManager.java]
 is the class which provides the abstraction. Currently, there are 2 
implementations in the HDFS codebase:

1) 
[org.apache.hadoop.hdfs.server.blockmanagement.HostFileManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostFileManager.java]
 which uses 2 config files which are defined by the properties dfs.hosts and 
dfs.hosts.exclude.

2) 
[org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CombinedHostFileManager.java]
 which uses a single JSON file defined by the property dfs.hosts.

dfs.namenode.hosts.provider.classname is the property determining which 
implementation is used.
h1. Problem

YARN should be consistent with HDFS in terms of a pluggable provider for node 
membership management. The absence of one makes it impossible for YARN to have 
other config sources, e.g., ZooKeeper, a database, etc.
h1. Proposed solution

[org.apache.hadoop.yarn.server.resourcemanager.NodesListManager|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java]
 is the class for managing YARN node membership today. It uses 
[HostsFileReader|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/HostsFileReader.java]
 to read config files specified by the property 
yarn.resourcemanager.nodes.include-path for nodes to include and 
yarn.resourcemanager.nodes.exclude-path for nodes to exclude.

The proposed solution is to

1) introduce a new interface {color:#008000}HostsConfigManager{color} which 
provides the abstraction for node membership management. Update 
{color:#008000}NodeListManager{color} to depend on 
{color:#008000}HostsConfigManager{color} instead of 
{color:#008000}HostsFileReader{color}. Then create a wrapper class for 
{color:#008000}HostsFileReader{color} which implements the interface.

2) introduce a new config property 
{color:#008000}yarn.resourcemanager.hosts.provider.classname{color} for 
specifying the implementation class. Set the default value to the wrapper class 
of {color:#008000}HostsFileReader{color} for backward compatibility between new 
code and old config.

  was:
h1. Background

[HDFS-7541|https://issues.apache.org/jira/browse/HDFS-7541] introduced a 
pluggable provider framework for node membership management, which gives HDFS 
the flexibility to have different ways to manage node membership for different 
needs.

[org.apache.hadoop.hdfs.server.blockmanagement.HostConfigManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostConfigManager.java]
 is the class which provides the abstraction. Currently, there are 2 
implementations in the HDFS codebase:

1) 
[org.apache.hadoop.hdfs.server.blockmanagement.HostFileManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostFileManager.java]
 which uses 2 config files which are defined by the properties dfs.hosts and 
dfs.hosts.exclude.

2) 
[org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CombinedHostFileManager.java]
 which uses a single JSON file defined by the property dfs.hosts.

dfs.namenode.hosts.provider.classname is the property determining which 
implementation is used

h1. Problem

YARN should be consistent with HDFS in terms of pluggable provider for node 
membership management. The absence of it makes YARM impossible to have other 
config sources, e.g., ZooKeeper, database, etc.

h1. Proposed solution

[org.apache.hadoop.yarn.server.resourcemanager.NodesListManager|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java]
 is the class for managing YARN node membership 

[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-05-07 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466541#comment-16466541
 ] 

Eric Payne commented on YARN-4606:
--

{code:title=AppSchedulingInfo#updatePendingResources}
if (!hasActiveUsersOfPendingAppsDecremented.get()) {
  abstractUsersManager.decrNumActiveUsersOfPendingApps();
  hasActiveUsersOfPendingAppsDecremented.set(true);
}
{code}

Does {{hasActiveUsersOfPendingAppsDecremented}} need to be atomic? What is the 
benefit?

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belong to same user in LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers the user 
> is an active user. This could lead to starvation of active applications, for 
> example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, there're only two users (user1/user2) are able to allocate new 
> resources. So computed user-limit-resource could be lower than expected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8254) dynamically change log levels for YARN Jobs

2018-05-07 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466553#comment-16466553
 ] 

Naganarasimha G R commented on YARN-8254:
-

Thanks [~Prabhu Joseph] for raising this issue, it would be a nice thing to 
have. It would be helpful if you could give more info on the role of YARN and 
the role of the application (AM) in doing the above.

> dynamically change log levels for YARN Jobs
> ---
>
> Key: YARN-8254
> URL: https://issues.apache.org/jira/browse/YARN-8254
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Priority: Major
>  Labels: supportability
>
> Currently the Log Levels for Daemons can be dynamically changed. It would be 
> easier while debugging to have the same for YARN Jobs. The client can send 
> setLogLevel to the ApplicationMaster, which can set it for all the containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-07 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned YARN-8141:


Assignee: Chandni Singh

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
>
> The existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether the 
> user specified it in the service spec or not. It is important to allow the 
> user to mount local folders like /etc/passwd, etc.
> The following logic overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
> sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", 
> sb.toString());{code}
> Inside AbstractLauncher.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8207) Docker container launch use popen have risk of shell expansion

2018-05-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466649#comment-16466649
 ] 

genericqa commented on YARN-8207:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
48s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 32m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
44m 12s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 23m 28s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 83m  6s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerBehaviorCompatibility
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8207 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12922274/YARN-8207.007.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 11ca01f6e274 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 
19:09:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 696a4be |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/20622/artifact/out/whitespace-tabs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/20622/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20622/testReport/ |
| Max. process+thread count | 303 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20622/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Docker container launch use popen have risk of shell expansion
> --
>
> Key: YARN-8207
> URL: https://issues.apache.org/jira/browse/YARN-8207
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.0.0, 3.1.0, 

[jira] [Commented] (YARN-8207) Docker container launch use popen have risk of shell expansion

2018-05-07 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466683#comment-16466683
 ] 

Eric Yang commented on YARN-8207:
-

[~jlowe] 

{quote}Rather than make an expensive deep copy of the arguments, 
construct_docker_command only needs to copy the args vector then set the number 
of arguments to zero. At that point we'd be effectively transferring ownership 
of the already allocated arg strings to the caller without requiring full 
copies.{quote}

Struct args is still evolving.  I think it would be safer to keep the data 
structure private as an opaque data structure and deep copy to the caller.  
This avoids putting responsibility on the external caller to free the internal 
implementation of struct args.  In case we want the ability to trim or truncate 
the string array based on allowed parameters, we have a way to fix it later.

{quote}add_param_to_command_if_allowed (and many other places) doesn't check 
for make_string failure, and add_to_args will segfault when it tries to 
dereference the NULL argument. Does it make sense to have add_to_args return 
failure if the caller tried to add a NULL argument?{quote}

At this time, add_to_args is a no-op for a NULL argument, to avoid having to 
check for null on make_string.  I think the proposed reverse change will add 
more null pointer checks, which makes the code harder to read again.  It would 
contradict the original intent of your reviews to make the code easier to read.

{quote}flatten adds 1 to the strlen length in the loop, but there is only a 
need for one NUL terminator which is already accounted for in the total initial 
value.{quote}

The +1 is for a space, not a NUL terminator, for rendering an HTML page that 
looks like a command line.  The last space is replaced with the NUL terminator.

{quote}flatten is using stpcpy incorrectly as it ignores the return values from 
the function. stpcpy returns a pointer to the terminating NUL of the resulting 
string which is exactly what we need for appending, so each invocation of 
stpcpy should be like: to = stpcpy(to, ...){quote}

This is fixed in the YARN-7654 patch.  It's hard to rebase n times, and stuff 
ends up in the wrong patch.  I will fix this.

{quote}This change doesn't look related to the execv changes? Also looks like a 
case that could be simplified quite a bit with strndup and strdup.{quote}

There is an off-by-one memory corruption where pattern is not null terminated 
properly.  This was detected by valgrind, and I decided to fix it here because 
it causes a segfault if I leave it in the code.

I will fix the rest of the issues that you found.  Thank you again for the 
review.

> Docker container launch use popen have risk of shell expansion
> --
>
> Key: YARN-8207
> URL: https://issues.apache.org/jira/browse/YARN-8207
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.0.0, 3.1.0, 3.0.1, 3.0.2
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
>  Labels: Docker
> Attachments: YARN-8207.001.patch, YARN-8207.002.patch, 
> YARN-8207.003.patch, YARN-8207.004.patch, YARN-8207.005.patch, 
> YARN-8207.006.patch, YARN-8207.007.patch
>
>
> Container-executor code utilizes a string buffer to construct the docker run 
> command, and passes the string buffer to popen for execution.  popen spawns a 
> shell to run the command.  Some arguments for docker run are still vulnerable 
> to shell expansion.  The possible solution is to convert from a char * buffer 
> to a string array for execv to avoid shell expansion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8253) HTTPS Ats v2 api call fails with "bad HTTP parsed"

2018-05-07 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S reassigned YARN-8253:
---

Assignee: Rohith Sharma K S

> HTTPS Ats v2 api call fails with "bad HTTP parsed"
> --
>
> Key: YARN-8253
> URL: https://issues.apache.org/jira/browse/YARN-8253
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Rohith Sharma K S
>Priority: Major
>
> When the YARN http policy is set to HTTPS_ONLY, ATS v2 should use the HTTPS address.
> Here, the ATS v2 call is failing with the below error.
> {code:java}
> [hrt_qa@xxx root]$ curl -i -k -s -1 -H 'Content-Type: application/json' -H 
> 'Accept: application/json' --negotiate -u: 
> 'https://xxx:8199/ws/v2/timeline/apps/application_1525238789838_0003/entities/COMPONENT_INSTANCE?fields=ALL'
> [hrt_qa@xxx root]$ echo $?
> 35{code}
> {code:java|title=Ats v2}
> 2018-05-02 05:45:40,427 WARN  http.HttpParser (HttpParser.java:(1832)) 
> - Illegal character 0x16 in state=START for buffer 
> HeapByteBuffer@dba438[p=1,l=222,c=8192,r=221]={\x16<<<\x03\x01\x00\xD9\x01\x00\x00\xD5\x03\x03;X\xEd\xD1orq...\x01\x05\x01\x06\x01\x02\x01\x04\x02\x05\x02\x06\x02\x02\x02>>>\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00}
> 2018-05-02 05:45:40,428 WARN  http.HttpParser 
> (HttpParser.java:parseNext(1435)) - bad HTTP parsed: 400 Illegal character 
> 0x16 for HttpChannelOverHttp@2efbda6c{r=0,c=false,a=IDLE,uri=null}{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8247) Incorrect HTTP status code returned by ATSv2 for non-whitelisted users

2018-05-07 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S reassigned YARN-8247:
---

Assignee: Rohith Sharma K S

> Incorrect HTTP status code returned by ATSv2 for non-whitelisted users
> --
>
> Key: YARN-8247
> URL: https://issues.apache.org/jira/browse/YARN-8247
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Reporter: Charan Hebri
>Assignee: Rohith Sharma K S
>Priority: Critical
>
> When using the below configuration in ATSv2 reader,
> {noformat}
> yarn.timeline-service.read.authentication.enabled=true
> yarn.timeline-service.read.allowed.users=user1,user2{noformat}
> A query with user3 throws a Forbidden Exception with a status code of 500 
> (Internal Server Error) instead of the expected 403 for Forbidden. Stack 
> trace of the response,
> {noformat}
> HTTP ERROR 500
> Problem accessing /ws/v2/timeline/apps/application_1525427743175_0009. Reason:
> Server Error
> Caused by:
> org.apache.hadoop.yarn.webapp.ForbiddenException: java.lang.Exception: user 
> user3 is not allowed to read TimelineService V2 data
>   at 
> org.apache.hadoop.yarn.server.timelineservice.reader.security.TimelineReaderWhitelistAuthorizationFilter.doFilter(TimelineReaderWhitelistAuthorizationFilter.java:80)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>   at 
> org.apache.hadoop.security.http.CrossOriginFilter.doFilter(CrossOriginFilter.java:98)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>   at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1601)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at org.eclipse.jetty.server.Server.handle(Server.java:534)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>   at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: user user3 is not allowed to read 
> TimelineService V2 data
>   at 
> org.apache.hadoop.yarn.webapp.ForbiddenException.(ForbiddenException.java:41)
>   ... 34 more{noformat}
> cc [~vrushalic] [~rohithsharma]
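For context on the expected behaviour: a servlet filter that wants the client to see 403 should write the status to the response instead of throwing out of {{doFilter}}, since an uncaught exception is mapped to 500 by the container. A minimal, hypothetical sketch of that pattern (not the actual TimelineReaderWhitelistAuthorizationFilter code; the "allowed.users" init parameter is an assumption of this sketch):
{code:java}
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/** Hypothetical whitelist filter that sends 403 instead of throwing. */
public class SimpleWhitelistFilter implements Filter {
  private Set<String> allowedUsers;

  @Override
  public void init(FilterConfig conf) {
    // "allowed.users" is an assumed init parameter for this sketch.
    String users = conf.getInitParameter("allowed.users");
    allowedUsers = new HashSet<>(Arrays.asList(
        users == null ? new String[0] : users.split(",")));
  }

  @Override
  public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
      throws IOException, ServletException {
    String user = ((HttpServletRequest) req).getRemoteUser();
    if (user == null || !allowedUsers.contains(user)) {
      // Write the 403 directly so the container does not map an
      // uncaught exception to a 500 Internal Server Error.
      ((HttpServletResponse) resp).sendError(HttpServletResponse.SC_FORBIDDEN,
          "user " + user + " is not allowed to read TimelineService V2 data");
      return;
    }
    chain.doFilter(req, resp);
  }

  @Override
  public void destroy() {
  }
}
{code}
Returning early after {{sendError}} is what keeps the request from reaching the timeline reader resources while still surfacing the intended status code.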



--
This message was sent by Atlassian JIRA

[jira] [Commented] (YARN-8253) HTTPS Ats v2 api call fails with "bad HTTP parsed"

2018-05-07 Thread Charan Hebri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466903#comment-16466903
 ] 

Charan Hebri commented on YARN-8253:


[~rohithsharma] I can provide a fix if you haven't started.

> HTTPS Ats v2 api call fails with "bad HTTP parsed"
> --
>
> Key: YARN-8253
> URL: https://issues.apache.org/jira/browse/YARN-8253
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Rohith Sharma K S
>Priority: Major
>
> When the YARN HTTP policy is set to HTTPS_ONLY, ATS v2 should use the HTTPS address.
> Here, the ATS v2 call is failing with the error below.
> {code:java}
> [hrt_qa@xxx root]$ curl -i -k -s -1 -H 'Content-Type: application/json' -H 
> 'Accept: application/json' --negotiate -u: 
> 'https://xxx:8199/ws/v2/timeline/apps/application_1525238789838_0003/entities/COMPONENT_INSTANCE?fields=ALL'
> [hrt_qa@xxx root]$ echo $?
> 35{code}
> {code:java|title=Ats v2}
> 2018-05-02 05:45:40,427 WARN  http.HttpParser (HttpParser.java:(1832)) 
> - Illegal character 0x16 in state=START for buffer 
> HeapByteBuffer@dba438[p=1,l=222,c=8192,r=221]={\x16<<<\x03\x01\x00\xD9\x01\x00\x00\xD5\x03\x03;X\xEd\xD1orq...\x01\x05\x01\x06\x01\x02\x01\x04\x02\x05\x02\x06\x02\x02\x02>>>\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00}
> 2018-05-02 05:45:40,428 WARN  http.HttpParser 
> (HttpParser.java:parseNext(1435)) - bad HTTP parsed: 400 Illegal character 
> 0x16 for HttpChannelOverHttp@2efbda6c{r=0,c=false,a=IDLE,uri=null}{code}
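As a side note for reviewers: 0x16 is the first byte of a TLS ClientHello, i.e. the client is speaking TLS to a connector that was started as plain HTTP, so the reader web app needs to honour the configured policy when it binds its endpoint. A rough sketch of that idea, assuming the reader uses {{HttpServer2}} (illustrative only, not the attached patch; the server name is made up and SSL keystore wiring is omitted):
{code:java}
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.http.HttpServer2;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public final class TimelineReaderServerSketch {
  /** Pick the scheme from yarn.http.policy instead of hard-coding http. */
  static HttpServer2 startReaderWebApp(Configuration conf, String bindHost, int port)
      throws IOException {
    String scheme = YarnConfiguration.useHttps(conf) ? "https://" : "http://";
    return new HttpServer2.Builder()
        .setName("timeline-reader")   // assumed server name for this sketch
        .setConf(conf)
        // For the https case the builder also needs keystore/truststore
        // configuration, which is left out of this sketch.
        .addEndpoint(URI.create(scheme + bindHost + ":" + port))
        .build();
  }

  private TimelineReaderServerSketch() {
  }
}
{code}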



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2018-05-07 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-6091:
-
Target Version/s: 2.8.5  (was: 2.8.4)

> the AppMaster register failed when use Docker on LinuxContainer 
> 
>
> Key: YARN-6091
> URL: https://issues.apache.org/jira/browse/YARN-6091
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.8.1
> Environment: CentOS
>Reporter: zhengchenyu
>Assignee: Eric Badger
>Priority: Critical
> Attachments: YARN-6091.001.patch, YARN-6091.002.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> On some servers, when I use Docker on LinuxContainer, the AppMaster fails to 
> register with the ResourceManager, but this does not happen on other servers. 
> I found that pclose (in container-executor.c) returns different values on 
> different servers, even though the process launched by popen is running 
> normally: some servers return 0, others return 13. 
> Because YARN treats the application as failed when pclose returns nonzero, it 
> removes the AMRMToken, and the AppMaster registration then fails because the 
> ResourceManager has already removed this application's token. 
> In container-executor.c, the judgement condition checks whether the return 
> code is zero, but the pclose man page says that only a return of -1 indicates 
> an error. Changing the judgement condition accordingly solves the problem. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2018-05-07 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466774#comment-16466774
 ] 

Junping Du commented on YARN-6091:
--

Move to 2.8.5 as 2.8.4 is in RC stage.

> the AppMaster register failed when use Docker on LinuxContainer 
> 
>
> Key: YARN-6091
> URL: https://issues.apache.org/jira/browse/YARN-6091
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.8.1
> Environment: CentOS
>Reporter: zhengchenyu
>Assignee: Eric Badger
>Priority: Critical
> Attachments: YARN-6091.001.patch, YARN-6091.002.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> On some servers, when I use Docker on LinuxContainer, the AppMaster fails to 
> register with the ResourceManager, but this does not happen on other servers. 
> I found that pclose (in container-executor.c) returns different values on 
> different servers, even though the process launched by popen is running 
> normally: some servers return 0, others return 13. 
> Because YARN treats the application as failed when pclose returns nonzero, it 
> removes the AMRMToken, and the AppMaster registration then fails because the 
> ResourceManager has already removed this application's token. 
> In container-executor.c, the judgement condition checks whether the return 
> code is zero, but the pclose man page says that only a return of -1 indicates 
> an error. Changing the judgement condition accordingly solves the problem. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8255) Allow option to disable flex for a service component

2018-05-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466857#comment-16466857
 ] 

Wangda Tan commented on YARN-8255:
--

[~eyang], 

Thanks for commenting, your suggestion makes sense and has less dev/testing 
overhead. I think we can do as you suggested: allow flexing when 
restart-policy = always / on-failure, and disallow flexing when 
restart-policy = never.

We can add a separate allow_flexing flag to the spec once we see solid 
requirements from users.

[~suma.shivaprasad], does this make sense to you? Please feel free to share 
your opinions.
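A minimal sketch of that rule, with a hypothetical enum and helper name (the real component spec types may differ):
{code:java}
/** Hypothetical restart policies mirroring the component spec values. */
enum RestartPolicy {
  ALWAYS, ON_FAILURE, NEVER
}

final class FlexPolicy {
  /** Flexing stays allowed unless the component is a run-to-completion job. */
  static boolean isFlexAllowed(RestartPolicy policy) {
    return policy != RestartPolicy.NEVER;
  }

  private FlexPolicy() {
  }
}
{code}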

> Allow option to disable flex for a service component 
> -
>
> Key: YARN-8255
> URL: https://issues.apache.org/jira/browse/YARN-8255
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
>
> YARN-8080 implements restart capabilities for service component instances. 
> YARN service components should add an option to disallow flexing, to support 
> workloads that are essentially batch/iterative jobs terminating with 
> restart_policy=NEVER/ON_FAILURE. Flexing could be disabled by default for 
> components where restart_policy=NEVER/ON_FAILURE and enabled by default when 
> restart_policy=ALWAYS (which is the default restart_policy), unless explicitly 
> set in the service spec.
> The option could be exposed as part of the component spec as "allow_flexing". 
> cc [~billie.rinaldi] [~gsaha] [~eyang] [~csingh] [~wangda]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7892) Revisit NodeAttribute class structure

2018-05-07 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466872#comment-16466872
 ] 

Bibin A Chundatt commented on YARN-7892:


[~Naganarasimha]

Can you also update {{TestClientRMService}} in the next patch?
TestClientRMService has too much reformatting of non-modified code.


> Revisit NodeAttribute class structure
> -
>
> Key: YARN-7892
> URL: https://issues.apache.org/jira/browse/YARN-7892
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Major
> Attachments: YARN-7892-YARN-3409.001.patch, 
> YARN-7892-YARN-3409.002.patch, YARN-7892-YARN-3409.003.WIP.patch, 
> YARN-7892-YARN-3409.003.patch, YARN-7892-YARN-3409.004.patch, 
> YARN-7892-YARN-3409.005.patch, YARN-7892-YARN-3409.006.patch, 
> YARN-7892-YARN-3409.007.patch
>
>
> In the existing structure, we kept the type and value along with the 
> attribute, which confuses users of the APIs: it is not clear what needs to be 
> sent for type and value while fetching the mappings for node(s).
> In addition, equals does not make sense when we compare only prefix and name, 
> while their values might differ.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5151) [UI2] Support kill application from new YARN UI

2018-05-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466893#comment-16466893
 ] 

Hudson commented on YARN-5151:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14137 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14137/])
YARN-5151. [UI2] Support kill application from new YARN UI. Contributed 
(sunilg: rev 9832265e1deeefaa6d58f9f62052c8ef6a8e82b7)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/adapters/yarn-app.js
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/controllers/yarn-app.js
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-app.hbs


> [UI2] Support kill application from new YARN UI
> ---
>
> Key: YARN-5151
> URL: https://issues.apache.org/jira/browse/YARN-5151
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Gergely Novák
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-5151.001.patch, YARN-5151.002.patch, 
> YARN-5151.003.patch, YARN-5151.004.patch, YARN-5151.005.patch, 
> YARN-5151.007.patch, YARN-5151.008.patch, screenshot-1.png, screenshot-2.png
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8253) HTTPS Ats v2 api call fails with "bad HTTP parsed"

2018-05-07 Thread Charan Hebri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charan Hebri updated YARN-8253:
---
Attachment: YARN-8253.01.patch

> HTTPS Ats v2 api call fails with "bad HTTP parsed"
> --
>
> Key: YARN-8253
> URL: https://issues.apache.org/jira/browse/YARN-8253
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Charan Hebri
>Priority: Major
> Attachments: YARN-8253.01.patch
>
>
> When the YARN HTTP policy is set to HTTPS_ONLY, ATS v2 should use the HTTPS address.
> Here, the ATS v2 call is failing with the error below.
> {code:java}
> [hrt_qa@xxx root]$ curl -i -k -s -1 -H 'Content-Type: application/json' -H 
> 'Accept: application/json' --negotiate -u: 
> 'https://xxx:8199/ws/v2/timeline/apps/application_1525238789838_0003/entities/COMPONENT_INSTANCE?fields=ALL'
> [hrt_qa@xxx root]$ echo $?
> 35{code}
> {code:java|title=Ats v2}
> 2018-05-02 05:45:40,427 WARN  http.HttpParser (HttpParser.java:(1832)) 
> - Illegal character 0x16 in state=START for buffer 
> HeapByteBuffer@dba438[p=1,l=222,c=8192,r=221]={\x16<<<\x03\x01\x00\xD9\x01\x00\x00\xD5\x03\x03;X\xEd\xD1orq...\x01\x05\x01\x06\x01\x02\x01\x04\x02\x05\x02\x06\x02\x02\x02>>>\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00}
> 2018-05-02 05:45:40,428 WARN  http.HttpParser 
> (HttpParser.java:parseNext(1435)) - bad HTTP parsed: 400 Illegal character 
> 0x16 for HttpChannelOverHttp@2efbda6c{r=0,c=false,a=IDLE,uri=null}{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8175) Add support for Node Labels in SLS.

2018-05-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466629#comment-16466629
 ] 

genericqa commented on YARN-8175:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  2s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 46s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m  6s{color} 
| {color:red} hadoop-sls in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 66m 28s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.sls.TestSLSRunner |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8175 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12922287/YARN-8175.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0b68850f5612 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 696a4be |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/20618/artifact/out/patch-unit-hadoop-tools_hadoop-sls.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20618/testReport/ |
| Max. process+thread count | 457 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20618/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Add 

[jira] [Updated] (YARN-7892) Revisit NodeAttribute class structure

2018-05-07 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-7892:

Attachment: YARN-7892-YARN-3409.007.patch

> Revisit NodeAttribute class structure
> -
>
> Key: YARN-7892
> URL: https://issues.apache.org/jira/browse/YARN-7892
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Major
> Attachments: YARN-7892-YARN-3409.001.patch, 
> YARN-7892-YARN-3409.002.patch, YARN-7892-YARN-3409.003.WIP.patch, 
> YARN-7892-YARN-3409.003.patch, YARN-7892-YARN-3409.004.patch, 
> YARN-7892-YARN-3409.005.patch, YARN-7892-YARN-3409.006.patch, 
> YARN-7892-YARN-3409.007.patch
>
>
> In the existing structure, we kept the type and value along with the 
> attribute, which confuses users of the APIs: it is not clear what needs to be 
> sent for type and value while fetching the mappings for node(s).
> In addition, equals does not make sense when we compare only prefix and name, 
> while their values might differ.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8254) dynamically change log levels for YARN Jobs

2018-05-07 Thread Prabhu Joseph (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466853#comment-16466853
 ] 

Prabhu Joseph commented on YARN-8254:
-

YarnClient can request setLogLevel for an application via a new api "yarn 
application -setLogLevel   " sent to the RM. The ResourceManager will 
pass it to the ApplicationMaster through AllocateResponse.

The ApplicationMaster will process the logLevel and pass it to all the task 
containers as part of the response to statusUpdate. This needs a change in each 
application to support it; applications that do not can simply ignore it.
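On the container side, applying such a request against log4j (which Hadoop 2.7 ships) would look roughly like the sketch below; the class/method names and the idea of receiving the level via the status-update response are assumptions of this sketch, not part of any patch:
{code:java}
import org.apache.log4j.Level;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;

final class ContainerLogLevelUpdater {
  /**
   * Apply a dynamically requested log level, e.g. one forwarded by the AM
   * in a status-update response ("DEBUG", "INFO", ...). An empty logger
   * name targets the root logger.
   */
  static void applyLogLevel(String loggerName, String level) {
    Logger logger = (loggerName == null || loggerName.isEmpty())
        ? LogManager.getRootLogger()
        : LogManager.getLogger(loggerName);
    // Keep the current effective level if the requested string is unknown.
    logger.setLevel(Level.toLevel(level, logger.getEffectiveLevel()));
  }

  private ContainerLogLevelUpdater() {
  }
}
{code}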

> dynamically change log levels for YARN Jobs
> ---
>
> Key: YARN-8254
> URL: https://issues.apache.org/jira/browse/YARN-8254
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Priority: Major
>  Labels: supportability
>
> Currently the log levels for daemons can be changed dynamically. It would make 
> debugging easier to have the same for YARN jobs. The client can send 
> setLogLevel to the ApplicationMaster, which can set it for all the containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8253) HTTPS Ats v2 api call fails with "bad HTTP parsed"

2018-05-07 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466915#comment-16466915
 ] 

Rohith Sharma K S commented on YARN-8253:
-

+1, pending jenkins

> HTTPS Ats v2 api call fails with "bad HTTP parsed"
> --
>
> Key: YARN-8253
> URL: https://issues.apache.org/jira/browse/YARN-8253
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Charan Hebri
>Priority: Major
> Attachments: YARN-8253.01.patch
>
>
> When the YARN HTTP policy is set to HTTPS_ONLY, ATS v2 should use the HTTPS address.
> Here, the ATS v2 call is failing with the error below.
> {code:java}
> [hrt_qa@xxx root]$ curl -i -k -s -1 -H 'Content-Type: application/json' -H 
> 'Accept: application/json' --negotiate -u: 
> 'https://xxx:8199/ws/v2/timeline/apps/application_1525238789838_0003/entities/COMPONENT_INSTANCE?fields=ALL'
> [hrt_qa@xxx root]$ echo $?
> 35{code}
> {code:java|title=Ats v2}
> 2018-05-02 05:45:40,427 WARN  http.HttpParser (HttpParser.java:(1832)) 
> - Illegal character 0x16 in state=START for buffer 
> HeapByteBuffer@dba438[p=1,l=222,c=8192,r=221]={\x16<<<\x03\x01\x00\xD9\x01\x00\x00\xD5\x03\x03;X\xEd\xD1orq...\x01\x05\x01\x06\x01\x02\x01\x04\x02\x05\x02\x06\x02\x02\x02>>>\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00}
> 2018-05-02 05:45:40,428 WARN  http.HttpParser 
> (HttpParser.java:parseNext(1435)) - bad HTTP parsed: 400 Illegal character 
> 0x16 for HttpChannelOverHttp@2efbda6c{r=0,c=false,a=IDLE,uri=null}{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8242) YARN NM: OOM error while reading back the state store on recovery

2018-05-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466641#comment-16466641
 ] 

genericqa commented on YARN-8242:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 23s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 28 new + 146 unchanged - 3 fixed = 174 total (was 149) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 26s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m 53s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 75m 27s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8242 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12922351/YARN-8242.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ddd104c5ab3c 3.13.0-141-generic #190-Ubuntu SMP Fri Jan 19 
12:52:38 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 696a4be |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/20619/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| unit | 

[jira] [Assigned] (YARN-8236) Invalid kerberos principal file name cause NPE in native service

2018-05-07 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha reassigned YARN-8236:
---

Assignee: Gour Saha

> Invalid kerberos principal file name cause NPE in native service
> 
>
> Key: YARN-8236
> URL: https://issues.apache.org/jira/browse/YARN-8236
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Sunil G
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
>
> Stack trace
>  
> {code:java}
> 2018-04-29 16:22:54,266 WARN webapp.GenericExceptionHandler 
> (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.service.client.ServiceClient.addKeytabResourceIfSecure(ServiceClient.java:994)
> at 
> org.apache.hadoop.yarn.service.client.ServiceClient.submitApp(ServiceClient.java:685)
> at 
> org.apache.hadoop.yarn.service.client.ServiceClient.actionCreate(ServiceClient.java:269){code}
> cc [~gsaha] [~csingh]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8236) Invalid kerberos principal file name cause NPE in native service

2018-05-07 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8236:

Attachment: YARN-8236.01.patch

> Invalid kerberos principal file name cause NPE in native service
> 
>
> Key: YARN-8236
> URL: https://issues.apache.org/jira/browse/YARN-8236
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Sunil G
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8236.01.patch
>
>
> Stack trace
>  
> {code:java}
> 2018-04-29 16:22:54,266 WARN webapp.GenericExceptionHandler 
> (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.service.client.ServiceClient.addKeytabResourceIfSecure(ServiceClient.java:994)
> at 
> org.apache.hadoop.yarn.service.client.ServiceClient.submitApp(ServiceClient.java:685)
> at 
> org.apache.hadoop.yarn.service.client.ServiceClient.actionCreate(ServiceClient.java:269){code}
> cc [~gsaha] [~csingh]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8236) Invalid kerberos principal file name cause NPE in native service

2018-05-07 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466677#comment-16466677
 ] 

Gour Saha commented on YARN-8236:
-

[~sunilg], I attached a patch to fix this. Please review.

> Invalid kerberos principal file name cause NPE in native service
> 
>
> Key: YARN-8236
> URL: https://issues.apache.org/jira/browse/YARN-8236
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Sunil G
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8236.01.patch
>
>
> Stack trace
>  
> {code:java}
> 2018-04-29 16:22:54,266 WARN webapp.GenericExceptionHandler 
> (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.service.client.ServiceClient.addKeytabResourceIfSecure(ServiceClient.java:994)
> at 
> org.apache.hadoop.yarn.service.client.ServiceClient.submitApp(ServiceClient.java:685)
> at 
> org.apache.hadoop.yarn.service.client.ServiceClient.actionCreate(ServiceClient.java:269){code}
> cc [~gsaha] [~csingh]
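Without seeing the patch, the stack trace points at a missing guard around the principal/keytab values before they are dereferenced in {{addKeytabResourceIfSecure}}. A generic, hypothetical example of the kind of validation that turns the NPE into a clear error (names and messages are illustrative only, not the actual ServiceClient change):
{code:java}
import java.io.IOException;
import java.net.URI;

final class KeytabValidationSketch {
  /** Fail fast with a clear message instead of letting a null scheme/path NPE later. */
  static URI validateKeytabUri(String keytab, String principal) throws IOException {
    if (keytab == null || keytab.isEmpty() || principal == null || principal.isEmpty()) {
      throw new IOException("Both kerberos principal and keytab must be specified");
    }
    URI uri = URI.create(keytab);
    if (uri.getScheme() == null) {
      throw new IOException(
          "Keytab location must carry a scheme, e.g. hdfs:// or file://, got: " + keytab);
    }
    return uri;
  }

  private KeytabValidationSketch() {
  }
}
{code}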



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7892) Revisit NodeAttribute class structure

2018-05-07 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466712#comment-16466712
 ] 

Naganarasimha G R commented on YARN-7892:
-

Thanks for the comments, [~leftnoteasy]. I have modified the patch as per your 
suggestions.

[~sunilg] & [~bibinchundatt],

Can you please check the other parts of the latest uploaded patch?

> Revisit NodeAttribute class structure
> -
>
> Key: YARN-7892
> URL: https://issues.apache.org/jira/browse/YARN-7892
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Major
> Attachments: YARN-7892-YARN-3409.001.patch, 
> YARN-7892-YARN-3409.002.patch, YARN-7892-YARN-3409.003.WIP.patch, 
> YARN-7892-YARN-3409.003.patch, YARN-7892-YARN-3409.004.patch, 
> YARN-7892-YARN-3409.005.patch, YARN-7892-YARN-3409.006.patch, 
> YARN-7892-YARN-3409.007.patch
>
>
> In the existing structure, we kept the type and value along with the 
> attribute, which confuses users of the APIs: it is not clear what needs to be 
> sent for type and value while fetching the mappings for node(s).
> In addition, equals does not make sense when we compare only prefix and name, 
> while their values might differ.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8257) Native service should automatically adding escapes for environment/launch cmd before sending to YARN

2018-05-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466634#comment-16466634
 ] 

Wangda Tan commented on YARN-8257:
--

Talked to [~gsaha], and he mentioned he will help if he gets a chance. :)

cc: [~sunilg]

> Native service should automatically adding escapes for environment/launch cmd 
> before sending to YARN
> 
>
> Key: YARN-8257
> URL: https://issues.apache.org/jira/browse/YARN-8257
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Gour Saha
>Priority: Critical
>
> Noticed this issue while using native service: 
> Basically, when a string for an environment variable / launch command contains 
> chars like ", /, `: it needs to be escaped twice.
> The first escape is for the json spec: because json accepts double quotes 
> only, the value needs an escape there.
> The second escape is for container launch; what we do for the command line is 
> (ContainerLaunch.java):
> {code:java}
> line("exec /bin/bash -c \"", StringUtils.join(" ", command), "\"");{code}
> And for environment:
> {code:java}
> line("export ", key, "=\"", value, "\"");{code}
> An example of launch_command: 
> {code:java}
> "launch_command": "export CLASSPATH=\\`\\$HADOOP_HDFS_HOME/bin/hadoop 
> classpath --glob\\`"{code}
> And an example of environment:
> {code:java}
> "TF_CONFIG" : "{\\\"cluster\\\": {\\\"master\\\": 
> [\\\"master-0.distributed-tf.ambari-qa.tensorflow.site:8000\\\"], \\\"ps\\\": 
> [\\\"ps-0.distributed-tf.ambari-qa.tensorflow.site:8000\\\"], \\\"worker\\\": 
> [\\\"worker-0.distributed-tf.ambari-qa.tensorflow.site:8000\\\"]}, 
> \\\"task\\\": {\\\"type\\\":\\\"${COMPONENT_NAME}\\\", 
> \\\"index\\\":${COMPONENT_ID}}, \\\"environment\\\":\\\"cloud\\\"}",{code}
> To improve usability, I think we should auto-escape the input string once. 
> (For example, if the user specified 
> {code}
> "TF_CONFIG": "\"key\""
> {code}
> We will automatically escape it to:
> {code}
> "TF_CONFIG": \\\"key\\\"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8257) Native service should automatically adding escapes for environment/launch cmd before sending to YARN

2018-05-07 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-8257:


 Summary: Native service should automatically adding escapes for 
environment/launch cmd before sending to YARN
 Key: YARN-8257
 URL: https://issues.apache.org/jira/browse/YARN-8257
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Reporter: Wangda Tan
Assignee: Gour Saha


Noticed this issue while using native service: 

Basically, when a string for an environment variable / launch command contains 
chars like ", /, `: it needs to be escaped twice.

The first escape is for the json spec: because json accepts double quotes only, 
the value needs an escape there.

The second escape is for container launch; what we do for the command line is 
(ContainerLaunch.java):
{code:java}
line("exec /bin/bash -c \"", StringUtils.join(" ", command), "\"");{code}
And for environment:
{code:java}
line("export ", key, "=\"", value, "\"");{code}
An example of launch_command: 
{code:java}
"launch_command": "export CLASSPATH=\\`\\$HADOOP_HDFS_HOME/bin/hadoop classpath 
--glob\\`"{code}
And an example of environment:
{code:java}
"TF_CONFIG" : "{\\\"cluster\\\": {\\\"master\\\": 
[\\\"master-0.distributed-tf.ambari-qa.tensorflow.site:8000\\\"], \\\"ps\\\": 
[\\\"ps-0.distributed-tf.ambari-qa.tensorflow.site:8000\\\"], \\\"worker\\\": 
[\\\"worker-0.distributed-tf.ambari-qa.tensorflow.site:8000\\\"]}, 
\\\"task\\\": {\\\"type\\\":\\\"${COMPONENT_NAME}\\\", 
\\\"index\\\":${COMPONENT_ID}}, \\\"environment\\\":\\\"cloud\\\"}",{code}

To improve usability, I think we should auto-escape the input string once. (For 
example, if the user specified 
{code}
"TF_CONFIG": "\"key\""
{code}
We will automatically escape it to:
{code}
"TF_CONFIG": \\\"key\\\"
{code}
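As a rough illustration of what "auto escape once" could look like, a sketch of a helper that adds one layer of escaping to spec values; the helper name and the exact character set are placeholders, not the agreed behaviour:
{code:java}
final class SpecEscapeSketch {
  /**
   * Escape a value taken from the service spec once, so the user does not
   * have to pre-escape it again for the generated launch script.
   * Only double quote, backslash and backtick are handled in this sketch.
   */
  static String escapeOnce(String value) {
    StringBuilder sb = new StringBuilder(value.length() + 8);
    for (char c : value.toCharArray()) {
      if (c == '"' || c == '\\' || c == '`') {
        sb.append('\\');   // prepend one escaping backslash
      }
      sb.append(c);
    }
    return sb.toString();
  }

  private SpecEscapeSketch() {
  }
}
{code}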



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups

2018-05-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466837#comment-16466837
 ] 

genericqa commented on YARN-4599:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 18m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
30m 53s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
37s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
29s{color} | {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
34s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
18s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 25m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 25m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 25m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
42s{color} | {color:green} root: The patch generated 0 new + 211 unchanged - 1 
fixed = 211 total (was 212) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 20m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
8m 59s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m  
0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}150m  3s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}333m 54s{color} | 

[jira] [Updated] (YARN-8236) Invalid kerberos principal file name cause NPE in native service

2018-05-07 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8236:

Target Version/s: 3.1.1

> Invalid kerberos principal file name cause NPE in native service
> 
>
> Key: YARN-8236
> URL: https://issues.apache.org/jira/browse/YARN-8236
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Sunil G
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
>
> Stack trace
>  
> {code:java}
> 2018-04-29 16:22:54,266 WARN webapp.GenericExceptionHandler 
> (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.service.client.ServiceClient.addKeytabResourceIfSecure(ServiceClient.java:994)
> at 
> org.apache.hadoop.yarn.service.client.ServiceClient.submitApp(ServiceClient.java:685)
> at 
> org.apache.hadoop.yarn.service.client.ServiceClient.actionCreate(ServiceClient.java:269){code}
> cc [~gsaha] [~csingh]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8236) Invalid kerberos principal file name cause NPE in native service

2018-05-07 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8236:

Fix Version/s: 3.1.1
   3.2.0

> Invalid kerberos principal file name cause NPE in native service
> 
>
> Key: YARN-8236
> URL: https://issues.apache.org/jira/browse/YARN-8236
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Sunil G
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
>
> Stack trace
>  
> {code:java}
> 2018-04-29 16:22:54,266 WARN webapp.GenericExceptionHandler 
> (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.service.client.ServiceClient.addKeytabResourceIfSecure(ServiceClient.java:994)
> at 
> org.apache.hadoop.yarn.service.client.ServiceClient.submitApp(ServiceClient.java:685)
> at 
> org.apache.hadoop.yarn.service.client.ServiceClient.actionCreate(ServiceClient.java:269){code}
> cc [~gsaha] [~csingh]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8201) Skip stacktrace of ApplicationNotFoundException at server side

2018-05-07 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466879#comment-16466879
 ] 

Bibin A Chundatt commented on YARN-8201:


 

+1 lgtm.

Will commit it later today.

> Skip stacktrace of ApplicationNotFoundException at server side
> --
>
> Key: YARN-8201
> URL: https://issues.apache.org/jira/browse/YARN-8201
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: YARN-8201-001.patch, YARN-8201-002.patch, 
> YARN-8201-003.patch
>
>
> Currently the full stack traces of exceptions like 
> ApplicationNotFoundException, ApplicationAttemptNotFoundException, etc. are 
> logged at the server side. Wrong client operations could inflate server logs.
> {{Server.addTerseExceptions}} could be used to reduce server-side logging.
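For reference, registering these exceptions as terse is a one-liner once the RPC server instance is available; a sketch of the idea (the exact call site and the final exception list are up to the patch):
{code:java}
import org.apache.hadoop.ipc.Server;
import org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;

final class TerseExceptionsSketch {
  /** Log only the exception message, not the full stack trace, for these types. */
  static void registerTerseExceptions(Server rpcServer) {
    rpcServer.addTerseExceptions(ApplicationNotFoundException.class,
        ApplicationAttemptNotFoundException.class);
  }

  private TerseExceptionsSketch() {
  }
}
{code}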



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8253) HTTPS Ats v2 api call fails with "bad HTTP parsed"

2018-05-07 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S reassigned YARN-8253:
---

Assignee: Charan Hebri  (was: Rohith Sharma K S)

Assigned to you.

> HTTPS Ats v2 api call fails with "bad HTTP parsed"
> --
>
> Key: YARN-8253
> URL: https://issues.apache.org/jira/browse/YARN-8253
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Charan Hebri
>Priority: Major
>
> When the YARN HTTP policy is set to HTTPS_ONLY, ATS v2 should use the HTTPS address.
> Here, the ATS v2 call is failing with the error below.
> {code:java}
> [hrt_qa@xxx root]$ curl -i -k -s -1 -H 'Content-Type: application/json' -H 
> 'Accept: application/json' --negotiate -u: 
> 'https://xxx:8199/ws/v2/timeline/apps/application_1525238789838_0003/entities/COMPONENT_INSTANCE?fields=ALL'
> [hrt_qa@xxx root]$ echo $?
> 35{code}
> {code:java|title=Ats v2}
> 2018-05-02 05:45:40,427 WARN  http.HttpParser (HttpParser.java:(1832)) 
> - Illegal character 0x16 in state=START for buffer 
> HeapByteBuffer@dba438[p=1,l=222,c=8192,r=221]={\x16<<<\x03\x01\x00\xD9\x01\x00\x00\xD5\x03\x03;X\xEd\xD1orq...\x01\x05\x01\x06\x01\x02\x01\x04\x02\x05\x02\x06\x02\x02\x02>>>\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00}
> 2018-05-02 05:45:40,428 WARN  http.HttpParser 
> (HttpParser.java:parseNext(1435)) - bad HTTP parsed: 400 Illegal character 
> 0x16 for HttpChannelOverHttp@2efbda6c{r=0,c=false,a=IDLE,uri=null}{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7894) Improve ATS response for DS_CONTAINER when container launch fails

2018-05-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466653#comment-16466653
 ] 

genericqa commented on YARN-7894:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 52s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 45s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell:
 The patch generated 2 new + 114 unchanged - 0 fixed = 116 total (was 114) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 17s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 17m 11s{color} 
| {color:red} hadoop-yarn-applications-distributedshell in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 87m  9s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-7894 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12922349/YARN-7894.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a0e4f5106b40 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 696a4be |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/20616/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-applications-distributedshell.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/20616/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-applications-distributedshell.txt
 |
|  Test Results | 

[jira] [Commented] (YARN-8207) Docker container launch use popen have risk of shell expansion

2018-05-07 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466716#comment-16466716
 ] 

Eric Yang commented on YARN-8207:
-

[~jlowe] Patch 008 fixed the issues discovered except the char array copy.  There 
are approximately 900 kB of leaks in container-executor prior to this patch, and 
we saved 20 kB from leaking, based on the valgrind report from exercising the test 
cases.  Execvp will wipe out all the leaks anyhow, unless we find more buffer 
overflow problems.  I am going to stop making styling code changes because styling 
changes have diminishing returns on investment at this point.

> Docker container launch use popen have risk of shell expansion
> --
>
> Key: YARN-8207
> URL: https://issues.apache.org/jira/browse/YARN-8207
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.0.0, 3.1.0, 3.0.1, 3.0.2
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
>  Labels: Docker
> Attachments: YARN-8207.001.patch, YARN-8207.002.patch, 
> YARN-8207.003.patch, YARN-8207.004.patch, YARN-8207.005.patch, 
> YARN-8207.006.patch, YARN-8207.007.patch, YARN-8207.008.patch
>
>
> Container-executor code utilizes a string buffer to construct the docker run 
> command, and passes the string buffer to popen for execution.  Popen spawns a 
> shell to run the command.  Some arguments for docker run are still vulnerable 
> to shell expansion.  The possible solution is to convert from a char * buffer 
> to a string array for execv to avoid shell expansion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8255) Allow option to disable flex for a service component

2018-05-07 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466800#comment-16466800
 ] 

Eric Yang commented on YARN-8255:
-

[~leftnoteasy] Recompute and expandable are intertwined.  They are the same 
thing.  At a conceptual level, teragen has no dependency on the input format.  You 
can add more partitions to get more data generated.  Hadoop's own 
implementation prevented this from happening, but that does not mean docker 
containers should be subject to the same initialization-time limitation.  On 
the other hand, we must optimize the framework for general-purpose usage and 
prevent ourselves from giving too many untested and unsupported options.  I 
think it makes sense to reduce the flex options to 2 main types instead of 
giving all 6 options.

> Allow option to disable flex for a service component 
> -
>
> Key: YARN-8255
> URL: https://issues.apache.org/jira/browse/YARN-8255
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
>
> YARN-8080 implements restart capabilities for service component instances. 
> YARN service components should add an option to disallow flexing to support 
> workloads which are essentially batch/iterative jobs which terminate with 
> restart_policy=NEVER/ON_FAILURE. This could be disabled by default for 
> components where restart_policy=NEVER/ON_FAILURE and enabled by default when 
> restart_policy=ALWAYS(which is the default restart_policy) unless explicitly 
> set at the service spec.
> The option could be exposed as part of the component spec as "allow_flexing". 
> cc [~billie.rinaldi] [~gsaha] [~eyang] [~csingh] [~wangda]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8255) Allow option to disable flex for a service component

2018-05-07 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466800#comment-16466800
 ] 

Eric Yang edited comment on YARN-8255 at 5/8/18 3:08 AM:
-

[~leftnoteasy] Recompute and expandable are intertwined.  They are not the same 
thing.  At a conceptual level, teragen has no dependency on the input format.  You 
can add more partitions to get more data generated.  Hadoop's own 
implementation prevented this from happening, but that does not mean docker 
containers should be subject to the same initialization-time limitation.  On 
the other hand, we must optimize the framework for general-purpose usage and 
prevent ourselves from giving too many untested and unsupported options.  I 
think it makes sense to reduce the flex options to 2 main types instead of 
giving all 6 options.


was (Author: eyang):
[~leftnoteasy] Recompute and expandable are intertwined.  They are the same 
thing.  At a conceptual level, teragen has no dependency on the input format.  You 
can add more partitions to get more data generated.  Hadoop's own 
implementation prevented this from happening, but that does not mean docker 
containers should be subject to the same initialization-time limitation.  On 
the other hand, we must optimize the framework for general-purpose usage and 
prevent ourselves from giving too many untested and unsupported options.  I 
think it makes sense to reduce the flex options to 2 main types instead of 
giving all 6 options.

> Allow option to disable flex for a service component 
> -
>
> Key: YARN-8255
> URL: https://issues.apache.org/jira/browse/YARN-8255
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
>
> YARN-8080 implements restart capabilities for service component instances. 
> YARN service components should add an option to disallow flexing to support 
> workloads which are essentially batch/iterative jobs which terminate with 
> restart_policy=NEVER/ON_FAILURE. This could be disabled by default for 
> components where restart_policy=NEVER/ON_FAILURE and enabled by default when 
> restart_policy=ALWAYS(which is the default restart_policy) unless explicitly 
> set at the service spec.
> The option could be exposed as part of the component spec as "allow_flexing". 
> cc [~billie.rinaldi] [~gsaha] [~eyang] [~csingh] [~wangda]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7715) Update CPU and Memory cgroups params on container update as well.

2018-05-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466631#comment-16466631
 ] 

genericqa commented on YARN-7715:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 26s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 33s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 32s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 69m 47s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestCGroupsCpuResourceHandlerImpl
 |
|   | 
hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestCGroupsMemoryResourceHandlerImpl
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-7715 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12922306/YARN-7715.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 92f2feaf42b8 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 
21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 696a4be |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/20620/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20620/testReport/ |
| Max. process+thread count | 412 (vs. ulimit of 1) |
| modules | C: 

[jira] [Commented] (YARN-8257) Native service should automatically adding escapes for environment/launch cmd before sending to YARN

2018-05-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466656#comment-16466656
 ] 

Wangda Tan commented on YARN-8257:
--

Just took a closer look: 

Since both the environment and the launch command will be written to a shell script and 
interpreted by bash, we need to escape the following characters (add 
a \ before them):
{code:java}
` : execute a command
$ : reference to environment
\ : all other escapes
" : double quotes{code}
Reference: 

[https://superuser.com/questions/163515/bash-how-to-pass-command-line-arguments-containing-special-characters]
 (search "per man bash")

> Native service should automatically adding escapes for environment/launch cmd 
> before sending to YARN
> 
>
> Key: YARN-8257
> URL: https://issues.apache.org/jira/browse/YARN-8257
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Gour Saha
>Priority: Critical
>
> Noticed this issue while using native service: 
> Basically, when a string for an environment variable / launch command contains chars like 
> ", /, or `, it needs to be escaped twice.
> The first escape is for the json spec: because json only accepts double quotes, 
> they need an escape.
> The second escape happens at container launch; what we do for the command line is 
> (ContainerLaunch.java):
> {code:java}
> line("exec /bin/bash -c \"", StringUtils.join(" ", command), "\"");{code}
> And for environment:
> {code:java}
> line("export ", key, "=\"", value, "\"");{code}
> An example of launch_command: 
> {code:java}
> "launch_command": "export CLASSPATH=\\`\\$HADOOP_HDFS_HOME/bin/hadoop 
> classpath --glob\\`"{code}
> And example of environment:
> {code:java}
> "TF_CONFIG" : "{\\\"cluster\\\": {\\\"master\\\": 
> [\\\"master-0.distributed-tf.ambari-qa.tensorflow.site:8000\\\"], \\\"ps\\\": 
> [\\\"ps-0.distributed-tf.ambari-qa.tensorflow.site:8000\\\"], \\\"worker\\\": 
> [\\\"worker-0.distributed-tf.ambari-qa.tensorflow.site:8000\\\"]}, 
> \\\"task\\\": {\\\"type\\\":\\\"${COMPONENT_NAME}\\\", 
> \\\"index\\\":${COMPONENT_ID}}, \\\"environment\\\":\\\"cloud\\\"}",{code}
> To improve usability, I think we should auto escape the input string once. 
> (For example, if user specified 
> {code}
> "TF_CONFIG": "\"key\""
> {code}
> We will automatically escape it to:
> {code}
> "TF_CONFIG": \\\"key\\\"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8207) Docker container launch use popen have risk of shell expansion

2018-05-07 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8207:

Attachment: YARN-8207.008.patch

> Docker container launch use popen have risk of shell expansion
> --
>
> Key: YARN-8207
> URL: https://issues.apache.org/jira/browse/YARN-8207
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.0.0, 3.1.0, 3.0.1, 3.0.2
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
>  Labels: Docker
> Attachments: YARN-8207.001.patch, YARN-8207.002.patch, 
> YARN-8207.003.patch, YARN-8207.004.patch, YARN-8207.005.patch, 
> YARN-8207.006.patch, YARN-8207.007.patch, YARN-8207.008.patch
>
>
> Container-executor code utilizes a string buffer to construct the docker run 
> command, and passes the string buffer to popen for execution.  Popen spawns a 
> shell to run the command.  Some arguments for docker run are still vulnerable 
> to shell expansion.  The possible solution is to convert from a char * buffer 
> to a string array for execv to avoid shell expansion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8254) dynamically change log levels for YARN Jobs

2018-05-07 Thread Prabhu Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-8254:

Component/s: yarn

> dynamically change log levels for YARN Jobs
> ---
>
> Key: YARN-8254
> URL: https://issues.apache.org/jira/browse/YARN-8254
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Priority: Major
>  Labels: supportability
>
> Currently the Log Levels for Daemons can be dynamically changed. It will be 
> easier while debugging to have the same for YARN Jobs. The client can send a setLogLevel 
> request to the ApplicationMaster, which can set it for all the containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8254) dynamically change log levels for YARN Jobs

2018-05-07 Thread Prabhu Joseph (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466767#comment-16466767
 ] 

Prabhu Joseph commented on YARN-8254:
-

[~Naganarasimha] Just realized this is application specific. The AM has to provide 
support for changing the log level and expose it to the client: the JobClient can send a 
setLogLevel request to the AM, and the AM will internally set the log level for all running 
containers. I will move this Jira to MapReduce.
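
A minimal sketch of the container-side piece, assuming log4j 1.x (which these Hadoop releases ship) and a hypothetical message carrying the requested level from the AM; how the AM delivers that message to its containers is left out here.
{code:java}
import org.apache.log4j.Level;
import org.apache.log4j.LogManager;

public final class ContainerLogLevelHandler {
  /**
   * Apply a log level requested by the AM, e.g. "DEBUG" or "INFO".
   * A null or empty logger name means the root logger.
   */
  public static void applyLogLevel(String loggerName, String levelName) {
    Level level = Level.toLevel(levelName, Level.INFO); // falls back to INFO if unparsable
    if (loggerName == null || loggerName.isEmpty()) {
      LogManager.getRootLogger().setLevel(level);
    } else {
      LogManager.getLogger(loggerName).setLevel(level);
    }
  }
}
{code}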

> dynamically change log levels for YARN Jobs
> ---
>
> Key: YARN-8254
> URL: https://issues.apache.org/jira/browse/YARN-8254
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Priority: Major
>  Labels: supportability
>
> Currently the Log Levels for Daemons can be dynamically changed. It will be 
> easier while debugging to have the same for YARN Jobs. The client can send a setLogLevel 
> request to the ApplicationMaster, which can set it for all the containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8254) dynamically change log levels for YARN Jobs

2018-05-07 Thread Prabhu Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-8254:

Component/s: (was: yarn)

> dynamically change log levels for YARN Jobs
> ---
>
> Key: YARN-8254
> URL: https://issues.apache.org/jira/browse/YARN-8254
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Priority: Major
>  Labels: supportability
>
> Currently the Log Levels for Daemons can be dynamically changed. It will be 
> easier while debugging to have the same for YARN Jobs. The client can send a setLogLevel 
> request to the ApplicationMaster, which can set it for all the containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8207) Docker container launch use popen have risk of shell expansion

2018-05-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466771#comment-16466771
 ] 

genericqa commented on YARN-8207:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
38m 34s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 39s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
31s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 73m 20s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8207 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12922372/YARN-8207.008.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 8e11315a16fd 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 696a4be |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/20624/artifact/out/whitespace-tabs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20624/testReport/ |
| Max. process+thread count | 327 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20624/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Docker container launch use popen have risk of shell expansion
> --
>
> Key: YARN-8207
> URL: https://issues.apache.org/jira/browse/YARN-8207
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.0.0, 3.1.0, 3.0.1, 3.0.2
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
>  Labels: Docker
> Attachments: YARN-8207.001.patch, YARN-8207.002.patch, 
> YARN-8207.003.patch, YARN-8207.004.patch, YARN-8207.005.patch, 
> YARN-8207.006.patch, YARN-8207.007.patch, 

[jira] [Updated] (YARN-5151) [UI2] Support kill application from new YARN UI

2018-05-07 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-5151:
--
Summary: [UI2] Support kill application from new YARN UI  (was: [YARN-3368] 
Support kill application from new YARN UI)

> [UI2] Support kill application from new YARN UI
> ---
>
> Key: YARN-5151
> URL: https://issues.apache.org/jira/browse/YARN-5151
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Gergely Novák
>Priority: Major
> Attachments: YARN-5151.001.patch, YARN-5151.002.patch, 
> YARN-5151.003.patch, YARN-5151.004.patch, YARN-5151.005.patch, 
> YARN-5151.007.patch, YARN-5151.008.patch, screenshot-1.png, screenshot-2.png
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8080) YARN native service should support component restart policy

2018-05-07 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466315#comment-16466315
 ] 

Billie Rinaldi commented on YARN-8080:
--

Another thing we should address is the concept of readiness for ON_FAILURE / 
NEVER component instances. It seems like instances of these types shouldn't 
become READY unless they have succeeded. Possibly this check could be added 
into the default readiness check.
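
A minimal sketch of how the restart policy and the readiness gate discussed above might fit together; the enum values mirror the proposal in the description below, while the class and method names are hypothetical rather than the actual ServiceScheduler/ComponentInstance code.
{code:java}
public final class RestartPolicySketch {
  enum RestartPolicy { ALWAYS, ON_FAILURE, NEVER }

  /** Whether the framework should restart an instance that exited with the given code. */
  static boolean shouldRestart(RestartPolicy policy, int exitCode) {
    switch (policy) {
      case ALWAYS:     return true;
      case ON_FAILURE: return exitCode != 0;
      case NEVER:      return false;
      default:         throw new IllegalStateException("unknown policy " + policy);
    }
  }

  /** Readiness gate: for NEVER/ON_FAILURE instances, READY only after a successful exit. */
  static boolean isReady(RestartPolicy policy, boolean probePassed, boolean succeeded) {
    if (policy == RestartPolicy.ALWAYS) {
      return probePassed;   // long-running service: readiness probe only
    }
    return succeeded;       // job-like instance: ready means it finished with exit code 0
  }
}
{code}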

> YARN native service should support component restart policy
> ---
>
> Key: YARN-8080
> URL: https://issues.apache.org/jira/browse/YARN-8080
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8080.001.patch, YARN-8080.002.patch, 
> YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, 
> YARN-8080.007.patch
>
>
> Existing native service assumes the service is long running and never 
> finishes. Containers will be restarted even if exit code == 0. 
> To support broader use cases, we need to allow the restart policy of a component 
> to be specified by users. Propose to have the following policies:
> 1) Always: containers are always restarted by the framework regardless of container 
> exit status. This is the existing/default behavior.
> 2) Never: Do not restart containers in any case after the container finishes: To 
> support job-like workloads (for example a Tensorflow training job). If a task 
> exits with code == 0, we should not restart the task. This can be used by 
> services which are not restart/recovery-able.
> 3) On-failure: Similar to above, only restart task with exitcode != 0. 
> Behaviors after component *instance* finalize (Succeeded or Failed when 
> restart_policy != ALWAYS): 
> 1) For single component, single instance: complete service.
> 2) For single component, multiple instance: other running instances from the 
> same component won't be affected by the finalized component instance. Service 
> will be terminated once all instances finalized. 
> 3) For multiple components: Service will be terminated once all components 
> finalized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8180) YARN Federation has not implemented blacklist sub-cluster for AM routing

2018-05-07 Thread Abhishek Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi reassigned YARN-8180:
---

Assignee: Abhishek Modi

> YARN Federation has not implemented blacklist sub-cluster for AM routing
> 
>
> Key: YARN-8180
> URL: https://issues.apache.org/jira/browse/YARN-8180
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Reporter: Shen Yinjie
>Assignee: Abhishek Modi
>Priority: Major
>
> Property "yarn.federation.blacklist-subclusters" is defined in 
> yarn-fedeartion doc,but it has not been defined and implemented in Java code.
> In FederationClientInterceptor#submitApplication()
> {code:java}
> List<SubClusterId> blacklist = new ArrayList<>();
> for (int i = 0; i < numSubmitRetries; ++i) {
> SubClusterId subClusterId = policyFacade.getHomeSubcluster(
> request.getApplicationSubmissionContext(), blacklist);
> {code}
>  
>  
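
A minimal sketch of what the blacklist-on-retry loop might look like once implemented; it extends the snippet quoted above, and submitToSubCluster() is a hypothetical helper standing in for however the interceptor actually submits to the chosen sub-cluster's RM.
{code:java}
SubmitApplicationResponse trySubmit(SubmitApplicationRequest request) throws YarnException {
  List<SubClusterId> blacklist = new ArrayList<>();
  for (int i = 0; i < numSubmitRetries; ++i) {
    SubClusterId subClusterId = policyFacade.getHomeSubcluster(
        request.getApplicationSubmissionContext(), blacklist);
    try {
      return submitToSubCluster(subClusterId, request);   // hypothetical helper
    } catch (YarnException e) {
      // Remember the failing sub-cluster so the policy skips it on the next attempt.
      blacklist.add(subClusterId);
    }
  }
  throw new YarnException("Application submission failed on all attempted sub-clusters");
}
{code}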



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-05-07 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466136#comment-16466136
 ] 

Eric Payne commented on YARN-4606:
--

Thanks [~maniraj...@gmail.com] for your consistent and continuing efforts to 
fix this problem.

I am doing an in-depth review, but I would like to address a few things first 
regarding method names and comments. I feel that it is important to be accurate 
in these areas in order to eliminate confusion for those maintaining this code.

- All occurrences of "atleast" should be "at least"
- Comment for {{AbstractUsersManager#getNumActiveUsers}}:
{code:title=AbstractUsersManager#getNumActiveUsers}
-   * Get number of active users i.e. users with applications which have pending
-   * resource requests.
+   * Get number of active users i.e. users with atleast 1 active applications
{code}
For this comment, I would say "Get number of active users i.e. users with at 
least 1 running application and applications requesting resources"
- I would prefer it if the name of {{ActiveUsersOfPendingApps}} was changed 
everywhere to {{ActiveUsersWithOnlyPendingApps}}. This is kind of a nit, but I 
do feel that the rename would be more descriptive.
- {{AbstractUsersManager#incrNumActiveUsersOfPendingApps}}, 
{{decrNumActiveUsersOfPendingApps}}, and {{getNumActiveUsersOfPendingApps}}
Change description to "number of users with only pending apps"
- {{UsersManager#activateApplication}} and {{deactivateApplication}}
Change "Active users which has atleast 1 pending apps:" to "Active users which 
have at least 1 pending app:"
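
A minimal sketch of the intended computation, using the counter names discussed above and simplified to memory only; the method name and signature are hypothetical, not the actual UsersManager code.
{code:java}
/** Sketch: the user limit should be computed from users that can actually allocate. */
static long computeUserLimitMb(long queueMemoryMb, int activeUsers,
    int activeUsersWithOnlyPendingApps) {
  // e.g. 4 active users, 2 of them have only pending apps -> divide by 2, not 4.
  int usersAbleToAllocate = Math.max(1, activeUsers - activeUsersWithOnlyPendingApps);
  return queueMemoryMb / usersAbleToAllocate;
}
{code}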


> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belonging to the same user in a LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers the user 
> an active user. This could lead to starvation of active applications, for 
> example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, there are only two users (user1/user2) able to allocate new 
> resources. So computed user-limit-resource could be lower than expected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8253) HTTPS Ats v2 api call fails with "bad HTTP parsed"

2018-05-07 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8253:


 Summary: HTTPS Ats v2 api call fails with "bad HTTP parsed"
 Key: YARN-8253
 URL: https://issues.apache.org/jira/browse/YARN-8253
 Project: Hadoop YARN
  Issue Type: Bug
  Components: ATSv2
Affects Versions: 3.1.0
Reporter: Yesha Vora


When the YARN HTTP policy is set to HTTPS_ONLY, ATS v2 should use the HTTPS address.

Here, the ATS v2 call is failing with the error below. (The illegal character 0x16 in the 
server log is the first byte of a TLS handshake, which suggests the client is speaking TLS 
to an endpoint that is still serving plain HTTP.)
{code:java}
[hrt_qa@xxx root]$ curl -i -k -s -1 -H 'Content-Type: application/json' -H 
'Accept: application/json' --negotiate -u: 
'https://xxx:8199/ws/v2/timeline/apps/application_1525238789838_0003/entities/COMPONENT_INSTANCE?fields=ALL'

[hrt_qa@xxx root]$ echo $?

35{code}
{code:java|title=Ats v2}
2018-05-02 05:45:40,427 WARN  http.HttpParser (HttpParser.java:(1832)) - 
Illegal character 0x16 in state=START for buffer 
HeapByteBuffer@dba438[p=1,l=222,c=8192,r=221]={\x16<<<\x03\x01\x00\xD9\x01\x00\x00\xD5\x03\x03;X\xEd\xD1orq...\x01\x05\x01\x06\x01\x02\x01\x04\x02\x05\x02\x06\x02\x02\x02>>>\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00}
2018-05-02 05:45:40,428 WARN  http.HttpParser (HttpParser.java:parseNext(1435)) 
- bad HTTP parsed: 400 Illegal character 0x16 for 
HttpChannelOverHttp@2efbda6c{r=0,c=false,a=IDLE,uri=null}{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8080) YARN native service should support component restart policy

2018-05-07 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466169#comment-16466169
 ] 

Billie Rinaldi commented on YARN-8080:
--

Thanks for taking this up, [~suma.shivaprasad]! This is an exciting new 
capability for the YARN service framework. Here are some comments on patch 7:
 * the terminateServiceIfAllComponentsFinished method and terminationHandler 
should be moved to ServiceScheduler
 * there is a typo in the getSuceededInstances method name where succeeded is 
written as suceeded (also the same typo appears in one comment in 
ComponentInstance)
 * I think it would be helpful to reuse the full description of restart_policy 
in YarnServiceAPI.md: "Policy of restart component. Including ALWAYS (Always 
restart component even if instance exit code = 0); ON_FAILURE (Only restart 
component if instance exit code != 0); NEVER (Do not restart in any cases)."
 * please address the unit test and checkstyle issues

> YARN native service should support component restart policy
> ---
>
> Key: YARN-8080
> URL: https://issues.apache.org/jira/browse/YARN-8080
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8080.001.patch, YARN-8080.002.patch, 
> YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, 
> YARN-8080.007.patch
>
>
> Existing native service assumes the service is long running and never 
> finishes. Containers will be restarted even if exit code == 0. 
> To support broader use cases, we need to allow the restart policy of a component 
> to be specified by users. Propose to have the following policies:
> 1) Always: containers are always restarted by the framework regardless of container 
> exit status. This is the existing/default behavior.
> 2) Never: Do not restart containers in any case after the container finishes: To 
> support job-like workloads (for example a Tensorflow training job). If a task 
> exits with code == 0, we should not restart the task. This can be used by 
> services which are not restart/recovery-able.
> 3) On-failure: Similar to above, only restart task with exitcode != 0. 
> Behaviors after component *instance* finalize (Succeeded or Failed when 
> restart_policy != ALWAYS): 
> 1) For single component, single instance: complete service.
> 2) For single component, multiple instance: other running instances from the 
> same component won't be affected by the finalized component instance. Service 
> will be terminated once all instances finalized. 
> 3) For multiple components: Service will be terminated once all components 
> finalized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8175) Add support for Node Labels in SLS.

2018-05-07 Thread Abhishek Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-8175:

Attachment: YARN-8175.004.patch

> Add support for Node Labels in SLS.
> ---
>
> Key: YARN-8175
> URL: https://issues.apache.org/jira/browse/YARN-8175
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-8175.001.patch, YARN-8175.002.patch, 
> YARN-8175.003.patch, YARN-8175.004.patch
>
>
> Currently, SLS doesn't support node labels. With this jira, we are planning 
> to add support for node labels in SLS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8242) YARN NM: OOM error while reading back the state store on recovery

2018-05-07 Thread Jeff Kubina (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466242#comment-16466242
 ] 

Jeff Kubina commented on YARN-8242:
---

{quote}A similar approach can be used for other recovered lists like 
application state, localized resources, etc. if it's worth it for those as well.
{quote}
+1 to doing the above. Due to a configuration error in our Hadoop system and a 
bug in a YARN job, we had millions of empty files created in the local mapred 
directory (.../mapred/local/usercache_DEL_...). We did a heap dump of one of the 
nodemanagers and it looked like the NM was reading in all the files and/or 
directories to delete. We were able to get a few of them to come up by 
increasing their memory (Xmx) to 36gb, but the ultimate fix was to delete the 
contents of .../mapred/local. 
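
A minimal sketch of the streaming approach suggested above, assuming the iq80 leveldb client that the NM state store already uses; the key prefix and the handler are placeholders, not the real NMLeveldbStateStoreService code.
{code:java}
import java.util.Map;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.DBIterator;

public final class StateStoreRecoverySketch {
  /** Process recovery records one at a time instead of materializing them all in a list. */
  static void recover(DB db, byte[] keyPrefix, RecordHandler handler) throws Exception {
    try (DBIterator it = db.iterator()) {
      it.seek(keyPrefix);
      while (it.hasNext()) {
        Map.Entry<byte[], byte[]> entry = it.next();
        // A real implementation would also stop once the key leaves the prefix range.
        handler.handle(entry.getKey(), entry.getValue());  // parse, apply, then drop the reference
      }
    }
  }

  interface RecordHandler {
    void handle(byte[] key, byte[] value) throws Exception;
  }
}
{code}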

> YARN NM: OOM error while reading back the state store on recovery
> -
>
> Key: YARN-8242
> URL: https://issues.apache.org/jira/browse/YARN-8242
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: Kanwaljeet Sachdev
>Assignee: Kanwaljeet Sachdev
>Priority: Blocker
> Attachments: YARN-8242.001.patch, YARN-8242.002.patch
>
>
> On startup the NM reads its state store and builds a list of applications in 
> the state store to process. If the number of applications in the state store 
> is large and they have a lot of "state" connected to them, the NM can run OOM and 
> never get to the point where it can start processing the recovery.
> Since it never starts the recovery, there is no way for the NM to ever get past 
> this point. It will require a change in heap size to get the NM started.
>  
> Following is the stack trace
> {code:java}
> at java.lang.OutOfMemoryError. (OutOfMemoryError.java:48) at 
> com.google.protobuf.ByteString.copyFrom (ByteString.java:192) at 
> com.google.protobuf.CodedInputStream.readBytes (CodedInputStream.java:324) at 
> org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto. 
> (YarnProtos.java:47069) at 
> org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto. 
> (YarnProtos.java:47014) at 
> org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom
>  (YarnProtos.java:47102) at 
> org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom
>  (YarnProtos.java:47097) at com.google.protobuf.CodedInputStream.readMessage 
> (CodedInputStream.java:309) at 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto. 
> (YarnProtos.java:41016) at 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto. 
> (YarnProtos.java:40942) at 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom
>  (YarnProtos.java:41080) at 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom
>  (YarnProtos.java:41075) at com.google.protobuf.CodedInputStream.readMessage 
> (CodedInputStream.java:309) at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.
>  (YarnServiceProtos.java:24517) at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.
>  (YarnServiceProtos.java:24464) at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom
>  (YarnServiceProtos.java:24568) at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom
>  (YarnServiceProtos.java:24563) at 
> com.google.protobuf.AbstractParser.parsePartialFrom (AbstractParser.java:141) 
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:176) at 
> com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:188) at 
> com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:193) at 
> com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:49) at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.parseFrom
>  (YarnServiceProtos.java:24739) at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState
>  (NMLeveldbStateStoreService.java:217) at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState
>  (NMLeveldbStateStoreService.java:170) at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover
>  (ContainerManagerImpl.java:253) at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit
>  (ContainerManagerImpl.java:237) at 
> org.apache.hadoop.service.AbstractService.init (AbstractService.java:163) at 
> org.apache.hadoop.service.CompositeService.serviceInit 
> (CompositeService.java:107) at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit 
> 

[jira] [Created] (YARN-8255) Allow option to disable flex for a service component

2018-05-07 Thread Suma Shivaprasad (JIRA)
Suma Shivaprasad created YARN-8255:
--

 Summary: Allow option to disable flex for a service component 
 Key: YARN-8255
 URL: https://issues.apache.org/jira/browse/YARN-8255
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Reporter: Suma Shivaprasad
Assignee: Suma Shivaprasad


YARN-8080 implements restart capabilities for service component instances. YARN 
service components should add an option to disallow flexing to support 
workloads which are essentially batch/iterative jobs which terminate with 
restart_policy=NEVER/ON_FAILURE. This could be disabled by default for 
components where restart_policy=NEVER/ON_FAILURE and enabled by default when 
restart_policy=ALWAYS(which is the default restart_policy) unless explicitly 
set at the service spec.

The option could be exposed as part of the component spec as "allow_flexing". 

cc [~billie.rinaldi] [~gsaha] [~eyang] [~csingh] [~wangda]





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8248) Job hangs when queue is specified and that queue has 0 capability of a resource

2018-05-07 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-8248:
-
Component/s: yarn
 fairscheduler

> Job hangs when queue is specified and that queue has 0 capability of a 
> resource
> ---
>
> Key: YARN-8248
> URL: https://issues.apache.org/jira/browse/YARN-8248
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, yarn
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8248-001.patch, YARN-8248-002.patch
>
>
> Job hangs when mapreduce.job.queuename is specified and the queue has 0 of 
> any resource (vcores / memory / other).
> In this scenario, the job should be immediately rejected upon submission 
> since the specified queue cannot serve the resource needs of the submitted 
> job.
>  
> Command to run:
> {code:java}
> bin/yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" 
> pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code}
> fair-scheduler.xml queue config (excerpt):
>  
> {code:java}
>  
> 1 mb,0vcores
> 9 mb,0vcores
> 50
> -1.0f
> 2.0
> fair
>   
> {code}
> Diagnostic message from the web UI: 
> {code:java}
> Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is 
> not yet activated. (Resource request:  exceeds current 
> queue or its parents maximum resource allowed).{code}
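
A minimal sketch of the kind of submission-time check the description calls for, limited to the memory and vcore fields for brevity; the method name and the place it would be hooked in are hypothetical.
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.exceptions.YarnException;

public final class ZeroCapacityQueueCheck {
  /** Reject at submission time if the queue can never satisfy the request. */
  static void checkQueueCanEverSatisfy(String queueName, Resource queueMax,
      Resource requested) throws YarnException {
    if ((requested.getMemorySize() > 0 && queueMax.getMemorySize() == 0)
        || (requested.getVirtualCores() > 0 && queueMax.getVirtualCores() == 0)) {
      throw new YarnException("Queue " + queueName
          + " has zero capacity for a resource required by the application;"
          + " rejecting submission instead of letting the job hang");
    }
  }
}
{code}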



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8255) Allow option to disable flex for a service component

2018-05-07 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466268#comment-16466268
 ] 

Billie Rinaldi commented on YARN-8255:
--

I'm not sure this configuration parameter is necessary. Only the launching user 
can flex the service, so this user should know whether flexing makes sense for 
components of their service. I am also not sure about having different defaults 
for different policies; it seems like this will be confusing and require 
complex documentation.

> Allow option to disable flex for a service component 
> -
>
> Key: YARN-8255
> URL: https://issues.apache.org/jira/browse/YARN-8255
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
>
> YARN-8080 implements restart capabilities for service component instances. 
> YARN service components should add an option to disallow flexing to support 
> workloads which are essentially batch/iterative jobs which terminate with 
> restart_policy=NEVER/ON_FAILURE. This could be disabled by default for 
> components where restart_policy=NEVER/ON_FAILURE and enabled by default when 
> restart_policy=ALWAYS(which is the default restart_policy) unless explicitly 
> set at the service spec.
> The option could be exposed as part of the component spec as "allow_flexing". 
> cc [~billie.rinaldi] [~gsaha] [~eyang] [~csingh] [~wangda]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7894) Improve ATS response for DS_CONTAINER when container launch fails

2018-05-07 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-7894:

Attachment: YARN-7894.003.patch

> Improve ATS response for DS_CONTAINER when container launch fails
> -
>
> Key: YARN-7894
> URL: https://issues.apache.org/jira/browse/YARN-7894
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Charan Hebri
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-7894.001.patch, YARN-7894.002.patch, 
> YARN-7894.003.patch
>
>
> When a distributed shell application starts running and a container launch 
> fails, the web service call to the API,
> {noformat}
> http://<timeline server address>/ws/v1/timeline/DS_CONTAINER/{noformat}
> returns a "Not Found". The message returned in this case should be improved to 
> signify that a container launch failed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7715) Update CPU and Memory cgroups params on container update as well.

2018-05-07 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466274#comment-16466274
 ] 

Miklos Szegedi commented on YARN-7715:
--

Thank you for the review [~haibochen].

I added a unit test for TestContainerSchedulerQueuing.

I am hesitant to update or not update cgroups based on some hash maps updated 
by asynchronous code. That might become a supportability nightmare. I already 
added a proper check of running containers by checking whether the cgroup 
directory exists.
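
A minimal sketch of the running-container check mentioned above; the cgroup mount point and hierarchy layout are assumptions for illustration.
{code:java}
import java.nio.file.Files;
import java.nio.file.Paths;

public final class CGroupLivenessSketch {
  /** A container is considered running if its per-container cgroup directory still exists. */
  static boolean isContainerRunning(String cgroupMount, String controller,
      String yarnHierarchy, String containerId) {
    // e.g. /sys/fs/cgroup/cpu/hadoop-yarn/container_e63_..._000002
    return Files.isDirectory(
        Paths.get(cgroupMount, controller, yarnHierarchy, containerId));
  }
}
{code}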

> Update CPU and Memory cgroups params on container update as well.
> -
>
> Key: YARN-7715
> URL: https://issues.apache.org/jira/browse/YARN-7715
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Miklos Szegedi
>Priority: Major
> Attachments: YARN-7715.000.patch, YARN-7715.001.patch, 
> YARN-7715.002.patch
>
>
> In YARN-6673 and YARN-6674, the cgroups resource handlers update the cgroups 
> params for the containers, based on opportunistic or guaranteed, in the 
> *preStart* method.
> Now that YARN-5085 is in, Container executionType (as well as the cpu, memory 
> and any other resources) can be updated after the container has started. This 
> means we need the ability to change cgroups params after container start.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7894) Improve ATS response for DS_CONTAINER when container launch fails

2018-05-07 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466278#comment-16466278
 ] 

Chandni Singh commented on YARN-7894:
-

Addressed the review comments in patch 3.
 Tried an app with 3 container failures. This is how it looks in the UI. 
{code:java}
Application Failure: desired = 3, completed = 3, allocated = 3, failed = 3, 
diagnostics = [2018-05-07 18:11:23.505]Exception from container-launch. 
Container id: container_e63_1525716591983_0001_01_03 Exit code: -1 
Exception message: Privileged container being requested but privileged 
containers are not enabled on this cluster Shell error output:  Shell 
output:  [2018-05-07 18:11:23.511]Container exited with a non-zero 
exit code -1. [2018-05-07 18:11:23.511]Container exited with a non-zero exit 
code -1. [2018-05-07 18:11:24.517]Exception from container-launch. Container 
id: container_e63_1525716591983_0001_01_04 Exit code: -1 Exception message: 
Privileged container being requested but privileged containers are not enabled 
on this cluster Shell error output:  Shell output:  
[2018-05-07 18:11:24.525]Container exited with a non-zero exit code -1. 
[2018-05-07 18:11:24.526]Container exited with a non-zero exit code -1. 
[2018-05-07 18:11:24.519]Exception from container-launch. Container id: 
container_e63_1525716591983_0001_01_02 Exit code: -1 Exception message: 
Privileged container being requested but privileged containers are not enabled 
on this cluster Shell error output:  Shell output:  
[2018-05-07 18:11:24.532]Container exited with a non-zero exit code -1. 
[2018-05-07 18:11:24.532]Container exited with a non-zero exit code -1.
{code}

> Improve ATS response for DS_CONTAINER when container launch fails
> -
>
> Key: YARN-7894
> URL: https://issues.apache.org/jira/browse/YARN-7894
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Charan Hebri
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-7894.001.patch, YARN-7894.002.patch, 
> YARN-7894.003.patch
>
>
> When a distributed shell application starts running and a container launch 
> fails, the web service call to the API,
> {noformat}
> http://<timeline server address>/ws/v1/timeline/DS_CONTAINER/{noformat}
> returns a "Not Found". The message returned in this case should be improved to 
> signify that a container launch failed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7894) Improve ATS response for DS_CONTAINER when container launch fails

2018-05-07 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-7894:

Attachment: (was: YARN-7894.003.patch)

> Improve ATS response for DS_CONTAINER when container launch fails
> -
>
> Key: YARN-7894
> URL: https://issues.apache.org/jira/browse/YARN-7894
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Charan Hebri
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-7894.001.patch, YARN-7894.002.patch
>
>
> When a distributed shell application starts running and a container launch 
> fails, the web service call to the API,
> {noformat}
> http://<timeline server address>/ws/v1/timeline/DS_CONTAINER/{noformat}
> returns a "Not Found". The message returned in this case should be improved to 
> signify that a container launch failed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8191) Fair scheduler: queue deletion without RM restart

2018-05-07 Thread Gergo Repas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergo Repas updated YARN-8191:
--
Attachment: YARN-8191.006.patch

> Fair scheduler: queue deletion without RM restart
> -
>
> Key: YARN-8191
> URL: https://issues.apache.org/jira/browse/YARN-8191
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.1
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: Queue Deletion in Fair Scheduler.pdf, 
> YARN-8191.000.patch, YARN-8191.001.patch, YARN-8191.002.patch, 
> YARN-8191.003.patch, YARN-8191.004.patch, YARN-8191.005.patch, 
> YARN-8191.006.patch
>
>
> The Fair Scheduler never cleans up queues even if they are deleted in the 
> allocation file, or were dynamically created and are never going to be used 
> again. Queues always remain in memory, which leads to the following two issues.
>  # Steady fairshares aren’t calculated correctly due to remaining queues
>  # WebUI shows deleted queues, which is confusing for users (YARN-4022).
> We want to support proper queue deletion without restarting the Resource 
> Manager:
>  # Static queues without any entries that are removed from fair-scheduler.xml 
> should be deleted from memory.
>  # Dynamic queues without any entries should be deleted.
>  # RM Web UI should only show the queues defined in the scheduler at that 
> point in time.
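
A minimal sketch of the deletion condition the proposal implies; the field and method names are hypothetical rather than the actual FSQueue/AllocationConfiguration API.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public final class QueueCleanupSketch {
  /** A queue is removable when it is empty and no longer declared in fair-scheduler.xml. */
  static boolean isRemovable(QueueView queue, Set<String> configuredQueueNames) {
    boolean empty = queue.numRunnableApps == 0 && queue.numNonRunnableApps == 0
        && queue.childQueues.isEmpty();
    return empty && !configuredQueueNames.contains(queue.name);
  }

  /** Hypothetical, simplified view of a scheduler queue. */
  static final class QueueView {
    String name;
    int numRunnableApps;
    int numNonRunnableApps;
    List<QueueView> childQueues = new ArrayList<>();
  }
}
{code}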



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8254) dynamically change log levels for YARN Jobs

2018-05-07 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created YARN-8254:
---

 Summary: dynamically change log levels for YARN Jobs
 Key: YARN-8254
 URL: https://issues.apache.org/jira/browse/YARN-8254
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Affects Versions: 2.7.3
Reporter: Prabhu Joseph


Currently the Log Levels for Daemons can be dynamically changed. It will be 
easier while debugging to have the same for YARN Jobs. The client can send a setLogLevel 
request to the ApplicationMaster, which can set it for all the containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8254) dynamically change log levels for YARN Jobs

2018-05-07 Thread Prabhu Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-8254:

Labels: supportability  (was: )

> dynamically change log levels for YARN Jobs
> ---
>
> Key: YARN-8254
> URL: https://issues.apache.org/jira/browse/YARN-8254
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Priority: Major
>  Labels: supportability
>
> Currently the Log Levels for Daemons can be dynamically changed. It will be 
> easier while debugging to have the same for YARN Jobs. The client can send a setLogLevel 
> request to the ApplicationMaster, which can set it for all the containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8243) Flex down should first remove pending container requests (if any) and then kill running containers

2018-05-07 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466259#comment-16466259
 ] 

Billie Rinaldi commented on YARN-8243:
--

I think one problem here is that the ComponentInstance compareTo method is not 
implementing the sorting we would like. It seems that, based on the component 
instance ID assignment, we will always have to remove the instance with the 
highest ID first. This would not solve the general problem of removing running 
containers before pending instances, but it would solve the issue seen in the 
specific example provided in the description.
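As a rough sketch of one ordering that would address this (not the actual ComponentInstance code), the comparator below puts instances whose container request is still pending ahead of running ones and, within each group, the highest instance ID first. {{InstanceView}} and its accessors are hypothetical stand-ins.

{code:java}
import java.util.Comparator;

// Hypothetical view of a component instance -- not the real ComponentInstance type.
interface InstanceView {
  boolean hasContainer();   // false means the container request is still pending
  int getInstanceId();
}

class FlexDownOrder {
  // Candidates for removal on flex-down: pending instances first (false sorts
  // before true), then running instances, each group ordered by highest ID first.
  static final Comparator<InstanceView> REMOVE_FIRST =
      Comparator.<InstanceView, Boolean>comparing(InstanceView::hasContainer)
          .thenComparing(Comparator.comparingInt(InstanceView::getInstanceId).reversed());
}
{code}

Sorting instances with {{REMOVE_FIRST}} and removing from the head of the list would release pending requests before any running container is killed, while still preferring the highest-ID instance among the running ones.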

> Flex down should first remove pending container requests (if any) and then 
> kill running containers
> --
>
> Key: YARN-8243
> URL: https://issues.apache.org/jira/browse/YARN-8243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Gour Saha
>Assignee: Gour Saha
>Priority: Major
>
> This is easy to test on a service with an anti-affinity component, which is a 
> simple way to simulate pending container requests. It can also be simulated by 
> other means (no resources left in the cluster, etc.).
> Service yarnfile used to test this -
> {code:java}
> {
>   "name": "sleeper-service",
>   "version": "1",
>   "components" :
>   [
> {
>   "name": "ping",
>   "number_of_containers": 2,
>   "resource": {
> "cpus": 1,
> "memory": "256"
>   },
>   "launch_command": "sleep 9000",
>   "placement_policy": {
> "constraints": [
>   {
> "type": "ANTI_AFFINITY",
> "scope": "NODE",
> "target_tags": [
>   "ping"
> ]
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Launch a service with the above yarnfile as below -
> {code:java}
> yarn app -launch simple-aa-1 simple_AA.json
> {code}
> Let's assume there are only 5 nodes in this cluster. Now, flex the above 
> service to one more container than the number of nodes (6 in my case).
> {code:java}
> yarn app -flex simple-aa-1 -component ping 6
> {code}
> Only 5 containers will be allocated and running for simple-aa-1. At this 
> point, flex it down to 5 containers -
> {code:java}
> yarn app -flex simple-aa-1 -component ping 5
> {code}
> This is what is seen in the service AM log at this point -
> {noformat}
> 2018-05-03 20:17:38,469 [IPC Server handler 0 on 38124] INFO  
> service.ClientAMService - Flexing component ping to 5
> 2018-05-03 20:17:38,469 [Component  dispatcher] INFO  component.Component - 
> [FLEX DOWN COMPONENT ping]: scaling down from 6 to 5
> 2018-05-03 20:17:38,470 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE ping-4 : 
> container_1525297086734_0013_01_06]: Flexed down by user, destroying.
> 2018-05-03 20:17:38,473 [Component  dispatcher] INFO  component.Component - 
> [COMPONENT ping] Transitioned from FLEXING to STABLE on FLEX event.
> 2018-05-03 20:17:38,474 [pool-5-thread-8] INFO  
> registry.YarnRegistryViewForProviders - [COMPINSTANCE ping-4 : 
> container_1525297086734_0013_01_06]: Deleting registry path 
> /users/root/services/yarn-service/simple-aa-1/components/ctr-1525297086734-0013-01-06
> 2018-05-03 20:17:38,476 [Component  dispatcher] ERROR component.Component - 
> [COMPONENT ping]: Invalid event CHECK_STABLE at STABLE
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> CHECK_STABLE at STABLE
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>   at 
> org.apache.hadoop.yarn.service.component.Component.handle(Component.java:913)
>   at 
> org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:574)
>   at 
> org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:563)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
>   at java.lang.Thread.run(Thread.java:745)
> 2018-05-03 20:17:38,480 [Component  dispatcher] ERROR component.Component - 
> [COMPONENT ping]: Invalid event CHECK_STABLE at STABLE
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> CHECK_STABLE at STABLE
>   at 
> 

[jira] [Updated] (YARN-7654) Support ENTRY_POINT for docker container

2018-05-07 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7654:

Attachment: YARN-7654.020.patch

> Support ENTRY_POINT for docker container
> 
>
> Key: YARN-7654
> URL: https://issues.apache.org/jira/browse/YARN-7654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
>  Labels: Docker
> Attachments: YARN-7654.001.patch, YARN-7654.002.patch, 
> YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, 
> YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, 
> YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, 
> YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, 
> YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, 
> YARN-7654.018.patch, YARN-7654.019.patch, YARN-7654.020.patch
>
>
> A Docker image may have an ENTRY_POINT predefined, but this is not supported in 
> the current implementation. It would be nice if we could detect the existence of 
> {{launch_command}} and, based on this variable, launch the docker container in 
> different ways:
> h3. Launch command exists
> {code}
> docker run [image]:[version]
> docker exec [container_id] [launch_command]
> {code}
> h3. Use ENTRY_POINT
> {code}
> docker run [image]:[version]
> {code}
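As a hedged sketch of the branching described above (not the actual container runtime code), the snippet below builds the command lines for the two modes; {{image}}, {{version}}, {{containerId}} and {{launchCommand}} are hypothetical inputs.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: choose between the image's ENTRY_POINT and an explicit launch command.
class DockerLaunchSketch {
  static List<List<String>> buildCommands(String image, String version,
                                          String containerId, String launchCommand) {
    List<List<String>> commands = new ArrayList<>();
    // Always start the container; with no launch command the image's ENTRY_POINT runs.
    commands.add(List.of("docker", "run", image + ":" + version));
    if (launchCommand != null && !launchCommand.isEmpty()) {
      // launch_command present: execute it inside the already-started container.
      commands.add(List.of("docker", "exec", containerId, launchCommand));
    }
    return commands;
  }
}
{code}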



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container

2018-05-07 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466258#comment-16466258
 ] 

Eric Yang commented on YARN-7654:
-

Rebased patch 20 on top of YARN-8207 patch 007.

> Support ENTRY_POINT for docker container
> 
>
> Key: YARN-7654
> URL: https://issues.apache.org/jira/browse/YARN-7654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
>  Labels: Docker
> Attachments: YARN-7654.001.patch, YARN-7654.002.patch, 
> YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, 
> YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, 
> YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, 
> YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, 
> YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, 
> YARN-7654.018.patch, YARN-7654.019.patch, YARN-7654.020.patch
>
>
> A Docker image may have an ENTRY_POINT predefined, but this is not supported in 
> the current implementation. It would be nice if we could detect the existence of 
> {{launch_command}} and, based on this variable, launch the docker container in 
> different ways:
> h3. Launch command exists
> {code}
> docker run [image]:[version]
> docker exec [container_id] [launch_command]
> {code}
> h3. Use ENTRY_POINT
> {code}
> docker run [image]:[version]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7715) Update CPU and Memory cgroups params on container update as well.

2018-05-07 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-7715:
-
Attachment: YARN-7715.003.patch

> Update CPU and Memory cgroups params on container update as well.
> -
>
> Key: YARN-7715
> URL: https://issues.apache.org/jira/browse/YARN-7715
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Miklos Szegedi
>Priority: Major
> Attachments: YARN-7715.000.patch, YARN-7715.001.patch, 
> YARN-7715.002.patch, YARN-7715.003.patch
>
>
> In YARN-6673 and YARN-6674, the cgroups resource handlers update the cgroups 
> params for the containers, based on whether they are opportunistic or guaranteed, 
> in the *preStart* method.
> Now that YARN-5085 is in, the container executionType (as well as the cpu, memory 
> and any other resources) can be updated after the container has started. This 
> means we need the ability to change cgroups params after container start.
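As a hedged sketch of what a post-start update could look like with cgroups v1 (not the actual resource-handler code), the snippet below rewrites the cpu and memory limit files of a running container's cgroup. The mount path and the {{hadoop-yarn}} hierarchy name are assumptions, and {{containerCgroup}} is a hypothetical directory name.

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hedged sketch: rewrite cgroup v1 parameters for an already-running container.
class CgroupUpdateSketch {
  static void updateLimits(String containerCgroup, long cpuShares, long memoryBytes)
      throws IOException {
    // e.g. /sys/fs/cgroup/cpu/hadoop-yarn/<containerId>/cpu.shares
    write("/sys/fs/cgroup/cpu/hadoop-yarn/" + containerCgroup + "/cpu.shares",
        Long.toString(cpuShares));
    write("/sys/fs/cgroup/memory/hadoop-yarn/" + containerCgroup + "/memory.limit_in_bytes",
        Long.toString(memoryBytes));
  }

  private static void write(String path, String value) throws IOException {
    Files.write(Paths.get(path), value.getBytes(StandardCharsets.UTF_8));
  }
}
{code}

Because the kernel applies these files immediately, the same write path used in preStart could, in principle, be reused on a container update event.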



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


