[jira] [Updated] (YARN-8031) NodeManager will fail to start if cpu subsystem is already mounted

2018-03-16 Thread JayceAu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JayceAu updated YARN-8031:
--
Attachment: (was: image-2018-03-15-14-47-30-583.png)

> NodeManager will fail to start if cpu subsystem is already mounted
> --
>
> Key: YARN-8031
> URL: https://issues.apache.org/jira/browse/YARN-8031
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: JayceAu
>Priority: Major
>
> If *yarn.nodemanager.linux-container-executor.cgroups.mount* is set to true 
> and the cpu subsystem is not yet mounted, NodeManager will mount the cpu 
> subsystem and then, if the mount step succeeds, create the control group 
> whose default name is *hadoop-yarn*. This procedure works well when the cpu 
> subsystem is not yet mounted. However, in some situations the cpu subsystem 
> is already mounted before NodeManager starts, and NodeManager then fails to 
> start because it has no write permission to the *hadoop-yarn* path. For example:
>  # an OS that uses systemd, such as CentOS 7, mounts the cpu subsystem by 
> default at machine startup
>  # a daemon that starts earlier than NodeManager may also rely on the cpu 
> subsystem being mounted. In our production environment, we limit the cpu 
> usage of the monitoring and control agent, which starts on reboot.
> To solve this problem, container-executor must be able to create the control 
> group *hadoop-yarn* either when mounting the controller succeeds or when the 
> controller is already mounted. Besides, if the cpu subsystem is used in 
> combination with other subsystems and is already mounted, container-executor 
> should use the existing mount point of the cpu subsystem instead of the one 
> provided by NodeManager.
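
For illustration only, here is a minimal Java sketch of the kind of mount check described above (the real container-executor is written in C; the class name and exact parsing are assumptions, not the proposed patch):

{code:java}
// Hedged sketch: scan /proc/mounts and return the mount point of an
// already-mounted "cpu" cgroup controller, if any.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;

public class CpuControllerMountCheck {
  public static String findExistingCpuMount() throws IOException {
    try (BufferedReader reader =
        new BufferedReader(new FileReader("/proc/mounts"))) {
      String line;
      while ((line = reader.readLine()) != null) {
        // /proc/mounts format: <device> <mount-point> <fstype> <options> <dump> <pass>
        String[] parts = line.split("\\s+");
        if (parts.length >= 4 && "cgroup".equals(parts[2])
            && Arrays.asList(parts[3].split(",")).contains("cpu")) {
          return parts[1]; // e.g. /sys/fs/cgroup/cpu,cpuacct on systemd distributions
        }
      }
    }
    return null; // cpu controller not mounted yet, so NodeManager may mount it itself
  }
}
{code}

If such a mount point exists, creating *hadoop-yarn* under it, rather than under the NodeManager-provided path, is the behavior the description asks for.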






[jira] [Commented] (YARN-8040) [UI2] yarn new ui web-app does not respect current pathname for REST api

2018-03-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403261#comment-16403261
 ] 

Wangda Tan commented on YARN-8040:
--

+1, thanks [~sunilg].

> [UI2] yarn new ui web-app does not respect current pathname for REST api
> 
>
> Key: YARN-8040
> URL: https://issues.apache.org/jira/browse/YARN-8040
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-8040.001.patch
>
>
> When UI2 is accessed behind a proxy such as Knox/Nginx, the trailing path name 
> should not be skipped. However, "ui2" should be trimmed from the path if it is present.






[jira] [Commented] (YARN-8028) Support authorizeUserAccessToQueue in RMWebServices

2018-03-16 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403229#comment-16403229
 ] 

genericqa commented on YARN-8028:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
36s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 50s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
19s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 42s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 9 new + 
26 unchanged - 3 fixed = 35 total (was 29) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m  1s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 65m 
42s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
20s{color} | {color:green} hadoop-yarn-server-router in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}116m  4s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-8028 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12914973/YARN-8028.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ceedfa9de729 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 49c747a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https:/

[jira] [Commented] (YARN-7221) Add security check for privileged docker container

2018-03-16 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403199#comment-16403199
 ] 

genericqa commented on YARN-7221:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 23s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 46s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 46s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 62m 45s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.TestContainerManager |
|   | 
hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-7221 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12914970/YARN-7221.007.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux 24c7551723e4 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 49c747a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/19997/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build

[jira] [Comment Edited] (YARN-8034) Clarification on preferredHost request with relaxedLocality

2018-03-16 Thread Konstantinos Karanasos (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403185#comment-16403185
 ] 

Konstantinos Karanasos edited comment on YARN-8034 at 3/17/18 1:22 AM:
---

Hi [~jagadish1...@gmail.com],

As [~jlowe] mentioned, this is very related to YARN-6344 for the capacity 
scheduler. What you should look at is the 
"yarn.scheduler.capacity.rack-locality-additional-delay" parameter.

Since you have only one (or very few) container requests, the current logic (if 
you leave the above parameter at its default value) will relax locality almost 
immediately. If you set that parameter to a positive value, you 
should achieve your desired behavior.
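
For illustration, a minimal sketch of the kind of request being discussed, assuming an initialized {{AMRMClient<ContainerRequest>}} named {{amrmClient}} (the host name and values are placeholders):

{code:java}
// Hedged sketch: request a container on a preferred host with relaxLocality = true.
// With the Capacity Scheduler, how long strict locality is honored before relaxing
// is governed by yarn.scheduler.capacity.node-locality-delay and the
// yarn.scheduler.capacity.rack-locality-additional-delay parameter named above.
Priority priority = Priority.newInstance(0);
Resource capability = Resource.newInstance(1024, 1);
String[] preferredNodes = new String[] {"preferred-host.example.com"}; // placeholder

AMRMClient.ContainerRequest request = new AMRMClient.ContainerRequest(
    capability, preferredNodes, null /* racks */, priority, true /* relaxLocality */);
amrmClient.addContainerRequest(request);
{code}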


was (Author: kkaranasos):
Hi [~jagadish1...@gmail.com],

As [~jlowe] mentioned, this is very related to YARN-6344 for the capacity 
scheduler. What you should look at is the 
"yarn.scheduler.capacity.rack-locality-additional-delay" parameter.

Since you have only one (or very few) container requests, the current logic (if 
you let the above parameter to its default value) value will lead to relaxing 
locality almost immediately. If you set that parameter to a positive value, you 
should achieve your desired behavior.

> Clarification on preferredHost request with relaxedLocality
> ---
>
> Key: YARN-8034
> URL: https://issues.apache.org/jira/browse/YARN-8034
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jagadish
>Priority: Major
>
> I work on Apache Samza, a stateful stream-processing framework that leverages 
> YARN for resource management. The Samza AM requests resources on specific 
> hosts to schedule stateful jobs. We set relaxLocality = true in these 
> requests we make to YARN. Often we have observed that we don't get containers 
> on the hosts we requested, and the YARN RM returns containers on 
> arbitrary hosts.
> Do you know what the behavior of the FairScheduler/CapacityScheduler is when 
> setting "relaxLocality = true"? I did play around by setting a high value for 
> yarn.scheduler.capacity.node-locality-delay but it did not seem to matter. 
> However, when setting relaxLocality = false, we get resources on the exact 
> hosts we requested.
> The behavior I want from YARN is "honor locality to the best possible extent 
> and only return a container on an arbitrary host if the requested host is 
> down". Is there a way to accomplish this?
> If you can point me to the Scheduler code, I'm happy to look at it as well. 
> For context, we have continuous scheduling enabled in our clusters.






[jira] [Commented] (YARN-8034) Clarification on preferredHost request with relaxedLocality

2018-03-16 Thread Konstantinos Karanasos (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403185#comment-16403185
 ] 

Konstantinos Karanasos commented on YARN-8034:
--

Hi [~jagadish1...@gmail.com],

As [~jlowe] mentioned, this is very related to YARN-6344 for the capacity 
scheduler. What you should look at is the 
"yarn.scheduler.capacity.rack-locality-additional-delay" parameter.

Since you have only one (or very few) container requests, the current logic (if 
you leave the above parameter at its default value) will relax locality almost 
immediately. If you set that parameter to a positive value, you 
should achieve your desired behavior.

> Clarification on preferredHost request with relaxedLocality
> ---
>
> Key: YARN-8034
> URL: https://issues.apache.org/jira/browse/YARN-8034
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jagadish
>Priority: Major
>
> I work on Apache Samza, a stateful stream-processing framework that leverages 
> YARN for resource management. The Samza AM requests resources on specific 
> hosts to schedule stateful jobs. We set relaxLocality = true in these 
> requests we make to YARN. Often we have observed that we don't get containers 
> on the hosts we requested, and the YARN RM returns containers on 
> arbitrary hosts.
> Do you know what the behavior of the FairScheduler/CapacityScheduler is when 
> setting "relaxLocality = true"? I did play around by setting a high value for 
> yarn.scheduler.capacity.node-locality-delay but it did not seem to matter. 
> However, when setting relaxLocality = false, we get resources on the exact 
> hosts we requested.
> The behavior I want from YARN is "honor locality to the best possible extent 
> and only return a container on an arbitrary host if the requested host is 
> down". Is there a way to accomplish this?
> If you can point me to the Scheduler code, I'm happy to look at it as well. 
> For context, we have continuous scheduling enabled in our clusters.






[jira] [Commented] (YARN-8002) Support NOT_SELF and ALL namespace types for allocation tag

2018-03-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403162#comment-16403162
 ] 

Wangda Tan commented on YARN-8002:
--

+1 to latest patch, will commit it by tomorrow if no objections.

> Support NOT_SELF and ALL namespace types for allocation tag
> ---
>
> Key: YARN-8002
> URL: https://issues.apache.org/jira/browse/YARN-8002
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8002.001.patch, YARN-8002.002.patch, 
> YARN-8002.003.patch, YARN-8002.004.patch
>
>
> This is a continuation task after YARN-7972. YARN-7972 adds support for 
> specifying tags with the SELF and APP_ID namespaces, like the following:
>  * self/
>  * app-id//
> This task is to track the work to support 2 of the remaining namespace types, 
> *NOT_SELF* & *ALL* (we'll support app-label later):
>  * not-self/
>  * all/
> This will require a bit of refactoring in {{AllocationTagsManager}}, as it 
> needs to do some proper aggregation on tags for multiple apps.






[jira] [Commented] (YARN-8028) Support authorizeUserAccessToQueue in RMWebServices

2018-03-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403159#comment-16403159
 ] 

Wangda Tan commented on YARN-8028:
--

Attached ver.3 patch

> Support authorizeUserAccessToQueue in RMWebServices
> ---
>
> Key: YARN-8028
> URL: https://issues.apache.org/jira/browse/YARN-8028
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8028.001.patch, YARN-8028.002.patch
>
>
> Currently we have {{QueueUserACLInfo}} in ApplicationClient; we should 
> support a similar API in the REST API.






[jira] [Commented] (YARN-8028) Support authorizeUserAccessToQueue in RMWebServices

2018-03-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403157#comment-16403157
 ] 

Wangda Tan commented on YARN-8028:
--

[~sunilg], 
Thanks for review. 

1/2: updated. 
3: queueACLsManager.checkAccess() takes an RMApp, which is not desired here; this 
check happens before the application is created. I'm not sure why we added a 
check-queue-access API with an RMApp/application field, maybe just because the 
developer wanted to reuse the same method. Personally I would prefer a clearer 
API, so I use the checkAccess API. If you look at the implementation of 
LeafQueue#checkAccess, it uses the authorizer. I think we should fix this code 
issue in a separate JIRA and keep this change minimal.
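
For reference, a minimal sketch of the scheduler-level check referred to above (not the patch itself; the surrounding wiring such as {{rmContext}} and the queue name are assumptions):

{code:java}
// Hedged sketch: YarnScheduler#checkAccess evaluates a QueueACL for a user against a
// queue without needing an RMApp, which is the cleaner API preferred above.
UserGroupInformation callerUGI = UserGroupInformation.getCurrentUser(); // throws IOException
boolean canSubmit = rmContext.getScheduler()
    .checkAccess(callerUGI, QueueACL.SUBMIT_APPLICATIONS, "root.default");
boolean canAdminister = rmContext.getScheduler()
    .checkAccess(callerUGI, QueueACL.ADMINISTER_QUEUE, "root.default");
{code}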

> Support authorizeUserAccessToQueue in RMWebServices
> ---
>
> Key: YARN-8028
> URL: https://issues.apache.org/jira/browse/YARN-8028
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8028.001.patch, YARN-8028.002.patch
>
>
> Currently we have {{QueueUserACLInfo}} in ApplicationClient; we should 
> support a similar API in the REST API.






[jira] [Commented] (YARN-8028) Support authorizeUserAccessToQueue in RMWebServices

2018-03-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403154#comment-16403154
 ] 

Wangda Tan commented on YARN-8028:
--

Thanks [~bibinchundatt] for review.

For BadRequestException/ForbiddenException, I would prefer to switch to using 
status codes for consistency.
{quote} # I think we shouldnt directly log the params inputs this could cause 
*log forging*{quote}
Done.

For #2/#3, I would prefer to make minimal changes in this patch and move the 
improvement discussion to the other JIRAs you mentioned. This JIRA is targeted 
at getting the ACL for a single queue, which should be relatively efficient 
compared to getting the ACLs of all queues.

> Support authorizeUserAccessToQueue in RMWebServices
> ---
>
> Key: YARN-8028
> URL: https://issues.apache.org/jira/browse/YARN-8028
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8028.001.patch, YARN-8028.002.patch
>
>
> Currently we have {{QueueUserACLInfo}} in ApplicationClient; we should 
> support a similar API in the REST API.






[jira] [Updated] (YARN-8028) Support authorizeUserAccessToQueue in RMWebServices

2018-03-16 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8028:
-
Attachment: YARN-8028.002.patch

> Support authorizeUserAccessToQueue in RMWebServices
> ---
>
> Key: YARN-8028
> URL: https://issues.apache.org/jira/browse/YARN-8028
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8028.001.patch, YARN-8028.002.patch
>
>
> Currently we have {{QueueUserACLInfo}} in ApplicationClient; we should 
> support a similar API in the REST API.






[jira] [Updated] (YARN-8028) Support authorizeUserAccessToQueue in RMWebServices

2018-03-16 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8028:
-
Attachment: (was: YARN-8028.002.patch)

> Support authorizeUserAccessToQueue in RMWebServices
> ---
>
> Key: YARN-8028
> URL: https://issues.apache.org/jira/browse/YARN-8028
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8028.001.patch
>
>
> Currently we have {{QueueUserACLInfo}} in ApplicationClient; we should 
> support a similar API in the REST API.






[jira] [Updated] (YARN-8028) Support authorizeUserAccessToQueue in RMWebServices

2018-03-16 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8028:
-
Attachment: YARN-8028.002.patch

> Support authorizeUserAccessToQueue in RMWebServices
> ---
>
> Key: YARN-8028
> URL: https://issues.apache.org/jira/browse/YARN-8028
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8028.001.patch
>
>
> Currently we have {{QueueUserACLInfo}} in ApplicationClient; we should 
> support a similar API in the REST API.






[jira] [Commented] (YARN-7221) Add security check for privileged docker container

2018-03-16 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403147#comment-16403147
 ] 

Eric Yang commented on YARN-7221:
-

[~ebadger] Patch 7 fixes all of the errors mentioned above.

> Add security check for privileged docker container
> --
>
> Key: YARN-7221
> URL: https://issues.apache.org/jira/browse/YARN-7221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-7221.001.patch, YARN-7221.002.patch, 
> YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, 
> YARN-7221.006.patch, YARN-7221.007.patch
>
>
> When a Docker container is running with privileges, the majority of use cases 
> involve starting a program as root and then dropping privileges to another 
> user, e.g. httpd starts privileged to bind to port 80, then drops privileges 
> to the www user.
> # We should add a security check for submitting users, to verify they have 
> "sudo" access to run privileged containers.
> # We should remove --user=uid:gid for privileged containers.
>  
> Docker can be launched with both the --privileged=true and --user=uid:gid 
> flags. With this parameter combination, the user will not be able to become 
> root; every docker exec command will drop to the uid:gid user instead of 
> being granted privileges. A user can gain root privileges if the container 
> file system contains files that give the user extra power, but this type of 
> image is considered dangerous. A non-privileged user can launch a container 
> with special bits to acquire the same level of root power. Hence, we lose 
> control of which images should be run with --privileged and who has sudo 
> rights to use privileged container images. As a result, we should check for 
> sudo access and then decide whether to parameterize --privileged=true OR 
> --user=uid:gid. This will avoid leading developers down the wrong path.
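
For illustration only, a hedged sketch of the decision described above; all names here are placeholders, and the real implementation lives in the C container-executor:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the proposed check: privileged mode requires sudo, and the
// --privileged=true and --user=uid:gid flags are treated as mutually exclusive.
public class DockerPrivilegeArgs {
  public static List<String> build(boolean privilegedRequested,
      boolean submitterHasSudo, String user, int uid, int gid) {
    List<String> args = new ArrayList<>();
    if (privilegedRequested) {
      if (!submitterHasSudo) {
        throw new IllegalArgumentException(
            user + " is not allowed to launch privileged containers");
      }
      args.add("--privileged=true");         // privileged, and no --user is appended
    } else {
      args.add("--user=" + uid + ":" + gid); // non-privileged containers drop to uid:gid
    }
    return args;
  }
}
{code}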






[jira] [Updated] (YARN-7221) Add security check for privileged docker container

2018-03-16 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7221:

Attachment: YARN-7221.007.patch

> Add security check for privileged docker container
> --
>
> Key: YARN-7221
> URL: https://issues.apache.org/jira/browse/YARN-7221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-7221.001.patch, YARN-7221.002.patch, 
> YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, 
> YARN-7221.006.patch, YARN-7221.007.patch
>
>
> When a Docker container is running with privileges, the majority of use cases 
> involve starting a program as root and then dropping privileges to another 
> user, e.g. httpd starts privileged to bind to port 80, then drops privileges 
> to the www user.
> # We should add a security check for submitting users, to verify they have 
> "sudo" access to run privileged containers.
> # We should remove --user=uid:gid for privileged containers.
>  
> Docker can be launched with both the --privileged=true and --user=uid:gid 
> flags. With this parameter combination, the user will not be able to become 
> root; every docker exec command will drop to the uid:gid user instead of 
> being granted privileges. A user can gain root privileges if the container 
> file system contains files that give the user extra power, but this type of 
> image is considered dangerous. A non-privileged user can launch a container 
> with special bits to acquire the same level of root power. Hence, we lose 
> control of which images should be run with --privileged and who has sudo 
> rights to use privileged container images. As a result, we should check for 
> sudo access and then decide whether to parameterize --privileged=true OR 
> --user=uid:gid. This will avoid leading developers down the wrong path.






[jira] [Commented] (YARN-8039) Clean up log dir configuration in TestLinuxContainerExecutorWithMocks.testStartLocalizer

2018-03-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403127#comment-16403127
 ] 

Hudson commented on YARN-8039:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13850 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13850/])
YARN-8039. Clean up log dir configuration in (yufei: rev 
49c747ab187d0650143205ba57ca19607ec4c6bd)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java


> Clean up log dir configuration in 
> TestLinuxContainerExecutorWithMocks.testStartLocalizer
> 
>
> Key: YARN-8039
> URL: https://issues.apache.org/jira/browse/YARN-8039
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Minor
> Fix For: 3.1.0, 2.10.0
>
> Attachments: YARN-8039.000.patch
>
>







[jira] [Comment Edited] (YARN-8039) Clean up log dir configuration in TestLinuxContainerExecutorWithMocks.testStartLocalizer

2018-03-16 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403113#comment-16403113
 ] 

Yufei Gu edited comment on YARN-8039 at 3/16/18 11:35 PM:
--

+1. Committed to trunk and branch-2. Thanks [~miklos.szeg...@cloudera.com] for 
working on this. Thanks for the review [~snemeth].


was (Author: yufeigu):
+1. Committed to trunk.

> Clean up log dir configuration in 
> TestLinuxContainerExecutorWithMocks.testStartLocalizer
> 
>
> Key: YARN-8039
> URL: https://issues.apache.org/jira/browse/YARN-8039
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Minor
> Fix For: 3.1.0, 2.10.0
>
> Attachments: YARN-8039.000.patch
>
>







[jira] [Commented] (YARN-8039) Clean up log dir configuration in TestLinuxContainerExecutorWithMocks.testStartLocalizer

2018-03-16 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403113#comment-16403113
 ] 

Yufei Gu commented on YARN-8039:


+1. Committed to trunk.

> Clean up log dir configuration in 
> TestLinuxContainerExecutorWithMocks.testStartLocalizer
> 
>
> Key: YARN-8039
> URL: https://issues.apache.org/jira/browse/YARN-8039
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Minor
> Attachments: YARN-8039.000.patch
>
>







[jira] [Commented] (YARN-8018) Yarn service: Add support for initiating service upgrade

2018-03-16 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403105#comment-16403105
 ] 

Chandni Singh commented on YARN-8018:
-

[~billie.rinaldi]
I see the following in {{ServiceClient}} during flex action:
{code:java}
ApplicationReport appReport =
    yarnClient.getApplicationReport(getAppId(serviceName));
if (appReport.getYarnApplicationState() != RUNNING) {
  String message =
      serviceName + " is at " + appReport.getYarnApplicationState()
          + " state, flex can only be invoked when service is running";
{code}

So flex is not supported while the service is stopped.

> Yarn service: Add support for initiating service upgrade
> 
>
> Key: YARN-8018
> URL: https://issues.apache.org/jira/browse/YARN-8018
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8018.wip.patch
>
>
> Add support for initiating a service upgrade, which includes the following 
> main changes:
>  # Service API to initiate the upgrade
>  # Persist the service version on HDFS
>  # Start the upgraded version of the service






[jira] [Updated] (YARN-8018) Yarn service: Add support for initiating service upgrade

2018-03-16 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8018:

Attachment: (was: YARN-8018.wip.patch)

> Yarn service: Add support for initiating service upgrade
> 
>
> Key: YARN-8018
> URL: https://issues.apache.org/jira/browse/YARN-8018
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8018.wip.patch
>
>
> Add support for initiating a service upgrade, which includes the following 
> main changes:
>  # Service API to initiate the upgrade
>  # Persist the service version on HDFS
>  # Start the upgraded version of the service






[jira] [Updated] (YARN-8018) Yarn service: Add support for initiating service upgrade

2018-03-16 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8018:

Attachment: YARN-8018.wip.patch

> Yarn service: Add support for initiating service upgrade
> 
>
> Key: YARN-8018
> URL: https://issues.apache.org/jira/browse/YARN-8018
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8018.wip.patch
>
>
> Add support for initiating a service upgrade, which includes the following 
> main changes:
>  # Service API to initiate the upgrade
>  # Persist the service version on HDFS
>  # Start the upgraded version of the service






[jira] [Commented] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer

2018-03-16 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403069#comment-16403069
 ] 

Wilfred Spiegelenburg commented on YARN-7962:
-

I don't think this is really a race condition that you need to test. Currently, 
when we stop the service we do not reset the {{isServiceStarted}} flag. 
So in a unit test I would just call {{serviceStop}} and then call 
{{applicationFinished}} from the same thread. Since the flag is not reset, I would 
expect that to fail without the fix and pass with the fix.

> Race Condition When Stopping DelegationTokenRenewer
> ---
>
> Key: YARN-7962
> URL: https://issues.apache.org/jira/browse/YARN-7962
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Priority: Minor
> Attachments: YARN-7962.1.patch
>
>
> [https://github.com/apache/hadoop/blob/69fa81679f59378fd19a2c65db8019393d7c05a2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java]
> {code:java}
>   private ThreadPoolExecutor renewerService;
>   private void processDelegationTokenRenewerEvent(
>   DelegationTokenRenewerEvent evt) {
> serviceStateLock.readLock().lock();
> try {
>   if (isServiceStarted) {
> renewerService.execute(new DelegationTokenRenewerRunnable(evt));
>   } else {
> pendingEventQueue.add(evt);
>   }
> } finally {
>   serviceStateLock.readLock().unlock();
> }
>   }
>   @Override
>   protected void serviceStop() {
> if (renewalTimer != null) {
>   renewalTimer.cancel();
> }
> appTokens.clear();
> allTokens.clear();
> this.renewerService.shutdown();
> {code}
> {code:java}
> 2018-02-21 11:18:16,253  FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.util.concurrent.RejectedExecutionException: Task 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable@39bddaf2
>  rejected from java.util.concurrent.ThreadPoolExecutor@5f71637b[Terminated, 
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15487]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.processDelegationTokenRenewerEvent(DelegationTokenRenewer.java:196)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.applicationFinished(DelegationTokenRenewer.java:734)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:199)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:424)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:65)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:177)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> What I think is going on here is that the {{serviceStop}} method is not 
> setting the {{isServiceStarted}} flag to 'false'.
> Please update so that the {{serviceStop}} method grabs the 
> {{serviceStateLock}} and sets {{isServiceStarted}} to _false_, before 
> shutting down the {{renewerService}} thread pool, to avoid this condition.
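
A minimal sketch of that suggested ordering, using only the fields from the snippet above (an illustration of the request, not the committed fix):

{code:java}
  @Override
  protected void serviceStop() {
    // Flip the flag under the write lock first, so processDelegationTokenRenewerEvent
    // routes any concurrent events to pendingEventQueue instead of the dying pool.
    serviceStateLock.writeLock().lock();
    try {
      isServiceStarted = false;
    } finally {
      serviceStateLock.writeLock().unlock();
    }
    if (renewalTimer != null) {
      renewalTimer.cancel();
    }
    appTokens.clear();
    allTokens.clear();
    this.renewerService.shutdown();
  }
{code}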






[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers

2018-03-16 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402596#comment-16402596
 ] 

Devaraj K commented on YARN-5764:
-

Thanks [~miklos.szeg...@cloudera.com] for review and commit, [~leftnoteasy] and 
others for reviews.

[~miklos.szeg...@cloudera.com], is there any reason to keep this still 
'Unresolved'?

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
>Priority: Major
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v10.patch, 
> YARN-5764-v11.patch, YARN-5764-v2.patch, YARN-5764-v3.patch, 
> YARN-5764-v4.patch, YARN-5764-v5.patch, YARN-5764-v6.patch, 
> YARN-5764-v7.patch, YARN-5764-v8.patch, YARN-5764-v9.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non-SMP systems. YARN containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.






[jira] [Updated] (YARN-1151) Ability to configure auxiliary services from HDFS-based JAR files

2018-03-16 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1151:

Attachment: [YARN-1151] [Design] Configure auxiliary services from 
HDFS-based JAR files.pdf

> Ability to configure auxiliary services from HDFS-based JAR files
> -
>
> Key: YARN-1151
> URL: https://issues.apache.org/jira/browse/YARN-1151
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.1.0-beta, 2.9.0
>Reporter: john lilley
>Assignee: Xuan Gong
>Priority: Major
>  Labels: auxiliary-service, yarn
> Attachments: YARN-1151.1.patch, [YARN-1151] [Design] Configure 
> auxiliary services from HDFS-based JAR files.pdf
>
>
> I would like to install an auxiliary service in Hadoop YARN without actually 
> installing files/services on every node in the system.  Discussions on the 
> user@ list indicate that this is not easily done.  The reason we want an 
> auxiliary service is that our application has some persistent-data components 
> that are not appropriate for HDFS.  In fact, they are somewhat analogous to 
> the mapper output of MapReduce's shuffle, which is what led me to 
> auxiliary-services in the first place.  It would be much easier if we could 
> just place our service's JARs in HDFS.






[jira] [Commented] (YARN-7581) HBase filters are not constructed correctly in ATSv2

2018-03-16 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402510#comment-16402510
 ] 

Vrushali C commented on YARN-7581:
--

Thanks for the patch [~haibochen]! I have some questions.

I am trying to think about the UTF-8 encoding for strings. Is there any reason 
we are explicitly defining convertBytesToString and convertStringToBytes instead 
of using the ones that HBase provides? The Bytes.toString conversions assume 
UTF-8 encoding:

https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/Bytes.html#toString-byte:A-

The second question is about always including the info field. Say, for example, 
someone wanted to see their megabytemillis counter value; do we need to 
retrieve the info field in that case?

In general, adding the fields in the CFs to make the filter conditions work 
correctly seems fine to me. +1 on the overall patch apart from the above questions.
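
For reference, a small illustration of the HBase helpers mentioned above ({{org.apache.hadoop.hbase.util.Bytes}}), which already assume UTF-8:

{code:java}
// Bytes.toBytes/Bytes.toString round-trip a String through UTF-8 encoded bytes.
byte[] raw = Bytes.toBytes("megabytemillis");
String back = Bytes.toString(raw); // "megabytemillis", decoded as UTF-8
{code}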



> HBase filters are not constructed correctly in ATSv2
> 
>
> Key: YARN-7581
> URL: https://issues.apache.org/jira/browse/YARN-7581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.0.0-beta1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-7581-YARN-7055.04.patch, YARN-7581.00.patch, 
> YARN-7581.01.patch, YARN-7581.02.patch, YARN-7581.03.patch, YARN-7581.04.patch
>
>
> Post YARN-7346,
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesConfigFilters() and 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesMetricFilters() 
> start to fail when hbase.profile is set to 2.0)
> *Error Message*
>  [ERROR] Failures:
>  [ERROR] 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesConfigFilters:1266 
> expected:<2> but was:<0>
>  [ERROR] 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesMetricFilters:1523 
> expected:<1> but was:<0>






[jira] [Commented] (YARN-8039) Clean up log dir configuration in TestLinuxContainerExecutorWithMocks.testStartLocalizer

2018-03-16 Thread Szilard Nemeth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402429#comment-16402429
 ] 

Szilard Nemeth commented on YARN-8039:
--

+1 (non-binding)

> Clean up log dir configuration in 
> TestLinuxContainerExecutorWithMocks.testStartLocalizer
> 
>
> Key: YARN-8039
> URL: https://issues.apache.org/jira/browse/YARN-8039
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Minor
> Attachments: YARN-8039.000.patch
>
>







[jira] [Commented] (YARN-7190) Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath

2018-03-16 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402382#comment-16402382
 ] 

Vrushali C commented on YARN-7190:
--

Hi [~eddyxu]

bq. I am trying to only include bugfix and compatible improvements, with one 
exception of HDFS-12990, in 3.0.1 now.
Is it OK to you?

Yes, that sounds good to me. Thanks for your efforts on this.

thanks
Vrushali

> Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user 
> classpath
> 
>
> Key: YARN-7190
> URL: https://issues.apache.org/jira/browse/YARN-7190
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineclient, timelinereader, timelineserver
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Major
> Fix For: YARN-5355_branch2, 3.1.0, 2.9.1
>
> Attachments: YARN-7190-YARN-5355_branch2.01.patch, 
> YARN-7190-YARN-5355_branch2.02.patch, YARN-7190-YARN-5355_branch2.03.patch, 
> YARN-7190.01.patch, YARN-7190.02.patch
>
>
> [~jlowe] had a good observation about the user classpath getting extra jars 
> in Hadoop 2.x brought in with TSv2. If users start picking up Hadoop 2.x's 
> version of the HBase jars instead of the ones they shipped with their job, it 
> could be a problem.
> So when TSv2 is to be used in 2.x, the HBase-related jars should come into 
> only the NM classpath, not the user classpath.
> Here is a list of some jars
> {code}
> commons-csv-1.0.jar
> commons-el-1.0.jar
> commons-httpclient-3.1.jar
> disruptor-3.3.0.jar
> findbugs-annotations-1.3.9-1.jar
> hbase-annotations-1.2.6.jar
> hbase-client-1.2.6.jar
> hbase-common-1.2.6.jar
> hbase-hadoop2-compat-1.2.6.jar
> hbase-hadoop-compat-1.2.6.jar
> hbase-prefix-tree-1.2.6.jar
> hbase-procedure-1.2.6.jar
> hbase-protocol-1.2.6.jar
> hbase-server-1.2.6.jar
> htrace-core-3.1.0-incubating.jar
> jamon-runtime-2.4.1.jar
> jasper-compiler-5.5.23.jar
> jasper-runtime-5.5.23.jar
> jcodings-1.0.8.jar
> joni-2.1.2.jar
> jsp-2.1-6.1.14.jar
> jsp-api-2.1-6.1.14.jar
> jsr311-api-1.1.1.jar
> metrics-core-2.2.0.jar
> servlet-api-2.5-6.1.14.jar
> {code}






[jira] [Comment Edited] (YARN-7190) Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath

2018-03-16 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402316#comment-16402316
 ] 

Lei (Eddy) Xu edited comment on YARN-7190 at 3/16/18 6:40 PM:
--

Hmm, I am in the process of releasing 3.0.1, and this JIRA comes out as 
incompatible in the release changelist, hence the revert. I feel that it is 
completely necessary for 3.1 / 2.9, but maybe not so for a minor release like 
3.0.1. I am trying to only include bugfix and compatible improvements, with one 
exception of HDFS-12990, in 3.0.1 now.
Is it OK to you?


was (Author: eddyxu):
Hmm, I am in the process of releasing 3.0.1,  and this JIRA comes out as 
incompatible in the release changelist, so it was the revert. I feel that it is 
completely necessary for 3.1 / 2.9, but maybe not so for a minor release like 
3.0.1. I am trying to only include bugfix and compatible improvements in 3.0.1 
now.
Does it make sense?

> Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user 
> classpath
> 
>
> Key: YARN-7190
> URL: https://issues.apache.org/jira/browse/YARN-7190
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineclient, timelinereader, timelineserver
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Major
> Fix For: YARN-5355_branch2, 3.1.0, 2.9.1
>
> Attachments: YARN-7190-YARN-5355_branch2.01.patch, 
> YARN-7190-YARN-5355_branch2.02.patch, YARN-7190-YARN-5355_branch2.03.patch, 
> YARN-7190.01.patch, YARN-7190.02.patch
>
>
> [~jlowe] had a good observation about the user classpath getting extra jars 
> in Hadoop 2.x brought in with TSv2. If users start picking up Hadoop 2.x's 
> version of the HBase jars instead of the ones they shipped with their job, it 
> could be a problem.
> So when TSv2 is to be used in 2.x, the HBase-related jars should come into 
> only the NM classpath, not the user classpath.
> Here is a list of some jars
> {code}
> commons-csv-1.0.jar
> commons-el-1.0.jar
> commons-httpclient-3.1.jar
> disruptor-3.3.0.jar
> findbugs-annotations-1.3.9-1.jar
> hbase-annotations-1.2.6.jar
> hbase-client-1.2.6.jar
> hbase-common-1.2.6.jar
> hbase-hadoop2-compat-1.2.6.jar
> hbase-hadoop-compat-1.2.6.jar
> hbase-prefix-tree-1.2.6.jar
> hbase-procedure-1.2.6.jar
> hbase-protocol-1.2.6.jar
> hbase-server-1.2.6.jar
> htrace-core-3.1.0-incubating.jar
> jamon-runtime-2.4.1.jar
> jasper-compiler-5.5.23.jar
> jasper-runtime-5.5.23.jar
> jcodings-1.0.8.jar
> joni-2.1.2.jar
> jsp-2.1-6.1.14.jar
> jsp-api-2.1-6.1.14.jar
> jsr311-api-1.1.1.jar
> metrics-core-2.2.0.jar
> servlet-api-2.5-6.1.14.jar
> {code}






[jira] [Commented] (YARN-7190) Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath

2018-03-16 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402316#comment-16402316
 ] 

Lei (Eddy) Xu commented on YARN-7190:
-

Hmm, I am in the process of releasing 3.0.1,  and this JIRA comes out as 
incompatible in the release changelist, so it was the revert. I feel that it is 
completely necessary for 3.1 / 2.9, but maybe not so for a minor release like 
3.0.1. I am trying to only include bugfix and compatible improvements in 3.0.1 
now.
Does it make sense?

> Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user 
> classpath
> 
>
> Key: YARN-7190
> URL: https://issues.apache.org/jira/browse/YARN-7190
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineclient, timelinereader, timelineserver
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Major
> Fix For: YARN-5355_branch2, 3.1.0, 2.9.1
>
> Attachments: YARN-7190-YARN-5355_branch2.01.patch, 
> YARN-7190-YARN-5355_branch2.02.patch, YARN-7190-YARN-5355_branch2.03.patch, 
> YARN-7190.01.patch, YARN-7190.02.patch
>
>
> [~jlowe] had a good observation about the user classpath getting extra jars 
> in Hadoop 2.x brought in with TSv2. If users start picking up Hadoop 2.x's 
> version of the HBase jars instead of the ones they shipped with their job, it 
> could be a problem.
> So when TSv2 is to be used in 2.x, the HBase-related jars should come into 
> only the NM classpath, not the user classpath.
> Here is a list of some jars
> {code}
> commons-csv-1.0.jar
> commons-el-1.0.jar
> commons-httpclient-3.1.jar
> disruptor-3.3.0.jar
> findbugs-annotations-1.3.9-1.jar
> hbase-annotations-1.2.6.jar
> hbase-client-1.2.6.jar
> hbase-common-1.2.6.jar
> hbase-hadoop2-compat-1.2.6.jar
> hbase-hadoop-compat-1.2.6.jar
> hbase-prefix-tree-1.2.6.jar
> hbase-procedure-1.2.6.jar
> hbase-protocol-1.2.6.jar
> hbase-server-1.2.6.jar
> htrace-core-3.1.0-incubating.jar
> jamon-runtime-2.4.1.jar
> jasper-compiler-5.5.23.jar
> jasper-runtime-5.5.23.jar
> jcodings-1.0.8.jar
> joni-2.1.2.jar
> jsp-2.1-6.1.14.jar
> jsp-api-2.1-6.1.14.jar
> jsr311-api-1.1.1.jar
> metrics-core-2.2.0.jar
> servlet-api-2.5-6.1.14.jar
> {code}






[jira] [Commented] (YARN-7190) Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath

2018-03-16 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402293#comment-16402293
 ] 

Vrushali C commented on YARN-7190:
--

Hi [~eddyxu] 

Just wanted to see what incompatibility issue was noticed. We would actually 
like to avoid having these unnecessary jars in the classpath for clients; otherwise they 
start depending on them.  cc [~haibo.chen] [~rohithsharma]

thanks
Vrushali

> Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user 
> classpath
> 
>
> Key: YARN-7190
> URL: https://issues.apache.org/jira/browse/YARN-7190
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineclient, timelinereader, timelineserver
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Major
> Fix For: YARN-5355_branch2, 3.1.0, 2.9.1
>
> Attachments: YARN-7190-YARN-5355_branch2.01.patch, 
> YARN-7190-YARN-5355_branch2.02.patch, YARN-7190-YARN-5355_branch2.03.patch, 
> YARN-7190.01.patch, YARN-7190.02.patch
>
>
> [~jlowe] had a good observation about the user classpath getting extra jars 
> in hadoop 2.x brought in with TSv2.  If users start picking up Hadoop 2.x's 
> version of HBase jars instead of the ones they shipped with their job, it 
> could be a problem.
> So when TSv2 is to be used in 2.x, the hbase-related jars should come into 
> only the NM classpath, not the user classpath.
> Here is a list of some jars
> {code}
> commons-csv-1.0.jar
> commons-el-1.0.jar
> commons-httpclient-3.1.jar
> disruptor-3.3.0.jar
> findbugs-annotations-1.3.9-1.jar
> hbase-annotations-1.2.6.jar
> hbase-client-1.2.6.jar
> hbase-common-1.2.6.jar
> hbase-hadoop2-compat-1.2.6.jar
> hbase-hadoop-compat-1.2.6.jar
> hbase-prefix-tree-1.2.6.jar
> hbase-procedure-1.2.6.jar
> hbase-protocol-1.2.6.jar
> hbase-server-1.2.6.jar
> htrace-core-3.1.0-incubating.jar
> jamon-runtime-2.4.1.jar
> jasper-compiler-5.5.23.jar
> jasper-runtime-5.5.23.jar
> jcodings-1.0.8.jar
> joni-2.1.2.jar
> jsp-2.1-6.1.14.jar
> jsp-api-2.1-6.1.14.jar
> jsr311-api-1.1.1.jar
> metrics-core-2.2.0.jar
> servlet-api-2.5-6.1.14.jar
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8039) Clean up log dir configuration in TestLinuxContainerExecutorWithMocks.testStartLocalizer

2018-03-16 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402273#comment-16402273
 ] 

genericqa commented on YARN-8039:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
41s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 0 new + 14 unchanged - 1 fixed = 14 total (was 15) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 30s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
11s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 61m 17s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-8039 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12914893/YARN-8039.000.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux e22d627c7381 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 154cfb2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19995/testReport/ |
| Max. process+thread count | 395 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/19995/console |
| Powered by | Apache Yetus 

[jira] [Commented] (YARN-8040) [UI2] yarn new ui web-app does not respect current pathname for REST api

2018-03-16 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402267#comment-16402267
 ] 

genericqa commented on YARN-8040:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
24m 34s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 36m 17s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-8040 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12914895/YARN-8040.001.patch |
| Optional Tests |  asflicense  shadedclient  |
| uname | Linux 33ef01e7ae09 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 154cfb2 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 407 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/19996/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> [UI2] yarn new ui web-app does not respect current pathname for REST api
> 
>
> Key: YARN-8040
> URL: https://issues.apache.org/jira/browse/YARN-8040
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-8040.001.patch
>
>
> When ui2 is accessed behind a proxy like Knox/nginx, the trailing path name should 
> not be skipped. However, trim "ui2" if it is there.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7190) Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath

2018-03-16 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated YARN-7190:

Fix Version/s: (was: 3.0.1)

Hi, [~varun_saxena]. I reverted this from 3.0.1 because it is incompatible with 
3.0.0.

> Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user 
> classpath
> 
>
> Key: YARN-7190
> URL: https://issues.apache.org/jira/browse/YARN-7190
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineclient, timelinereader, timelineserver
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Major
> Fix For: YARN-5355_branch2, 3.1.0, 2.9.1
>
> Attachments: YARN-7190-YARN-5355_branch2.01.patch, 
> YARN-7190-YARN-5355_branch2.02.patch, YARN-7190-YARN-5355_branch2.03.patch, 
> YARN-7190.01.patch, YARN-7190.02.patch
>
>
> [~jlowe] had a good observation about the user classpath getting extra jars 
> in hadoop 2.x brought in with TSv2.  If users start picking up Hadoop 2.x's 
> version of HBase jars instead of the ones they shipped with their job, it 
> could be a problem.
> So when TSv2 is to be used in 2.x, the hbase-related jars should come into 
> only the NM classpath, not the user classpath.
> Here is a list of some jars
> {code}
> commons-csv-1.0.jar
> commons-el-1.0.jar
> commons-httpclient-3.1.jar
> disruptor-3.3.0.jar
> findbugs-annotations-1.3.9-1.jar
> hbase-annotations-1.2.6.jar
> hbase-client-1.2.6.jar
> hbase-common-1.2.6.jar
> hbase-hadoop2-compat-1.2.6.jar
> hbase-hadoop-compat-1.2.6.jar
> hbase-prefix-tree-1.2.6.jar
> hbase-procedure-1.2.6.jar
> hbase-protocol-1.2.6.jar
> hbase-server-1.2.6.jar
> htrace-core-3.1.0-incubating.jar
> jamon-runtime-2.4.1.jar
> jasper-compiler-5.5.23.jar
> jasper-runtime-5.5.23.jar
> jcodings-1.0.8.jar
> joni-2.1.2.jar
> jsp-2.1-6.1.14.jar
> jsp-api-2.1-6.1.14.jar
> jsr311-api-1.1.1.jar
> metrics-core-2.2.0.jar
> servlet-api-2.5-6.1.14.jar
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer

2018-03-16 Thread BELUGA BEHR (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402242#comment-16402242
 ] 

BELUGA BEHR commented on YARN-7962:
---

[~wilfreds] Can you please provide thoughts on how to unit test a race 
condition of this sort? How would we introduce pauses into the locked code?

Also, there technically isn't a need to lock on the initialization. It's just 
a safety and good-practice item. There will be almost no overhead, since we 
will only initialize once (or maybe a couple of times), so it doesn't hurt 
to be safe.
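
For reference, a minimal sketch of the ordering the description below asks for. This is 
not the actual patch; it just reuses the field names from the quoted snippet and flips 
the flag under the write lock before the thread pool is shut down:

{code:java}
@Override
protected void serviceStop() {
  // Flip the flag under the write lock first so that any concurrent
  // processDelegationTokenRenewerEvent() call queues the event instead of
  // submitting it to the executor that is about to shut down.
  serviceStateLock.writeLock().lock();
  try {
    isServiceStarted = false;
  } finally {
    serviceStateLock.writeLock().unlock();
  }
  if (renewalTimer != null) {
    renewalTimer.cancel();
  }
  appTokens.clear();
  allTokens.clear();
  renewerService.shutdown();
}
{code}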

> Race Condition When Stopping DelegationTokenRenewer
> ---
>
> Key: YARN-7962
> URL: https://issues.apache.org/jira/browse/YARN-7962
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Priority: Minor
> Attachments: YARN-7962.1.patch
>
>
> [https://github.com/apache/hadoop/blob/69fa81679f59378fd19a2c65db8019393d7c05a2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java]
> {code:java}
>   private ThreadPoolExecutor renewerService;
>   private void processDelegationTokenRenewerEvent(
>   DelegationTokenRenewerEvent evt) {
> serviceStateLock.readLock().lock();
> try {
>   if (isServiceStarted) {
> renewerService.execute(new DelegationTokenRenewerRunnable(evt));
>   } else {
> pendingEventQueue.add(evt);
>   }
> } finally {
>   serviceStateLock.readLock().unlock();
> }
>   }
>   @Override
>   protected void serviceStop() {
> if (renewalTimer != null) {
>   renewalTimer.cancel();
> }
> appTokens.clear();
> allTokens.clear();
> this.renewerService.shutdown();
> {code}
> {code:java}
> 2018-02-21 11:18:16,253  FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.util.concurrent.RejectedExecutionException: Task 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable@39bddaf2
>  rejected from java.util.concurrent.ThreadPoolExecutor@5f71637b[Terminated, 
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15487]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.processDelegationTokenRenewerEvent(DelegationTokenRenewer.java:196)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.applicationFinished(DelegationTokenRenewer.java:734)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:199)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:424)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:65)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:177)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> What I think is going on here is that the {{serviceStop}} method is not 
> setting the {{isServiceStarted}} flag to 'false'.
> Please update so that the {{serviceStop}} method grabs the 
> {{serviceStateLock}} and sets {{isServiceStarted}} to _false_, before 
> shutting down the {{renewerService}} thread pool, to avoid this condition.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8016) Refine PlacementRule interface and add a app-name queue mapping rule as an example

2018-03-16 Thread Zian Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402224#comment-16402224
 ] 

Zian Chen commented on YARN-8016:
-

Hi [~leftnoteasy], while doing the patch refactor I realized an issue with 
comment No. 3, which suggests moving getQueueMappingEntity out of 
CapacitySchedulerConfiguration. There is actually a reason for keeping it there.
 # We put both getQueueMappingEntity and setQueueMappingEntity inside 
CapacitySchedulerConfiguration because these two methods set properties on 
CapacitySchedulerConfiguration; they are not general setter and getter methods. 
If we move them out of CapacitySchedulerConfiguration, for example into 
QueuePlacementRuleUtils, the setting will not take effect for the conf.
 # UserGroupMappingPlacementRule also puts its getQueueMappings and 
setQueueMappings inside CapacitySchedulerConfiguration.

Let me keep it inside CapacitySchedulerConfiguration for now and make 
everything work. Then we can discuss further if you have a better idea. Thanks!

> Refine PlacementRule interface and add a app-name queue mapping rule as an 
> example
> --
>
> Key: YARN-8016
> URL: https://issues.apache.org/jira/browse/YARN-8016
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8016.001.patch
>
>
> After YARN-3635/YARN-6689, PlacementRule becomes a common interface which can 
> be used by scheduler and can be dynamically updated by scheduler according to 
> configs. There're some other works. 
> - There's no way to initialize PlacementRule.
> - No example of PlacementRule except the user-group mapping one.
> This JIRA is targeted to refine PlacementRule interfaces and add another 
> PlacementRule example.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8028) Support authorizeUserAccessToQueue in RMWebServices

2018-03-16 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1640#comment-1640
 ] 

Sunil G commented on YARN-8028:
---

Approach looks fine.

Some minor ones:
 # Unused imports in RMWebServiceProtocol.
 # In many cases in RMWebServices, when we get AccessControlException, 
FORBIDDEN is used. I think that is correct when compared to BadRequest.
 # In line with Bibin's thought, the RMWebServices API is common to all schedulers. I 
can see we already use scheduler-specific code in other places, like below. 
Could we reuse something like 
*queueACLsManager.checkAccess(callerUGI, QueueACL.SUBMIT_APPLICATIONS, 
application, Server.getRemoteAddress(), null, targetQueue)*?
 

{code:java}
    if (scheduler instanceof CapacityScheduler) {
      CSQueue queue = ((CapacityScheduler) scheduler).getQueue(targetQueue);
      if (queue == null) {
        LOG.warn("Target queue " + targetQueue
            + " does not exist while trying to move "
            + app.getApplicationId());
        return false;
      }
      return authorizer.checkPermission(
          new AccessRequest(queue.getPrivilegedEntity(), callerUGI,
              SchedulerUtils.toAccessType(acl),
              app.getApplicationId().toString(), app.getName(),
              remoteAddress, forwardedAddresses));
    } else if (scheduler instanceof FairScheduler) {
      FSQueue queue = ((FairScheduler) scheduler).getQueueManager().
          getQueue(targetQueue);
      if (queue == null) {
        LOG.warn("Target queue " + targetQueue
            + " does not exist while trying to move "
            + app.getApplicationId());
        return false;
      }
      return scheduler.checkAccess(callerUGI, acl, targetQueue);
    }{code}

> Support authorizeUserAccessToQueue in RMWebServices
> ---
>
> Key: YARN-8028
> URL: https://issues.apache.org/jira/browse/YARN-8028
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8028.001.patch
>
>
> Currently we have {{QueueUserACLInfo}} in ApplicationClient, we should 
> support similar API in REST API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8040) [UI2] yarn new ui web-app does not respect current pathname for REST api

2018-03-16 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402211#comment-16402211
 ] 

Sunil G commented on YARN-8040:
---

Hi [~leftnoteasy], could you please help review this?

I tested in a normal cluster and with Knox. Both work fine. Thanks.

> [UI2] yarn new ui web-app does not respect current pathname for REST api
> 
>
> Key: YARN-8040
> URL: https://issues.apache.org/jira/browse/YARN-8040
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-8040.001.patch
>
>
> When ui2 is accessed behind a proxy like Knox/nginx, the trailing path name should 
> not be skipped. However, trim "ui2" if it is there.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8040) [UI2] yarn new ui web-app does not respect current pathname for REST api

2018-03-16 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-8040:
--
Attachment: YARN-8040.001.patch

> [UI2] yarn new ui web-app does not respect current pathname for REST api
> 
>
> Key: YARN-8040
> URL: https://issues.apache.org/jira/browse/YARN-8040
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-8040.001.patch
>
>
> When ui2 is accessed behind a proxy like Knox/nginx, the trailing path name should 
> not be skipped. However, trim "ui2" if it is there.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8040) [UI2] yarn new ui web-app does not respect current pathname for REST api

2018-03-16 Thread Sunil G (JIRA)
Sunil G created YARN-8040:
-

 Summary: [UI2] yarn new ui web-app does not respect current 
pathname for REST api
 Key: YARN-8040
 URL: https://issues.apache.org/jira/browse/YARN-8040
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Reporter: Sunil G
Assignee: Sunil G


When ui2 is accessed behind a proxy like Knox/nginx, the trailing path name should 
not be skipped. However, trim "ui2" if it is there.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8039) Clean up log dir configuration in TestLinuxContainerExecutorWithMocks.testStartLocalizer

2018-03-16 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-8039:
-
Attachment: YARN-8039.000.patch

> Clean up log dir configuration in 
> TestLinuxContainerExecutorWithMocks.testStartLocalizer
> 
>
> Key: YARN-8039
> URL: https://issues.apache.org/jira/browse/YARN-8039
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Minor
> Attachments: YARN-8039.000.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8031) NodeManager will fail to start if cpu subsystem is already mounted

2018-03-16 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402162#comment-16402162
 ] 

Miklos Szegedi commented on YARN-8031:
--

[~jayceAu], thank you for raising this. If you have CGroups already mounted, 
you should set the mount option to false as described here:

[https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html]
{code:java}
Discover CGroups mounted already
This should be used on newer systems like RHEL7 or Ubuntu16 or if the 
administrator mounts CGroups before YARN starts. Set 
yarn.nodemanager.linux-container-executor.cgroups.mount to false 
and leave other settings set to their defaults. YARN will locate the mount 
points in /proc/mounts. Common locations include /sys/fs/cgroup and /cgroup. 
The default location can vary depending on the Linux distribution in use.{code}

> NodeManager will fail to start if cpu subsystem is already mounted
> --
>
> Key: YARN-8031
> URL: https://issues.apache.org/jira/browse/YARN-8031
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: JayceAu
>Priority: Major
> Attachments: image-2018-03-15-14-47-30-583.png
>
>
> if *yarn.nodemanager.linux-container-executor.cgroups.mount* is set to true 
> and the cpu subsystem is not yet mounted, NodeManager will mount the cpu 
> subsystem and then create the control group whose default name is 
> *hadoop-yarn* if the mount step is successful. This procedure works well if 
> the cpu subsystem is not yet mounted. However, in some situations the cpu subsystem 
> is already mounted before NodeManager starts, and NodeManager will fail to 
> start because it has no write permission to the *hadoop-yarn* path. For example:
>  # OSes that use systemd, such as CentOS 7, have the cpu subsystem mounted by 
> default on machine startup
>  # some daemons that start before NodeManager may also 
> rely on the mounted state of the cpu subsystem. In our production environment, we 
> limit the cpu usage of the monitoring and control agent, which starts on 
> reboot
> In order to solve this problem, container-executor must be able to create the 
> control group *hadoop-yarn* if mounting the controller is successful or if this 
> controller is already mounted. Besides, if the cpu subsystem is used in 
> combination with other subsystems and it is already mounted, container-executor 
> should use the latest mount point of the cpu subsystem instead of the one 
> provided by NodeManager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8039) Clean up log dir configuration in TestLinuxContainerExecutorWithMocks.testStartLocalizer

2018-03-16 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-8039:


 Summary: Clean up log dir configuration in 
TestLinuxContainerExecutorWithMocks.testStartLocalizer
 Key: YARN-8039
 URL: https://issues.apache.org/jira/browse/YARN-8039
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8037) CGroupsResourceCalculator excessive warnings on container relaunch

2018-03-16 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402121#comment-16402121
 ] 

Miklos Szegedi commented on YARN-8037:
--

Thank you, [~shaneku...@gmail.com], for raising this. [~haibochen], we had 
logic in one of the earlier patches in YARN-7064 to suppress repeated reporting of 
issues from CGroupsResourceCalculator, and we removed it based on your advice in order to 
support debugging. What is your opinion about this suggestion? Do you think we 
should add back some filtering here? 
https://issues.apache.org/jira/browse/YARN-7064?focusedCommentId=16323135&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16323135
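
If filtering does come back, one low-risk shape (hypothetical, not the YARN-7064 
logic) would be to keep a one-line warning and push the full stack trace down to 
debug, assuming LOG is the class's existing logger and pid/e come from the failed 
parse:

{code:java}
// Single-line warning keeps the signal; the stack trace only shows up at debug.
LOG.warn("Failed to parse cgroups for " + pid + ": " + e.getMessage());
if (LOG.isDebugEnabled()) {
  LOG.debug("Full stack trace for " + pid, e);
}
{code}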

 

> CGroupsResourceCalculator excessive warnings on container relaunch
> --
>
> Key: YARN-8037
> URL: https://issues.apache.org/jira/browse/YARN-8037
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Priority: Major
>
> When a container is relaunched, the old process no longer exists. When using 
> the {{CGroupsResourceCalculator}} this results in the warning and exception 
> below being logged every second until the relaunch occurs, which is excessive 
> and filling up the logs.
> {code:java}
> 2018-03-16 14:30:33,438 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator:
>  Failed to parse 12844
> org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the 
> interim 12844
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.readTotalProcessJiffies(CGroupsResourceCalculator.java:252)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:181)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
> Caused by: java.io.FileNotFoundException: 
> /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_e01_1521209613260_0002_01_02/cpuacct.stat
>  (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.(FileInputStream.java:138)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
> ... 4 more
> 2018-03-16 14:30:33,438 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator:
>  Failed to parse cgroups 
> /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_02/memory.memsw.usage_in_bytes
> org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the 
> interim 12844
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.getMemorySize(CGroupsResourceCalculator.java:238)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:187)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
> Caused by: java.io.FileNotFoundException: 
> /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_02/memory.usage_in_bytes
>  (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.(FileInputStream.java:138)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
> ... 4 more{code}
> We should consider moving the exception to debug to reduce the noise at a 
> minimum. Alternatively, it may make sense to stop the existing 
> {{MonitoringThread}} during relaunch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscri

[jira] [Commented] (YARN-7905) Parent directory permission incorrect during public localization

2018-03-16 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402112#comment-16402112
 ] 

genericqa commented on YARN-7905:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 15s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 22s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 6 new + 214 unchanged - 0 fixed = 220 total (was 214) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 24s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
33s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 62m 46s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-7905 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12914877/YARN-7905-006.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 6dda1f8a75c2 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 154cfb2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/19994/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19994/testReport/ |
| Max. process+thread count | 409 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
h

[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container

2018-03-16 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402098#comment-16402098
 ] 

Eric Yang commented on YARN-7654:
-

[~jlowe] Yes, you are right that patch 001 doesn't handle shell expansion, and 
it is likely to cause more problems if the release happens before we have a 
chance to close all the holes. I am going to stay on course for the execv version. It 
is a bit late for me to restart the coding to fix the execv problem first and then 
apply entry_point after. I think it is possible to get this done within 50k of 
code changes.

> Support ENTRY_POINT for docker container
> 
>
> Key: YARN-7654
> URL: https://issues.apache.org/jira/browse/YARN-7654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-7654.001.patch, YARN-7654.002.patch, 
> YARN-7654.003.patch
>
>
> Docker image may have ENTRY_POINT predefined, but this is not supported in 
> the current implementation.  It would be nice if we can detect existence of 
> {{launch_command}} and base on this variable launch docker container in 
> different ways:
> h3. Launch command exists
> {code}
> docker run [image]:[version]
> docker exec [container_id] [launch_command]
> {code}
> h3. Use ENTRY_POINT
> {code}
> docker run [image]:[version]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8034) Clarification on preferredHost request with relaxedLocality

2018-03-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402088#comment-16402088
 ] 

Jason Lowe commented on YARN-8034:
--

{quote}I observed that the Yarn RM immediately returns a container on a 
different host in the next second after the request was made.
{quote}
I believe something like YARN-6344 is relevant here even though that fix is 
specific to the CapacityScheduler. The schedulers have a heuristic that 
assumes making a small number of requests relative to the size of the cluster 
should bias towards responsiveness rather than locality. It's been there a long 
time. I don't know the full history behind it, but I suspect it derives from 
assuming a small request is for a small job and interactivity is more important 
than waiting for locality (since we are allowed to relax). See 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt#getLocalityWaitFactor
 for the equivalent place in the FairScheduler for what is being discussed in 
YARN-6344.
{quote}The Samza AM can cancel and resubmit a request (either for a different 
host or with relaxLocality=true) when a node trying to be allocated becomes 
unusable.
{quote}
You will want to keep that logic even after updating the AM to monitor the node 
updates. That will cover the case where the desired node is completely full 
with long-running containers.
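
For concreteness, a rough sketch of that request/monitor/resubmit loop with the 
AMRMClient API. The resource size and host name are taken from the experiment 
described in this thread; amrmClient is assumed to be an already-started 
AMRMClient<ContainerRequest>, the priority and progress values are placeholders, and 
error handling plus request bookkeeping are omitted:

{code:java}
// Strict request: preferred host only, locality not relaxed.
AMRMClient.ContainerRequest request = new AMRMClient.ContainerRequest(
    Resource.newInstance(1024, 1),          // 1G memory, 1 vcore
    new String[] {"our-preferred-host"},    // nodes
    null,                                   // racks
    Priority.newInstance(0),
    false);                                 // relaxLocality = false
amrmClient.addContainerRequest(request);

// On each heartbeat, watch for node updates pushed by the RM.
AllocateResponse response = amrmClient.allocate(0.0f);
for (NodeReport node : response.getUpdatedNodes()) {
  if (node.getNodeId().getHost().equals("our-preferred-host")
      && (node.getNodeState() == NodeState.UNHEALTHY
          || node.getNodeState() == NodeState.LOST
          || node.getNodeState() == NodeState.DECOMMISSIONED)) {
    // Preferred host is gone: cancel the strict request and resubmit it
    // with relaxLocality = true so any host can satisfy it.
    amrmClient.removeContainerRequest(request);
    amrmClient.addContainerRequest(new AMRMClient.ContainerRequest(
        Resource.newInstance(1024, 1), null, null,
        Priority.newInstance(0), true));
  }
}
{code}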

> Clarification on preferredHost request with relaxedLocality
> ---
>
> Key: YARN-8034
> URL: https://issues.apache.org/jira/browse/YARN-8034
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jagadish
>Priority: Major
>
> I work on Apache Samza, a stateful stream-processing framework that leverages 
> Yarn for resource management. The Samza AM requests resources on specific 
> hosts to schedule stateful jobs. We set relaxLocality = true in these 
> requests we make to Yarn. Often we have observed that we don't get containers 
> on the hosts that we requested them on and the Yarn RM returns containers on 
> arbitrary hosts. 
> Do you know what the behavior of the FairScheduler/CapacityScheduler is when 
> setting "relaxLocality = true"? I did play around by setting a high value for 
> yarn.scheduler.capacity.node-locality-delay but it did not seem to matter. 
> However, when setting relaxLocality = false, we get resources on the exact 
> hosts we requested on.
> The behavior I want from Yarn is "Honor locality to the best possible extent 
> and only return a container on an arbitrary host if the requested host is 
> down". Is there a way to accomplish this?
> If you can point me to the Scheduler code, I'm happy to look at it as well. 
> For context, we have continuous scheduling enabled in our clusters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7581) HBase filters are not constructed correctly in ATSv2

2018-03-16 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402073#comment-16402073
 ] 

genericqa commented on YARN-7581:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
41s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} YARN-7055 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
11s{color} | {color:green} YARN-7055 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green} YARN-7055 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} YARN-7055 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green} YARN-7055 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
31s{color} | {color:green} YARN-7055 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} YARN-7055 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 20s{color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-hbase_hadoop-yarn-server-timelineservice-hbase-client
 generated 2 new + 149 unchanged - 0 fixed = 151 total (was 149) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 52s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
23s{color} | {color:green} hadoop-yarn-server-timelineservice-hbase-client in 
the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 42m 53s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:c8176b7 |
| JIRA Issue | YARN-7581 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12914876/YARN-7581-YARN-7055.04.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d53e82cd4041 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | YARN-7055 / 922dd07 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/19993/artifact/out/diff-compile-javac-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-hbase_hadoop-yarn-server-timelineservic

[jira] [Updated] (YARN-8038) Support data retention policy in YARN ATSv2

2018-03-16 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-8038:
-
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-7055

> Support data retention policy in YARN ATSv2
> ---
>
> Key: YARN-8038
> URL: https://issues.apache.org/jira/browse/YARN-8038
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineservice
>Affects Versions: 3.0.0
>Reporter: Haibo Chen
>Priority: Major
>
> The data stored today in ATSv2 is either system data in that it is generated 
> by YARN, or custom data that is generated by Application Masters themselves.
> A data retention policy is necessary to maintain feature parity between the new 
> MR JHS and the current JHS.
> We may want to provide separate policies for system data and custom data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8038) Support data retention policy in YARN ATSv2

2018-03-16 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8038:


 Summary: Support data retention policy in YARN ATSv2
 Key: YARN-8038
 URL: https://issues.apache.org/jira/browse/YARN-8038
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: timelineservice
Affects Versions: 3.0.0
Reporter: Haibo Chen


The data stored today in ATSv2 is either system data in that it is generated by 
YARN, or custom data that is generated by Application Masters themselves.

A data retention policy is necessary to maintain feature parity between the new 
MR JHS and the current JHS.

We may want to provide separate policies for system data and custom data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8034) Clarification on preferredHost request with relaxedLocality

2018-03-16 Thread Jagadish (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402040#comment-16402040
 ] 

Jagadish commented on YARN-8034:


Thank you [~jlowe] for your recommendations. Very useful. 

>> The host could be down, full of other containers, unhealthy, etc.

I did have some follow-up questions so that I understand the behavior of this 
config better. 

- I set up an experiment with the Samza AM to request 1 container (with 1G 
memory and 1 vcore, host = our-preferred-host, relaxLocality = true and rack = 
null.)
- I observed that the Yarn RM immediately returns a container on a different 
host in the next second after the request was made.
- I am able to reproduce this 100% of the time across multiple runs (and across 
multiple hosts in our cluster), which makes me wonder whether the preferredHost is 
ignored when we set relaxLocality = true. I did verify that all nodes in the cluster 
were healthy. 
- For more context, I'm able to observe this behavior with both the 
fair-scheduler and the capacity-scheduler with "continuous scheduling" enabled 
in both cases. So, I'm not sure if that matters. 

FWIW, with relaxLocality = false, the RM returns containers on the exact hosts 
that we requested them on. I'm happy to submit a documentation patch to Hadoop 
so that we make everyone's life better when using Yarn for stateful apps :-) 

>> The node locality delay gives admins some control over how patiently the RM 
>> will wait for locality.

Our node locality delay is configured to the number of nodes in the cluster. I 
did try increasing it to an arbitrarily high number and it did not seem to affect 
the results of the above experiment. Are there other knobs at play that I'm missing?

>> Yes, although it will require some work on the Samza AM's part. Samza's AM 
>> can make requests for specific nodes with relaxLocality=false, but it also 
>> should monitor the updatedNodes field of each AllocateResponse. The RM will 
>> notify applications in that response when a node becomes unusable or becomes 
>> usable again. The Samza AM can cancel and resubmit a request (either for a 
>> different host or with relaxLocality=true) when a node trying to be 
>> allocated becomes unusable.

Our original approach was for the Samza AM to resubmit the request with 
relaxLocality = true after waiting for some timeout. Thank you for your 
helpful recommendations.  

> Clarification on preferredHost request with relaxedLocality
> ---
>
> Key: YARN-8034
> URL: https://issues.apache.org/jira/browse/YARN-8034
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jagadish
>Priority: Major
>
> I work on Apache Samza, a stateful stream-processing framework that leverages 
> Yarn for resource management. The Samza AM requests resources on specific 
> hosts to schedule stateful jobs. We set relaxLocality = true in these 
> requests we make to Yarn. Often we have observed that we don't get containers 
> on the hosts that we requested them on and the Yarn RM returns containers on 
> arbitrary hosts. 
> Do you know what the behavior of the FairScheduler/CapacityScheduler is when 
> setting "relaxLocality = true"? I did play around by setting a high value for 
> yarn.scheduler.capacity.node-locality-delay but it did not seem to matter. 
> However, when setting relaxLocality = false, we get resources on the exact 
> hosts we requested on.
> The behavior I want from Yarn is "Honor locality to the best possible extent 
> and only return a container on an arbitrary host if the requested host is 
> down". Is there a way to accomplish this?
> If you can point me to the Scheduler code, I'm happy to look at it as well. 
> For context, we have continuous scheduling enabled in our clusters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7461) DominantResourceCalculator#ratio calculation problem when right resource contains zero value

2018-03-16 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402037#comment-16402037
 ] 

Sunil G commented on YARN-7461:
---

When multiple resource types are considered, it is possible that one such 
resource may have 0 in the numerator or in the denominator. I can see that we are 
skipping such entries. But my concern is that such a scenario may not compute an 
accurate ratio and could impact higher-level API uses like DRC#lessThanOrEquals, 
which internally calls RC#compare and eventually comes down to ratio. Ideally we 
want to see whether resourceA is less than resourceB, and in such a case it is 
possible that resourceA may have 5 as the value for resource type X while 
resourceB may have 0 for the same type X.

So my worry is how this will affect a computation where the resources have 
heterogeneous values across the different resource types. 
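
As a worked illustration of the zero case from the description below (a standalone, 
hypothetical helper, not the DominantResourceCalculator code), skipping resource 
types whose right-hand value is zero is one way to avoid the 0/0 division and gives 
the expected 0.5 instead of NaN:

{code:java}
// Hypothetical sketch: max over resource types of left[i] / right[i],
// ignoring types where the right-hand value is 0 (which would yield 0/0 = NaN).
static float ratio(long[] left, long[] right) {
  float max = 0.0f;
  for (int i = 0; i < left.length; i++) {
    if (right[i] == 0) {
      continue;
    }
    max = Math.max(max, (float) left[i] / right[i]);
  }
  return max;
}

// ratio(new long[]{5, 5, 0}, new long[]{10, 10, 0}) returns 0.5f
{code}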

> DominantResourceCalculator#ratio calculation problem when right resource 
> contains zero value
> 
>
> Key: YARN-7461
> URL: https://issues.apache.org/jira/browse/YARN-7461
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Attachments: YARN-7461.001.patch, YARN-7461.002.patch, 
> YARN-7461.003.patch, YARN-7461.004.patch
>
>
> Currently DominantResourceCalculator#ratio may return a wrong result when the right 
> resource contains a zero value. For example, with three resource types, 
> leftResource=<5, 5, 0> and 
> rightResource=<10, 10, 0>, we expect the result of 
> DominantResourceCalculator#ratio(leftResource, rightResource) to be 0.5, but 
> currently it is NaN.
> There should be a verification before the divide calculation to ensure that the 
> dividend is not zero.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7905) Parent directory permission incorrect during public localization

2018-03-16 Thread Bilwa S T (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402020#comment-16402020
 ] 

Bilwa S T commented on YARN-7905:
-

[~bibinchundatt] I have taken care of the comment given above. Please review.

> Parent directory permission incorrect during public localization 
> -
>
> Key: YARN-7905
> URL: https://issues.apache.org/jira/browse/YARN-7905
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-7905-001.patch, YARN-7905-002.patch, 
> YARN-7905-003.patch, YARN-7905-004.patch, YARN-7905-005.patch, 
> YARN-7905-006.patch
>
>
> Similar to YARN-6708, during public localization we also have to take care of 
> the parent directory permissions if the umask is 027 during NodeManager startup.
> /filecache/0/200
> The directory permission of /filecache/0 is 750, which causes 
> application failure.
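
For illustration only, the kind of fix being discussed could look like the sketch below (this is not the attached patch; the path and permission value are assumptions):

{code:java}
// Sketch: create the public cache parent directory and then set its
// permission explicitly, so a restrictive process umask (e.g. 027)
// cannot leave it at 750.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class PublicCacheDirSketch {
  public static void main(String[] args) throws Exception {
    FileSystem localFs = FileSystem.getLocal(new Configuration());
    Path parent = new Path("/tmp/nm-local-dir/filecache/0"); // illustrative path
    FsPermission expected = new FsPermission((short) 0755);
    localFs.mkdirs(parent, expected);
    // mkdirs is subject to the umask, so enforce the permission afterwards.
    localFs.setPermission(parent, expected);
  }
}
{code}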



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7905) Parent directory permission incorrect during public localization

2018-03-16 Thread Bilwa S T (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-7905:

Attachment: YARN-7905-006.patch

> Parent directory permission incorrect during public localization 
> -
>
> Key: YARN-7905
> URL: https://issues.apache.org/jira/browse/YARN-7905
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-7905-001.patch, YARN-7905-002.patch, 
> YARN-7905-003.patch, YARN-7905-004.patch, YARN-7905-005.patch, 
> YARN-7905-006.patch
>
>
> Similar to YARN-6708, during public localization we also have to take care of 
> the parent directory permissions if the umask is 027 during NodeManager startup.
> /filecache/0/200
> The directory permission of /filecache/0 is 750, which causes 
> application failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-03-16 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402005#comment-16402005
 ] 

Billie Rinaldi commented on YARN-7973:
--

Sounds good. Thanks for looking into this, [~shaneku...@gmail.com].

> Support ContainerRelaunch for Docker containers
> ---
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-7973.001.patch, YARN-7973.002.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8037) CGroupsResourceCalculator excessive warnings on container relaunch

2018-03-16 Thread Shane Kumpf (JIRA)
Shane Kumpf created YARN-8037:
-

 Summary: CGroupsResourceCalculator excessive warnings on container 
relaunch
 Key: YARN-8037
 URL: https://issues.apache.org/jira/browse/YARN-8037
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Shane Kumpf


When a container is relaunched, the old process no longer exists. When using 
the {{CGroupsResourceCalculator}} this results in the warning and exception 
below being logged every second until the relaunch occurs, which is excessive 
and fills up the logs.
{code:java}
2018-03-16 14:30:33,438 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator:
 Failed to parse 12844
org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the 
interim 12844
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.readTotalProcessJiffies(CGroupsResourceCalculator.java:252)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:181)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
Caused by: java.io.FileNotFoundException: 
/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_e01_1521209613260_0002_01_02/cpuacct.stat
 (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
... 4 more
2018-03-16 14:30:33,438 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator:
 Failed to parse cgroups 
/sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_02/memory.memsw.usage_in_bytes
org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the 
interim 12844
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.getMemorySize(CGroupsResourceCalculator.java:238)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:187)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
Caused by: java.io.FileNotFoundException: 
/sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_02/memory.usage_in_bytes
 (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
... 4 more{code}
At a minimum, we should consider logging the exception at debug level to reduce 
the noise. Alternatively, it may make sense to stop the existing 
{{MonitoringThread}} during relaunch.
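
A minimal sketch of the first option (demoting the per-tick noise to debug); the class and method names here are illustrative, not the actual CGroupsResourceCalculator code:

{code:java}
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class CgroupReadSketch {
  private static final Logger LOG = LoggerFactory.getLogger(CgroupReadSketch.class);

  long readTotalJiffies(File cpuacctStat) {
    // cpuacct.stat contains two lines: "user N" and "system N".
    try (Scanner sc = new Scanner(cpuacctStat)) {
      long jiffies = 0;
      while (sc.hasNextLine()) {
        String[] parts = sc.nextLine().split(" ");
        jiffies += Long.parseLong(parts[1]);
      }
      return jiffies;
    } catch (FileNotFoundException e) {
      // The container exited and its cgroup was removed; between exit and
      // relaunch this is expected, so log at debug rather than warn.
      LOG.debug("Process vanished while reading {}", cpuacctStat, e);
      return 0;
    }
  }
}
{code}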



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7905) Parent directory permission incorrect during public localization

2018-03-16 Thread Bilwa S T (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-7905:

Attachment: (was: YARN-7905-006.patch)

> Parent directory permission incorrect during public localization 
> -
>
> Key: YARN-7905
> URL: https://issues.apache.org/jira/browse/YARN-7905
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-7905-001.patch, YARN-7905-002.patch, 
> YARN-7905-003.patch, YARN-7905-004.patch, YARN-7905-005.patch
>
>
> Similar to YARN-6708, during public localization we also have to take care of 
> the parent directory permissions if the umask is 027 during NodeManager startup.
> /filecache/0/200
> The directory permission of /filecache/0 is 750, which causes 
> application failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7581) HBase filters are not constructed correctly in ATSv2

2018-03-16 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-7581:
-
Attachment: YARN-7581-YARN-7055.04.patch

> HBase filters are not constructed correctly in ATSv2
> 
>
> Key: YARN-7581
> URL: https://issues.apache.org/jira/browse/YARN-7581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.0.0-beta1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-7581-YARN-7055.04.patch, YARN-7581.00.patch, 
> YARN-7581.01.patch, YARN-7581.02.patch, YARN-7581.03.patch, YARN-7581.04.patch
>
>
> Post YARN-7346,
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesConfigFilters() and 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesMetricFilters() 
> start to fail when hbase.profile is set to 2.0.
> *Error Message*
>  [ERROR] Failures:
>  [ERROR] 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesConfigFilters:1266 
> expected:<2> but was:<0>
>  [ERROR] 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesMetricFilters:1523 
> expected:<1> but was:<0>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7905) Parent directory permission incorrect during public localization

2018-03-16 Thread Bilwa S T (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-7905:

Attachment: YARN-7905-006.patch

> Parent directory permission incorrect during public localization 
> -
>
> Key: YARN-7905
> URL: https://issues.apache.org/jira/browse/YARN-7905
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-7905-001.patch, YARN-7905-002.patch, 
> YARN-7905-003.patch, YARN-7905-004.patch, YARN-7905-005.patch, 
> YARN-7905-006.patch
>
>
> Similar to YARN-6708, during public localization we also have to take care of 
> the parent directory permissions if the umask is 027 during NodeManager startup.
> /filecache/0/200
> The directory permission of /filecache/0 is 750, which causes 
> application failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8006) Make Hbase-2 profile as default for YARN-7055 branch

2018-03-16 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401977#comment-16401977
 ] 

Haibo Chen commented on YARN-8006:
--

Will do shortly.

> Make Hbase-2 profile as default for YARN-7055 branch
> 
>
> Key: YARN-8006
> URL: https://issues.apache.org/jira/browse/YARN-8006
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Fix For: yarn-7055
>
> Attachments: YARN-8006-YARN-7055.001.patch, yetus-run.tar.gz
>
>
> In the last weekly call, folks discussed that we should have a separate branch 
> with the hbase-2 profile as default. Trunk's default profile is hbase-1, which 
> runs all the tests under the hbase-1 profile, but tests are not being run for 
> the hbase-2 profile.
> As per the discussion, let's keep the YARN-7055 branch with the hbase-2 profile 
> as default. Any server-side patches can be given to this branch as well, which 
> runs tests for the hbase-2 profile. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container

2018-03-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401973#comment-16401973
 ] 

Jason Lowe commented on YARN-7654:
--

bq. As you can see the struggle with flipping code for execv, this is the 
reason that this patch takes a long time to develop. If we want to separate 
execv call from getting a working version, then patch 001 would be reasonable 
to commit.

Patch 001 isn't reasonable to commit because it's executing untrusted shell 
constructs as root.  I agree that the execv work makes implementing the entry 
point feature easier to secure.  What I'm advocating is separating the execv 
work into a separate JIRA, since it can stand on its own and has an additional 
security benefit even without the entry point feature.  This feature can depend 
upon the execv JIRA.  That makes the overall work easier to review since it 
doesn't come in as one big patch.


> Support ENTRY_POINT for docker container
> 
>
> Key: YARN-7654
> URL: https://issues.apache.org/jira/browse/YARN-7654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-7654.001.patch, YARN-7654.002.patch, 
> YARN-7654.003.patch
>
>
> A Docker image may have an ENTRY_POINT predefined, but this is not supported in 
> the current implementation.  It would be nice if we could detect the existence of 
> {{launch_command}} and, based on this variable, launch the docker container in 
> different ways:
> h3. Launch command exists
> {code}
> docker run [image]:[version]
> docker exec [container_id] [launch_command]
> {code}
> h3. Use ENTRY_POINT
> {code}
> docker run [image]:[version]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8034) Clarification on preferredHost request with relaxedLocality

2018-03-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401950#comment-16401950
 ] 

Jason Lowe commented on YARN-8034:
--

If you need a specific host then set relaxLocality=false.  Otherwise there's no 
guarantee the request will be assigned to the requested host.  The host could 
be down, full of other containers, unhealthy, etc.  When relaxLocality=true 
then the RM assumes the application would prefer a container in a somewhat 
timely manner somewhere else rather than waiting indefinitely for a full node 
to free up space.  The node locality delay gives admins some control over how 
patiently the RM will wait for locality.

bq. The behavior I want from Yarn is "Honor locality to the best possible 
extent and only return a container on an arbitrary host if the requested host 
is down". Is there a way to accomplish this?

Yes, although it will require some work on the Samza AM's part.  Samza's AM can 
make requests for specific nodes with relaxLocality=false, but it also should 
monitor the updatedNodes field of each AllocateResponse.  The RM will notify 
applications in that response when a node becomes unusable or becomes usable 
again.  The Samza AM can cancel and resubmit a request (either for a different 
host or with relaxLocality=true) when the node it is trying to allocate on 
becomes unusable.
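
A rough sketch of that pattern on the AM side (resource size, priority, and host handling are illustrative assumptions; error handling omitted):

{code:java}
import java.util.EnumSet;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

class LocalityAwareRequester {
  private static final EnumSet<NodeState> UNUSABLE =
      EnumSet.of(NodeState.UNHEALTHY, NodeState.DECOMMISSIONED, NodeState.LOST);

  // Strict locality: only the named host, no rack/ANY relaxation.
  ContainerRequest requestOn(AMRMClient<ContainerRequest> amrmClient, String host) {
    ContainerRequest strict = new ContainerRequest(
        Resource.newInstance(2048, 1), new String[] {host}, null,
        Priority.newInstance(0), false /* relaxLocality */);
    amrmClient.addContainerRequest(strict);
    return strict;
  }

  // Called for each AllocateResponse received from the heartbeat loop.
  void onHeartbeat(AMRMClient<ContainerRequest> amrmClient,
      AllocateResponse response, ContainerRequest strict) {
    for (NodeReport report : response.getUpdatedNodes()) {
      if (UNUSABLE.contains(report.getNodeState())) {
        // The preferred host went away: cancel the strict request and resubmit
        // with relaxed locality (a real AM would also check that the report
        // actually matches the host named in the outstanding request).
        amrmClient.removeContainerRequest(strict);
        amrmClient.addContainerRequest(new ContainerRequest(
            strict.getCapability(), null, null, strict.getPriority(), true));
      }
    }
  }
}
{code}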


> Clarification on preferredHost request with relaxedLocality
> ---
>
> Key: YARN-8034
> URL: https://issues.apache.org/jira/browse/YARN-8034
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jagadish
>Priority: Major
>
> I work on Apache Samza, a stateful stream-processing framework that leverages 
> Yarn for resource management. The Samza AM requests resources on specific 
> hosts to schedule stateful jobs. We set relaxLocality = true in these 
> requests we make to Yarn. Often we have observed that we don't get containers 
> on the hosts that we requested them on and the Yarn RM returns containers on 
> arbitrary hosts. 
> Do you know what the behavior of the FairScheduler/CapacityScheduler is when 
> setting relaxLocality = true? I did play around by setting a high value for 
> yarn.scheduler.capacity.node-locality-delay but it did not seem to matter. 
> However, when setting relaxLocality = false, we get resources on the exact 
> hosts we requested on.
> The behavior I want from Yarn is "Honor locality to the best possible extent 
> and only return a container on an arbitrary host if the requested host is 
> down". Is there a way to accomplish this?
> If you can point me to the Scheduler code, I'm happy to look at it as well. 
> For context, we have continuous scheduling enabled in our clusters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8036) Memory Available shows a negative value after running updateNodeResource

2018-03-16 Thread Charan Hebri (JIRA)
Charan Hebri created YARN-8036:
--

 Summary: Memory Available shows a negative value after running 
updateNodeResource
 Key: YARN-8036
 URL: https://issues.apache.org/jira/browse/YARN-8036
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Charan Hebri
 Attachments: Memory_Available.jpg

Running updateNodeResource for a node that already has applications running on 
it doesn't update Memory Available with the right values. It may end up showing 
negative values based on the requirements of the application. Attached a 
screenshot for reference.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7461) DominantResourceCalculator#ratio calculation problem when right resource contains zero value

2018-03-16 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401820#comment-16401820
 ] 

genericqa commented on YARN-7461:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 19s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 9s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 
0 new + 22 unchanged - 2 fixed = 22 total (was 24) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 10s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
19s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 65m  
5s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}135m 56s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-7461 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12914845/YARN-7461.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 6c4beaa3f834 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 21c6661 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreComm

[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-03-16 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401818#comment-16401818
 ] 

Shane Kumpf commented on YARN-7973:
---

[~billie.rinaldi] - I looked into the issue you reported. The behavior you see 
occurs with or without this patch.

What you see above repeated over and over is the Diagnostics field being 
returned during the ContainerStatus calls. Pulling out only the Diagnostics 
field from above you get:
{code:java}
Diagnostics: [2018-03-08 22:02:53.397]Exception from container-launch.
Container id: container_1520546307703_0001_01_02
Exit code: -1
Exception message: 
Shell output: 

[2018-03-08 22:02:53.500]Diagnostic message from attempt 0 : [2018-03-08 
22:02:53.500]
[2018-03-08 22:02:53.501]Container exited with a non-zero exit code -1.
,{code}
You will see this repeated once per second until the relaunch occurs again (30 
seconds by default with native services). Once the relaunch occurs, you will 
see the exception that the relaunch failed, as the container isn't in a 
startable state. I could be convinced to call launchContainer in this case to 
produce the original error if you feel that is most appropriate, but I think 
there are alternative improvements to make here:
 * The logs are hard to follow with the diagnostics embedded in the log entry 
when returning the ContainerStatus. It looks like exceptions are repeated over 
and over, as you saw. We should consider moving this to debug logging.
 * Populate diagnostics with a better error in this case. The 
{{ContainerExecutionException}} thrown as part of this ACL check does not 
become part of the Diagnostics field.
 * Native Services currently uses {{ContainerRetryPolicy.RETRY_ON_ALL_ERRORS}} 
which may be too broad. -1 exit codes should likely be hard fails.

I'll open issues on these if that sounds good?
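
On the last bullet, a narrower retry policy might look roughly like this sketch (the error codes, retry count, and interval are illustrative assumptions, not values proposed in this issue):

{code:java}
import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.ContainerRetryContext;
import org.apache.hadoop.yarn.api.records.ContainerRetryPolicy;

class RetryPolicySketch {
  static void applyNarrowRetry(ContainerLaunchContext clc) {
    ContainerRetryContext retry = ContainerRetryContext.newInstance(
        ContainerRetryPolicy.RETRY_ON_SPECIFIC_ERROR_CODES,
        Collections.singleton(1), // retry plain container failures...
        3,                        // ...at most 3 times
        5000);                    // 5 seconds between attempts
    // With this policy, a -1 launch failure would not be retried.
    clc.setContainerRetryContext(retry);
  }
}
{code}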

> Support ContainerRelaunch for Docker containers
> ---
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-7973.001.patch, YARN-7973.002.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8035) Uncaught exception in ContainersMonitorImpl during relaunch due to the process ID changing

2018-03-16 Thread Shane Kumpf (JIRA)
Shane Kumpf created YARN-8035:
-

 Summary: Uncaught exception in ContainersMonitorImpl during 
relaunch due to the process ID changing
 Key: YARN-8035
 URL: https://issues.apache.org/jira/browse/YARN-8035
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Shane Kumpf


In the case of a container relaunch event, the container ID is reused but a new 
process is spawned. For resource monitoring, {{ContainersMonitorImpl}} will 
obtain the new PID post relaunch and initialize the process tree monitoring. As 
part of this initialization, a tag called {{ContainerPid}}, whose value is the 
PID for the container, is populated for the metrics associated with the 
container. If the prior container failed after its process started, the 
original PID will already be populated for the container, resulting in the 
{{MetricsException}} below.
{code:java}
2018-03-16 11:59:02,563 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
 Uncaught exception in ContainersMonitorImpl while monitoring resource of 
container_1521201379995_0001_01_02
org.apache.hadoop.metrics2.MetricsException: Tag ContainerPid already exists!
at 
org.apache.hadoop.metrics2.lib.MetricsRegistry.checkTagName(MetricsRegistry.java:433)
at org.apache.hadoop.metrics2.lib.MetricsRegistry.tag(MetricsRegistry.java:394)
at org.apache.hadoop.metrics2.lib.MetricsRegistry.tag(MetricsRegistry.java:400)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.recordProcessId(ContainerMetrics.java:277)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.initializeProcessTrees(ContainersMonitorImpl.java:559)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:448){code}
{{MetricsRegistry}} provides a {{tag}} method that allows updating the value of 
an existing tag. Updating the value ensures that the PID associated with the 
container is that of the currently running process, which appears to be an 
appropriate fix. However, it's unclear how this tag might be used by other 
systems. I'm not finding any usage in Hadoop itself.
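
A minimal sketch of that override idea (names are illustrative; the real change would live in ContainerMetrics#recordProcessId):

{code:java}
import org.apache.hadoop.metrics2.lib.Interns;
import org.apache.hadoop.metrics2.lib.MetricsRegistry;

class ProcessIdTagSketch {
  private final MetricsRegistry registry = new MetricsRegistry("ContainerResource");

  void recordProcessId(String pid) {
    // override=true replaces the value if the tag was already registered,
    // instead of throwing "Tag ContainerPid already exists!".
    registry.tag(Interns.info("ContainerPid", "Container Process Id"), pid, true);
  }
}
{code}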



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7636) Re-reservation count may overflow when cluster resource exhausted for a long time

2018-03-16 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401780#comment-16401780
 ] 

Weiwei Yang commented on YARN-7636:
---

Thanks [~Tao Yang] for the contribution and [~leftnoteasy] for the review. 
I have committed this to trunk and cherry-picked it to branch-2.9, branch-3.0 and 
branch-3.1.
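
For readers following along, the approach referenced in the description below (mirroring addSchedulingOpportunity) amounts to swallowing the overflow rather than letting it escape; a minimal sketch with an illustrative key type:

{code:java}
import com.google.common.collect.ConcurrentHashMultiset;
import com.google.common.collect.Multiset;

class ReReservationSketch {
  private final Multiset<String> reReservations = ConcurrentHashMultiset.create();

  void addReReservation(String schedulerKey) {
    try {
      reReservations.add(schedulerKey);
    } catch (IllegalArgumentException e) {
      // The count is already Integer.MAX_VALUE; keep it there instead of
      // letting the exception propagate and kill the committer thread.
    }
  }
}
{code}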

> Re-reservation count may overflow when cluster resource exhausted for a long 
> time 
> --
>
> Key: YARN-7636
> URL: https://issues.apache.org/jira/browse/YARN-7636
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.1.0, 2.9.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.1.0, 2.9.1, 3.0.2, 3.2.0
>
> Attachments: YARN-7636.001.patch, YARN-7636.002.patch, 
> YARN-7636.003.patch
>
>
> This has happened on our production cluster twice: when a request cannot be 
> satisfied for a long time, it continually triggers re-reservation and 
> eventually causes the count to overflow, which crashes the scheduler.
> Exception stack:
> {noformat}
> java.lang.IllegalArgumentException: Overflow adding 1 occurrences to a count 
> of 2147483647
>         at 
> com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:246)
>         at 
> com.google.common.collect.AbstractMultiset.add(AbstractMultiset.java:80)
>         at 
> com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:51)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.addReReservation(SchedulerApplicationAttempt.java:406)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.reserve(SchedulerApplicationAttempt.java:555)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1076)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
> {noformat}
> Referring to the handling in SchedulerApplicationAttempt#addSchedulingOpportunity, 
> we can ignore this exception to avoid the problem.
> The same problem may happen in 
> SchedulerApplicationAttempt#addMissedNonPartitionedRequestSchedulingOpportunity,
>  so fix it in the same way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7636) Re-reservation count may overflow when cluster resource exhausted for a long time

2018-03-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401777#comment-16401777
 ] 

Hudson commented on YARN-7636:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13848 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13848/])
YARN-7636. Re-reservation count may overflow when cluster resource (wwei: rev 
154cfb2b620002a7d3b7fdbf8b68236c432771e1)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java


> Re-reservation count may overflow when cluster resource exhausted for a long 
> time 
> --
>
> Key: YARN-7636
> URL: https://issues.apache.org/jira/browse/YARN-7636
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.1.0, 2.9.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.1.0, 2.9.1, 3.0.2, 3.2.0
>
> Attachments: YARN-7636.001.patch, YARN-7636.002.patch, 
> YARN-7636.003.patch
>
>
> This has happened on our production cluster twice: when a request cannot be 
> satisfied for a long time, it continually triggers re-reservation and 
> eventually causes the count to overflow, which crashes the scheduler.
> Exception stack:
> {noformat}
> java.lang.IllegalArgumentException: Overflow adding 1 occurrences to a count 
> of 2147483647
>         at 
> com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:246)
>         at 
> com.google.common.collect.AbstractMultiset.add(AbstractMultiset.java:80)
>         at 
> com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:51)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.addReReservation(SchedulerApplicationAttempt.java:406)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.reserve(SchedulerApplicationAttempt.java:555)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1076)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
> {noformat}
> Referring to the handling in SchedulerApplicationAttempt#addSchedulingOpportunity, 
> we can ignore this exception to avoid the problem.
> The same problem may happen in 
> SchedulerApplicationAttempt#addMissedNonPartitionedRequestSchedulingOpportunity,
>  so fix it in the same way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7636) Re-reservation count may overflow when cluster resource exhausted for a long time

2018-03-16 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-7636:
--
Fix Version/s: 3.1.0

> Re-reservation count may overflow when cluster resource exhausted for a long 
> time 
> --
>
> Key: YARN-7636
> URL: https://issues.apache.org/jira/browse/YARN-7636
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.1.0, 2.9.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.1.0, 2.9.1, 3.0.2, 3.2.0
>
> Attachments: YARN-7636.001.patch, YARN-7636.002.patch, 
> YARN-7636.003.patch
>
>
> This has happened on our production cluster twice: when a request cannot be 
> satisfied for a long time, it continually triggers re-reservation and 
> eventually causes the count to overflow, which crashes the scheduler.
> Exception stack:
> {noformat}
> java.lang.IllegalArgumentException: Overflow adding 1 occurrences to a count 
> of 2147483647
>         at 
> com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:246)
>         at 
> com.google.common.collect.AbstractMultiset.add(AbstractMultiset.java:80)
>         at 
> com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:51)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.addReReservation(SchedulerApplicationAttempt.java:406)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.reserve(SchedulerApplicationAttempt.java:555)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1076)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
> {noformat}
> Referring to the handling in SchedulerApplicationAttempt#addSchedulingOpportunity, 
> we can ignore this exception to avoid the problem.
> The same problem may happen in 
> SchedulerApplicationAttempt#addMissedNonPartitionedRequestSchedulingOpportunity,
>  so fix it in the same way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7636) Re-reservation count may overflow when cluster resource exhausted for a long time

2018-03-16 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-7636:
--
Fix Version/s: (was: 3.1.0)

> Re-reservation count may overflow when cluster resource exhausted for a long 
> time 
> --
>
> Key: YARN-7636
> URL: https://issues.apache.org/jira/browse/YARN-7636
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.1.0, 2.9.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.1.0, 2.9.1, 3.0.2, 3.2.0
>
> Attachments: YARN-7636.001.patch, YARN-7636.002.patch, 
> YARN-7636.003.patch
>
>
> This has happened on our production cluster twice: when a request cannot be 
> satisfied for a long time, it continually triggers re-reservation and 
> eventually causes the count to overflow, which crashes the scheduler.
> Exception stack:
> {noformat}
> java.lang.IllegalArgumentException: Overflow adding 1 occurrences to a count 
> of 2147483647
>         at 
> com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:246)
>         at 
> com.google.common.collect.AbstractMultiset.add(AbstractMultiset.java:80)
>         at 
> com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:51)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.addReReservation(SchedulerApplicationAttempt.java:406)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.reserve(SchedulerApplicationAttempt.java:555)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1076)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
> {noformat}
> Referring to the handling in SchedulerApplicationAttempt#addSchedulingOpportunity, 
> we can ignore this exception to avoid the problem.
> The same problem may happen in 
> SchedulerApplicationAttempt#addMissedNonPartitionedRequestSchedulingOpportunity,
>  so fix it in the same way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7636) Re-reservation count may overflow when cluster resource exhausted for a long time

2018-03-16 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-7636:
--
Fix Version/s: 2.9.1

> Re-reservation count may overflow when cluster resource exhausted for a long 
> time 
> --
>
> Key: YARN-7636
> URL: https://issues.apache.org/jira/browse/YARN-7636
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.1.0, 2.9.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.1.0, 2.9.1, 3.0.2, 3.2.0
>
> Attachments: YARN-7636.001.patch, YARN-7636.002.patch, 
> YARN-7636.003.patch
>
>
> This has happened on our production cluster twice: when a request cannot be 
> satisfied for a long time, it continually triggers re-reservation and 
> eventually causes the count to overflow, which crashes the scheduler.
> Exception stack:
> {noformat}
> java.lang.IllegalArgumentException: Overflow adding 1 occurrences to a count 
> of 2147483647
>         at 
> com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:246)
>         at 
> com.google.common.collect.AbstractMultiset.add(AbstractMultiset.java:80)
>         at 
> com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:51)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.addReReservation(SchedulerApplicationAttempt.java:406)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.reserve(SchedulerApplicationAttempt.java:555)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1076)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
> {noformat}
> Referring to the handling in SchedulerApplicationAttempt#addSchedulingOpportunity, 
> we can ignore this exception to avoid the problem.
> The same problem may happen in 
> SchedulerApplicationAttempt#addMissedNonPartitionedRequestSchedulingOpportunity,
>  so fix it in the same way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7581) HBase filters are not constructed correctly in ATSv2

2018-03-16 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401761#comment-16401761
 ] 

Rohith Sharma K S commented on YARN-7581:
-

+1 for latest patch. 

> HBase filters are not constructed correctly in ATSv2
> 
>
> Key: YARN-7581
> URL: https://issues.apache.org/jira/browse/YARN-7581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.0.0-beta1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-7581.00.patch, YARN-7581.01.patch, 
> YARN-7581.02.patch, YARN-7581.03.patch, YARN-7581.04.patch
>
>
> Post YARN-7346,
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesConfigFilters() and 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesMetricFilters() 
> start to fail when hbase.profile is set to 2.0.
> *Error Message*
>  [ERROR] Failures:
>  [ERROR] 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesConfigFilters:1266 
> expected:<2> but was:<0>
>  [ERROR] 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesMetricFilters:1523 
> expected:<1> but was:<0>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7636) Re-reservation count may overflow when cluster resource exhausted for a long time

2018-03-16 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-7636:
--
Fix Version/s: 3.0.2

> Re-reservation count may overflow when cluster resource exhausted for a long 
> time 
> --
>
> Key: YARN-7636
> URL: https://issues.apache.org/jira/browse/YARN-7636
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.1.0, 2.9.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.1.0, 3.0.2, 3.2.0
>
> Attachments: YARN-7636.001.patch, YARN-7636.002.patch, 
> YARN-7636.003.patch
>
>
> This has happened on our production cluster twice: when a request cannot be 
> satisfied for a long time, it continually triggers re-reservation and 
> eventually causes the count to overflow, which crashes the scheduler.
> Exception stack:
> {noformat}
> java.lang.IllegalArgumentException: Overflow adding 1 occurrences to a count 
> of 2147483647
>         at 
> com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:246)
>         at 
> com.google.common.collect.AbstractMultiset.add(AbstractMultiset.java:80)
>         at 
> com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:51)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.addReReservation(SchedulerApplicationAttempt.java:406)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.reserve(SchedulerApplicationAttempt.java:555)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1076)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
> {noformat}
> Referring to the handling in SchedulerApplicationAttempt#addSchedulingOpportunity, 
> we can ignore this exception to avoid the problem.
> The same problem may happen in 
> SchedulerApplicationAttempt#addMissedNonPartitionedRequestSchedulingOpportunity,
>  so fix it in the same way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7461) DominantResourceCalculator#ratio calculation problem when right resource contains zero value

2018-03-16 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-7461:
---
Attachment: YARN-7461.004.patch

> DominantResourceCalculator#ratio calculation problem when right resource 
> contains zero value
> 
>
> Key: YARN-7461
> URL: https://issues.apache.org/jira/browse/YARN-7461
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Attachments: YARN-7461.001.patch, YARN-7461.002.patch, 
> YARN-7461.003.patch, YARN-7461.004.patch
>
>
> Currently DominantResourceCalculator#ratio may return wrong result when right 
> resource contains zero value. For example, there are three resource types 
> such as , leftResource=<5, 5, 0> and 
> rightResource=<10, 10, 0>, we expect the result of 
> DominantResourceCalculator#ratio(leftResource, rightResource) to be 0.5, but 
> currently it is NaN.
> There should be a check before the division to ensure that we do not divide 
> by zero.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7461) DominantResourceCalculator#ratio calculation problem when right resource contains zero value

2018-03-16 Thread Tao Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401711#comment-16401711
 ] 

Tao Yang commented on YARN-7461:


Updated v4 patch to run RM test cases through fixing check-style errors in 
TestContainerAllocation.

Thanks [~cheersyang] and [~leftnoteasy] for your reviews and suggestions.

This patch won't change the common behavior of ratio(a,b): the result is still 
greater than 1 when a>b and less than 1 when a<b. For example, one of the new 
cases with b=<1,1,0> now yields 2.0 (before this patch the result was NaN). All 
other cases follow the old behavior.

YARN-8020/YARN-6538 seem irrelevant to this issue, and even to the ratio 
calculation. 

> DominantResourceCalculator#ratio calculation problem when right resource 
> contains zero value
> 
>
> Key: YARN-7461
> URL: https://issues.apache.org/jira/browse/YARN-7461
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Attachments: YARN-7461.001.patch, YARN-7461.002.patch, 
> YARN-7461.003.patch, YARN-7461.004.patch
>
>
> Currently DominantResourceCalculator#ratio may return wrong result when right 
> resource contains zero value. For example, there are three resource types 
> such as , leftResource=<5, 5, 0> and 
> rightResource=<10, 10, 0>, we expect the result of 
> DominantResourceCalculator#ratio(leftResource, rightResource) to be 0.5, but 
> currently it is NaN.
> There should be a check before the division to ensure that we do not divide 
> by zero.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8028) Support authorizeUserAccessToQueue in RMWebServices

2018-03-16 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401663#comment-16401663
 ] 

Bibin A Chundatt edited comment on YARN-8028 at 3/16/18 9:31 AM:
-

[~leftnoteasy]
{code:java}
2544  return Response.status(Status.BAD_REQUEST).entity(
2545  "Specified queueAclType=" + queueAclType
2546  + " is not a valid type, valid queue acl types={"
2547  + "SUBMIT_APPLICATIONS/ADMINISTER_QUEUE}").build();
{code}
 # Can we use {{BadRequestException}}
{code:java}
2568  return Response.status(Status.FORBIDDEN).entity(
2569  "User=" + username + " doesn't have access to queue=" + queue
2570  + " with acl-type=" + queueAclType).build();
{code}

 # {{ForbiddenException}} can be used
{code:java}
2535  LOG.debug("Check user=" + username + " has access to queue=" + 
queue
2536  + " ACL_TYPE=" + queueAclType);
{code}

 # I think we shouldn't directly log the request parameters; this could cause *log 
forging*
 # Thoughts on returning all queue rights, similar to {{getQueueUserAcls}}? This 
would allow different services to cache the ACLs. In addition, we should have a 
notification framework for when a queue is refreshed.
 # One improvement could be: instead of querying the scheduler, we could use 
{{YarnAuthorizationProvider}} so that we don't lock the scheduler (YARN-6727). 
Thoughts?


was (Author: bibinchundatt):
[~leftnoteasy]

{code}
2544  return Response.status(Status.BAD_REQUEST).entity(
2545  "Specified queueAclType=" + queueAclType
2546  + " is not a valid type, valid queue acl types={"
2547  + "SUBMIT_APPLICATIONS/ADMINISTER_QUEUE}").build();
{code}
# Can we use {{BadRequestException}}
{code}
2568  return Response.status(Status.FORBIDDEN).entity(
2569  "User=" + username + " doesn't have access to queue=" + queue
2570  + " with acl-type=" + queueAclType).build();
{code}
# {{ForbiddenException}} can be used
{code}
2535  LOG.debug("Check user=" + username + " has access to queue=" + 
queue
2536  + " ACL_TYPE=" + queueAclType);
{code}
# I think we shouldn't directly log the request parameters; this could cause *log 
forging*
# Thoughts on returning all queue rights, similar to {{getQueueUserAcls}}? This 
would allow different services to cache the ACLs. In addition, we should have a 
notification framework for when a queue is refreshed.
# One improvement could be: instead of querying the scheduler, we could 
use {{YarnAuthorizationProvider}} so that we don't lock the scheduler (YARN-6727). 
Thoughts?
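
Following up on points 1 and 2 above, the existing exceptions in {{org.apache.hadoop.yarn.webapp}} could be used roughly as in this sketch (an editorial illustration only, not the attached patch; the method shapes are assumptions):

{code:java}
import org.apache.hadoop.yarn.webapp.BadRequestException;
import org.apache.hadoop.yarn.webapp.ForbiddenException;

class QueueAclCheckSketch {
  void validateAclType(String queueAclType, boolean valid) {
    if (!valid) {
      throw new BadRequestException("Specified queueAclType=" + queueAclType
          + " is not a valid type, valid queue acl types="
          + "{SUBMIT_APPLICATIONS/ADMINISTER_QUEUE}");
    }
  }

  void checkAccess(boolean hasAccess, String username, String queue,
      String queueAclType) {
    if (!hasAccess) {
      throw new ForbiddenException("User=" + username
          + " doesn't have access to queue=" + queue
          + " with acl-type=" + queueAclType);
    }
  }
}
{code}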

> Support authorizeUserAccessToQueue in RMWebServices
> ---
>
> Key: YARN-8028
> URL: https://issues.apache.org/jira/browse/YARN-8028
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8028.001.patch
>
>
> Currently we have {{QueueUserACLInfo}} in ApplicationClient; we should 
> support a similar API in the REST API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8028) Support authorizeUserAccessToQueue in RMWebServices

2018-03-16 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401663#comment-16401663
 ] 

Bibin A Chundatt commented on YARN-8028:


[~leftnoteasy]

{code}
2544  return Response.status(Status.BAD_REQUEST).entity(
2545  "Specified queueAclType=" + queueAclType
2546  + " is not a valid type, valid queue acl types={"
2547  + "SUBMIT_APPLICATIONS/ADMINISTER_QUEUE}").build();
{code}
# Can we use {{BadRequestException}}
{code}
2568  return Response.status(Status.FORBIDDEN).entity(
2569  "User=" + username + " doesn't have access to queue=" + queue
2570  + " with acl-type=" + queueAclType).build();
{code}
# {{ForbiddenException}} can be used
{code}
2535  LOG.debug("Check user=" + username + " has access to queue=" + 
queue
2536  + " ACL_TYPE=" + queueAclType);
{code}
# I think we shouldn't directly log the request parameters; this could cause *log 
forging*
# Thoughts on returning all queue rights, similar to {{getQueueUserAcls}}? This 
would allow different services to cache the ACLs. In addition, we should have a 
notification framework for when a queue is refreshed.
# One improvement could be: instead of querying the scheduler, we could 
use {{YarnAuthorizationProvider}} so that we don't lock the scheduler (YARN-6727). 
Thoughts?

> Support authorizeUserAccessToQueue in RMWebServices
> ---
>
> Key: YARN-8028
> URL: https://issues.apache.org/jira/browse/YARN-8028
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8028.001.patch
>
>
> Currently we have {{QueueUserACLInfo}} in ApplicationClient; we should 
> support a similar API in the REST API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org