[jira] [Commented] (YARN-9834) Allow using a pool of local users to run Yarn Secure Container in secure mode

2019-09-13 Thread Ashvin (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929676#comment-16929676
 ] 

Ashvin commented on YARN-9834:
--

Hi [~shanyu]. I took a quick look at the pull request. Is it missing some 
files, e.g. SecureModeLocalUserAllocator.java?

> Allow using a pool of local users to run Yarn Secure Container in secure mode
> -
>
> Key: YARN-9834
> URL: https://issues.apache.org/jira/browse/YARN-9834
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.2
>Reporter: shanyu zhao
>Priority: Major
>
> Yarn Secure Container in secure mode allows separation of different users' 
> local files and container processes running on the same node manager. This 
> depends on an out-of-band service such as SSSD/Winbind to sync all domain 
> users to the local machine.
> SSSD/Winbind user sync has a lot of overhead, especially for large 
> corporations. Also, if Yarn runs inside a Kubernetes cluster (meaning node 
> managers run inside Docker containers), it doesn't make sense for each 
> container to sync a whole copy of the domain users.
> We should add a new configuration to Yarn so that we can pre-create a pool of 
> users on each machine/Docker container. At runtime, Yarn allocates a local 
> user to the domain user that submits the application. When all containers of 
> that user and all files belonging to that user are deleted, we can release 
> the allocation and allow other domain users to run their Yarn containers as 
> the same local user.
> We propose to add these new configurations:
> {code:java}
> yarn.nodemanager.linux-container-executor.secure-mode.use-local-user, 
> defaults to false
> yarn.nodemanager.linux-container-executor.secure-mode.local-user-prefix, 
> defaults to "user"{code}
> If we enable this feature with local-user-prefix set to "user", then we 
> expect pre-created local users user0 - usern, where n equals:
> {code:java}
> yarn.nodemanager.resource.cpu-vcores {code}
> We can use an in-memory allocator to keep the domain-user-to-local-user 
> mapping.
> Limitations:
> 1) This feature does not support the PRIVATE type of resources. Because 
> PRIVATE resources are potentially cached in the node manager for a very long 
> time, supporting them would be a security problem: a user might be able to 
> peek into a previous user's PRIVATE resources. We can modify the code to 
> treat all PRIVATE resources as APPLICATION.
> 2) It is recommended to enable DominantResourceCalculator so that no more 
> than "cpu-vcores" concurrent containers run on a node manager:
> {code:java}
> yarn.scheduler.capacity.resource-calculator
> = org.apache.hadoop.yarn.util.resource.DominantResourceCalculator {code}
> 3) Currently this feature does not work with Yarn Node Manager recovery. We 
> may add recovery support in the future once we hook into the right calls in 
> the recovery flow.
>  






[jira] [Created] (YARN-9834) Allow using a pool of local users to run Yarn Secure Container in secure mode

2019-09-13 Thread shanyu zhao (Jira)
shanyu zhao created YARN-9834:
-

 Summary: Allow using a pool of local users to run Yarn Secure 
Container in secure mode
 Key: YARN-9834
 URL: https://issues.apache.org/jira/browse/YARN-9834
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.1.2
Reporter: shanyu zhao


Yarn Secure Container in secure mode allows separation of different users' 
local files and container processes running on the same node manager. This 
depends on an out-of-band service such as SSSD/Winbind to sync all domain users 
to the local machine.

SSSD/Winbind user sync has a lot of overhead, especially for large 
corporations. Also, if Yarn runs inside a Kubernetes cluster (meaning node 
managers run inside Docker containers), it doesn't make sense for each 
container to sync a whole copy of the domain users.

We should add a new configuration to Yarn so that we can pre-create a pool of 
users on each machine/Docker container. At runtime, Yarn allocates a local user 
to the domain user that submits the application. When all containers of that 
user and all files belonging to that user are deleted, we can release the 
allocation and allow other domain users to run their Yarn containers as the 
same local user.

We propose to add these new configurations:
{code:java}
yarn.nodemanager.linux-container-executor.secure-mode.use-local-user, defaults 
to false
yarn.nodemanager.linux-container-executor.secure-mode.local-user-prefix, 
defaults to "user"{code}
If we enable this feature with local-user-prefix set to "user", then we expect 
pre-created local users user0 - usern, where n equals:
{code:java}
yarn.nodemanager.resource.cpu-vcores {code}
We can use an in-memory allocator to keep the domain-user-to-local-user mapping.
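
As a rough illustration of that in-memory allocator, a sketch along these lines 
could back the mapping, reading the proposed keys above (class and method names 
here are invented for the example and are not taken from the attached pull 
request):
{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch only; names and details do not reflect the actual patch.
public class LocalUserPoolAllocator {

  private final Map<String, String> domainToLocal = new HashMap<>();
  private final Deque<String> freeLocalUsers = new ArrayDeque<>();

  LocalUserPoolAllocator(String prefix, int poolSize) {
    for (int i = 0; i < poolSize; i++) {
      freeLocalUsers.add(prefix + i);   // user0, user1, ... user<poolSize-1>
    }
  }

  // Build the pool from the proposed configuration keys, sized from cpu-vcores.
  static LocalUserPoolAllocator fromConf(Configuration conf) {
    String prefix = conf.get(
        "yarn.nodemanager.linux-container-executor.secure-mode.local-user-prefix",
        "user");
    int vcores = conf.getInt("yarn.nodemanager.resource.cpu-vcores", 8);
    return new LocalUserPoolAllocator(prefix, vcores);
  }

  // Map the domain user to a pooled local user, reusing an existing mapping.
  synchronized String allocate(String domainUser) {
    return domainToLocal.computeIfAbsent(domainUser, u -> {
      String local = freeLocalUsers.poll();
      if (local == null) {
        throw new IllegalStateException("local user pool exhausted");
      }
      return local;
    });
  }

  // Called once all containers and local files of the domain user are deleted.
  synchronized void release(String domainUser) {
    String local = domainToLocal.remove(domainUser);
    if (local != null) {
      freeLocalUsers.add(local);
    }
  }
}
{code}
The node manager would call allocate() when a domain user's application 
arrives and release() once all of that user's containers and local files are 
cleaned up, matching the lifecycle described above.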

Limitations:

1) This feature does not support the PRIVATE type of resources. Because 
PRIVATE resources are potentially cached in the node manager for a very long 
time, supporting them would be a security problem: a user might be able to peek 
into a previous user's PRIVATE resources. We can modify the code to treat all 
PRIVATE resources as APPLICATION.

2) It is recommended to enable DominantResourceCalculator so that no more than 
"cpu-vcores" concurrent containers run on a node manager:
{code:java}
yarn.scheduler.capacity.resource-calculator
= org.apache.hadoop.yarn.util.resource.DominantResourceCalculator {code}
3) Currently this feature does not work with Yarn Node Manager recovery. We may 
add recovery support in the future once we hook into the right calls in the 
recovery flow.

 






[jira] [Commented] (YARN-9011) Race condition during decommissioning

2019-09-13 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929397#comment-16929397
 ] 

Hadoop QA commented on YARN-9011:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 19m  
9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 9 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
45s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 47s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m  
6s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 12s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 10 new + 343 unchanged - 0 fixed = 353 total (was 343) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 20s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
29s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 22s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 25m 
42s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
40s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}207m  5s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  Exceptional return value of 
java.util.concurrent.locks.Condition.await(long, TimeUnit) ignored in 
org.apache.hadoop.yarn.server.resourcemanager.DecommissioningNodesSyncer.awaitDecommissiningStatus(RMNode,
 int)  At DecommissioningNodesSyncer.java:[line 35] |
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations |
|   | 
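
The FindBugs item above flags that the boolean returned by 
Condition.await(long, TimeUnit) is discarded, so a timeout would be silently 
treated as success. A minimal sketch of the usual idiom (the class below is 
invented purely to show the pattern, it is not the code in the patch):
{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: re-check the predicate and report timeouts explicitly
// instead of ignoring the result of a timed wait.
class DecommissionWaiter {
  private final Lock lock = new ReentrantLock();
  private final Condition decommissioned = lock.newCondition();
  private boolean done;   // guarded by lock

  boolean awaitDecommissioned(long timeout, TimeUnit unit)
      throws InterruptedException {
    lock.lock();
    try {
      long nanos = unit.toNanos(timeout);
      while (!done) {
        if (nanos <= 0) {
          return false;   // timed out without the expected state change
        }
        nanos = decommissioned.awaitNanos(nanos);   // result is re-checked
      }
      return true;
    } finally {
      lock.unlock();
    }
  }

  void markDone() {
    lock.lock();
    try {
      done = true;
      decommissioned.signalAll();
    } finally {
      lock.unlock();
    }
  }
}
{code}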

[jira] [Commented] (YARN-9787) Typo in analysesErrorMsg

2019-09-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929387#comment-16929387
 ] 

Hudson commented on YARN-9787:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17293 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17293/])
YARN-9787. Typo in analysesErrorMsg. Contributed by kevin su. (weichiu: rev 
42390073499deb6dc66e1e89b55888dba1214cf9)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java


> Typo in analysesErrorMsg
> 
>
> Key: YARN-9787
> URL: https://issues.apache.org/jira/browse/YARN-9787
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: kevin su
>Priority: Trivial
>  Labels: newbie, noob
> Fix For: 3.3.0
>
> Attachments: YARN-9787.001.patch
>
>
> {code:java}
>   analysis.append("Please check whether your etc/hadoop/mapred-site.xml "
>   + "contains the below configuration:\n");
> {code}
> I think it should be {{/etc/hadoop/mapred-site.xml}}
> https://github.com/apache/hadoop/blob/2064ca015d1584263aac0cc20c60b925a3aff612/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java#L788-L789






[jira] [Commented] (YARN-9787) Typo in analysesErrorMsg

2019-09-13 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929390#comment-16929390
 ] 

Wei-Chiu Chuang commented on YARN-9787:
---

My bad. Looks like it was in my local tree and I didn't push it.
It's now up in trunk :) 

> Typo in analysesErrorMsg
> 
>
> Key: YARN-9787
> URL: https://issues.apache.org/jira/browse/YARN-9787
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: kevin su
>Priority: Trivial
>  Labels: newbie, noob
> Fix For: 3.3.0
>
> Attachments: YARN-9787.001.patch
>
>
> {code:java}
>   analysis.append("Please check whether your etc/hadoop/mapred-site.xml "
>   + "contains the below configuration:\n");
> {code}
> I think it should be {{/etc/hadoop/mapred-site.xml}}
> https://github.com/apache/hadoop/blob/2064ca015d1584263aac0cc20c60b925a3aff612/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java#L788-L789






[jira] [Commented] (YARN-9808) Zero length files in container log output haven't got a header

2019-09-13 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929322#comment-16929322
 ] 

Adam Antal commented on YARN-9808:
--

Last checkstyle error is invalid.

> Zero length files in container log output haven't got a header
> --
>
> Key: YARN-9808
> URL: https://issues.apache.org/jira/browse/YARN-9808
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: log-aggregation, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9808.001.patch, YARN-9808.002.patch, 
> YARN-9808.003.patch
>
>
> Using the Yarn logs CLI for containers that have zero length files produces 
> output similar to this:
> {noformat}
> End of LogType:stderr
> ***
> End of LogType:prelaunch.err
> **
> Container: container_e25_1567431105510_0001_01_02 on host-1
> LogAggregationType: AGGREGATED
> ===
> LogType:container.log
> LogLastModifiedTime:Mon Sep 02 06:34:48 -0700 2019
> LogLength:5442
> LogContents:
> ...
> ...
> {noformat}
> Note that stderr and prelaunch.err are both zero-length files. Though the 
> output is not misleading, the header is missing.
> I suggest adding the header for zero-length files as well, primarily for the 
> following reasons:
> - for applications having multiple files with the same name, you may want to 
> distinguish them by host - if many of those are of zero length, you cannot 
> extract this information from the output. Note that this is a common case for 
> stderr and prelaunch.err.
> - you may want to see the modification time (which corresponds to the 
> creation time of the zero-length file)
> - it would explicitly display the "LogLength:0" line, which would avoid any 
> confusion on the end user's side.
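
A minimal sketch of the suggested direction - write the header unconditionally 
so a zero-length file still gets one (the helper below is invented for 
illustration and is not the attached patch):
{code:java}
import java.io.PrintStream;
import java.util.Date;

// Illustrative helper: emitting the header even for empty files surfaces the
// container, the modification time and an explicit "LogLength:0" line.
final class LogHeaderSketch {

  static void printHeader(PrintStream out, String container, String host,
      String fileName, long lastModified, long length) {
    out.println("Container: " + container + " on " + host);
    out.println("LogAggregationType: AGGREGATED");
    out.println("LogType:" + fileName);
    out.println("LogLastModifiedTime:" + new Date(lastModified));
    out.println("LogLength:" + length);
    out.println("LogContents:");
  }

  static void printFile(PrintStream out, String container, String host,
      String fileName, long lastModified, byte[] contents) {
    // Header first, even when contents.length == 0 ...
    printHeader(out, container, host, fileName, lastModified, contents.length);
    // ... then the body, which is simply empty for zero-length files.
    out.write(contents, 0, contents.length);
    out.println();
    out.println("End of LogType:" + fileName);
  }
}
{code}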






[jira] [Commented] (YARN-9814) JobHistoryServer can't delete aggregated files, if remote app root directory is created by NodeManager

2019-09-13 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929319#comment-16929319
 ] 

Adam Antal commented on YARN-9814:
--

[~Prabhu Joseph] could you review this if you have time?

> JobHistoryServer can't delete aggregated files, if remote app root directory 
> is created by NodeManager
> --
>
> Key: YARN-9814
> URL: https://issues.apache.org/jira/browse/YARN-9814
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, yarn
>Affects Versions: 3.1.2
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Minor
> Attachments: YARN-9814.001.patch, YARN-9814.002.patch, 
> YARN-9814.003.patch
>
>
> If remote-app-log-dir is not created before starting the Yarn processes, the 
> NodeManager creates it during the init of the AppLogAggregator service. In a 
> custom setup the primary group of the yarn user (which starts the NM/RM 
> daemons) is not hadoop, but set to a more restricted group (say yarn). If the 
> NodeManager creates the folder, it derives the group of the folder from the 
> primary group of the login user (which is yarn:yarn in this case), thus 
> setting the root log folder and all its subfolders to the yarn group, 
> ultimately making them inaccessible to other processes - e.g. the 
> JobHistoryServer's AggregatedLogDeletionService.
> I suggest making this group configurable. If the new configuration is not 
> set, we can still stick to the existing behaviour.
> Creating the root app-log-dir manually during the setup of such a system is a 
> bit error-prone, and an end user can easily forget it. I think the best place 
> for this step is the LogAggregationService, which is already responsible for 
> creating the folder.
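
A rough sketch of how a configurable group could be applied when the root dir 
is created (the configuration key and helper below are hypothetical, not the 
actual patch):
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

// Hypothetical helper: create the remote app-log root dir and, if a group is
// configured, chgrp it so non-yarn readers (e.g. the JobHistoryServer) keep access.
final class RemoteLogRootDirSketch {

  // Made-up key for illustration; the real name would be decided in the patch.
  static final String LOG_DIR_GROUP_KEY =
      "yarn.nodemanager.remote-app-log-dir.group";

  static void ensureRootDir(Configuration conf, Path rootLogDir)
      throws IOException {
    FileSystem fs = rootLogDir.getFileSystem(conf);
    if (!fs.exists(rootLogDir)) {
      // 1777 is the permission typically used for the remote log root.
      fs.mkdirs(rootLogDir, new FsPermission((short) 01777));
      String group = conf.get(LOG_DIR_GROUP_KEY);
      if (group != null && !group.isEmpty()) {
        // Keep the owner, only override the group of the root folder.
        fs.setOwner(rootLogDir, null, group);
      }
    }
  }
}
{code}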






[jira] [Commented] (YARN-9011) Race condition during decommissioning

2019-09-13 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929234#comment-16929234
 ] 

Peter Bacsko commented on YARN-9011:


Uploaded v2; it's still considered a POC.

> Race condition during decommissioning
> -
>
> Key: YARN-9011
> URL: https://issues.apache.org/jira/browse/YARN-9011
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.1
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9011-001.patch, YARN-9011-002.patch
>
>
> During internal testing, we found a nasty race condition which occurs during 
> decommissioning.
> Node manager, incorrect behaviour:
> {noformat}
> 2018-06-18 21:00:17,634 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received 
> SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting 
> down.
> 2018-06-18 21:00:17,634 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from 
> ResourceManager: Disallowed NodeManager nodeId: node-6.hostname.com:8041 
> hostname:node-6.hostname.com
> {noformat}
> Node manager, expected behaviour:
> {noformat}
> 2018-06-18 21:07:37,377 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received 
> SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting 
> down.
> 2018-06-18 21:07:37,377 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from 
> ResourceManager: DECOMMISSIONING  node-6.hostname.com:8041 is ready to be 
> decommissioned
> {noformat}
> Note the two different messages from the RM ("Disallowed NodeManager" vs 
> "DECOMMISSIONING"). The problem is that {{ResourceTrackerService}} can see an 
> inconsistent state of nodes while they're being updated:
> {noformat}
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: hostsReader 
> include:{172.26.12.198,node-7.hostname.com,node-2.hostname.com,node-5.hostname.com,172.26.8.205,node-8.hostname.com,172.26.23.76,172.26.22.223,node-6.hostname.com,172.26.9.218,node-4.hostname.com,node-3.hostname.com,172.26.13.167,node-9.hostname.com,172.26.21.221,172.26.10.219}
>  exclude:{node-6.hostname.com}
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: Gracefully 
> decommission node node-6.hostname.com:8041 with state RUNNING
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> Disallowed NodeManager nodeId: node-6.hostname.com:8041 node: 
> node-6.hostname.com
> 2018-06-18 21:00:17,576 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Put Node 
> node-6.hostname.com:8041 in DECOMMISSIONING.
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn 
> IP=172.26.22.115OPERATION=refreshNodes  TARGET=AdminService 
> RESULT=SUCCESS
> 2018-06-18 21:00:17,577 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Preserve 
> original total capability: 
> 2018-06-18 21:00:17,577 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> node-6.hostname.com:8041 Node Transitioned from RUNNING to DECOMMISSIONING
> {noformat}
> When the decommissioning succeeds, there is no output logged from 
> {{ResourceTrackerService}}.
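
The log lines above boil down to a check-then-act race between refreshNodes and 
the heartbeat path: the exclude list is updated before the node is put into 
DECOMMISSIONING, and a heartbeat that lands in between sees only the first 
half. A stripped-down model of that interleaving (class names invented; this is 
not the RM code, just the shape of the bug):
{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Invented names; a minimal model of the interleaving, not YARN's actual classes.
class NodeState {
  final Set<String> excludedHosts = ConcurrentHashMap.newKeySet();
  volatile boolean decommissioning = false;
}

class HeartbeatPath {
  // Runs on every node heartbeat (the ResourceTrackerService side).
  String respond(NodeState s, String host) {
    if (s.excludedHosts.contains(host) && !s.decommissioning) {
      // If this executes between steps 1 and 2 of refresh() below, the node
      // gets "Disallowed NodeManager" (SHUTDOWN) instead of a graceful message.
      return "SHUTDOWN: Disallowed NodeManager " + host;
    }
    return s.decommissioning ? "DECOMMISSIONING " + host : "OK";
  }
}

class RefreshNodesPath {
  // Admin-triggered refreshNodes (the NodesListManager side).
  void refresh(NodeState s, String host) {
    s.excludedHosts.add(host);     // step 1: host is now excluded
    // <-- a heartbeat arriving here observes an inconsistent state
    s.decommissioning = true;      // step 2: node transitions to DECOMMISSIONING
  }
}
{code}
Whatever the attached patches do, the point is that the exclude-list update and 
the node state transition are not observed atomically by the heartbeat handler.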






[jira] [Updated] (YARN-9011) Race condition during decommissioning

2019-09-13 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9011:
---
Attachment: YARN-9011-002.patch

> Race condition during decommissioning
> -
>
> Key: YARN-9011
> URL: https://issues.apache.org/jira/browse/YARN-9011
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.1
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9011-001.patch, YARN-9011-002.patch
>
>
> During internal testing, we found a nasty race condition which occurs during 
> decommissioning.
> Node manager, incorrect behaviour:
> {noformat}
> 2018-06-18 21:00:17,634 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received 
> SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting 
> down.
> 2018-06-18 21:00:17,634 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from 
> ResourceManager: Disallowed NodeManager nodeId: node-6.hostname.com:8041 
> hostname:node-6.hostname.com
> {noformat}
> Node manager, expected behaviour:
> {noformat}
> 2018-06-18 21:07:37,377 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received 
> SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting 
> down.
> 2018-06-18 21:07:37,377 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from 
> ResourceManager: DECOMMISSIONING  node-6.hostname.com:8041 is ready to be 
> decommissioned
> {noformat}
> Note the two different messages from the RM ("Disallowed NodeManager" vs 
> "DECOMMISSIONING"). The problem is that {{ResourceTrackerService}} can see an 
> inconsistent state of nodes while they're being updated:
> {noformat}
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: hostsReader 
> include:{172.26.12.198,node-7.hostname.com,node-2.hostname.com,node-5.hostname.com,172.26.8.205,node-8.hostname.com,172.26.23.76,172.26.22.223,node-6.hostname.com,172.26.9.218,node-4.hostname.com,node-3.hostname.com,172.26.13.167,node-9.hostname.com,172.26.21.221,172.26.10.219}
>  exclude:{node-6.hostname.com}
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: Gracefully 
> decommission node node-6.hostname.com:8041 with state RUNNING
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> Disallowed NodeManager nodeId: node-6.hostname.com:8041 node: 
> node-6.hostname.com
> 2018-06-18 21:00:17,576 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Put Node 
> node-6.hostname.com:8041 in DECOMMISSIONING.
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn 
> IP=172.26.22.115OPERATION=refreshNodes  TARGET=AdminService 
> RESULT=SUCCESS
> 2018-06-18 21:00:17,577 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Preserve 
> original total capability: 
> 2018-06-18 21:00:17,577 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> node-6.hostname.com:8041 Node Transitioned from RUNNING to DECOMMISSIONING
> {noformat}
> When the decommissioning succeeds, there is no output logged from 
> {{ResourceTrackerService}}.






[jira] [Commented] (YARN-9825) Changes for initializing placement rules with ResourceScheduler in branch-2

2019-09-13 Thread Varun Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929134#comment-16929134
 ] 

Varun Saxena commented on YARN-9825:


Committed to branch-2. Thanks [~jhung] for your contribution.

> Changes for initializing placement rules with ResourceScheduler in branch-2
> ---
>
> Key: YARN-9825
> URL: https://issues.apache.org/jira/browse/YARN-9825
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9825-branch-2.001.patch
>
>
> YARN-8016 and YARN-8948 add functionality to initialize placement rules with 
> ResourceScheduler. We need this in branch-2, but the patches don't apply 
> cleanly, so we just port the initialization logic.






[jira] [Commented] (YARN-9808) Zero length files in container log output haven't got a header

2019-09-13 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929062#comment-16929062
 ] 

Hadoop QA commented on YARN-9808:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
41s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
46s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  8s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
49s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m  
5s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 12s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 1 new + 282 unchanged - 25 fixed = 283 total (was 307) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
41s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m  
2s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 25m 
47s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}131m 36s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:f4f9f0fe4f2 |
| JIRA Issue | YARN-9808 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12980242/YARN-9808.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 997533e9a765 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4852a90 |
| 

[jira] [Updated] (YARN-9808) Zero length files in container log output haven't got a header

2019-09-13 Thread Adam Antal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-9808:
-
Attachment: YARN-9808.003.patch

> Zero length files in container log output haven't got a header
> --
>
> Key: YARN-9808
> URL: https://issues.apache.org/jira/browse/YARN-9808
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: log-aggregation, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9808.001.patch, YARN-9808.002.patch, 
> YARN-9808.003.patch
>
>
> Using the Yarn logs CLI for containers that have zero length files produces 
> output similar to this:
> {noformat}
> End of LogType:stderr
> ***
> End of LogType:prelaunch.err
> **
> Container: container_e25_1567431105510_0001_01_02 on host-1
> LogAggregationType: AGGREGATED
> ===
> LogType:container.log
> LogLastModifiedTime:Mon Sep 02 06:34:48 -0700 2019
> LogLength:5442
> LogContents:
> ...
> ...
> {noformat}
> Note that stderr and prelaunch.err are both zero-length files. Though the 
> output is not misleading, the header is missing.
> I suggest adding the header for zero-length files as well, primarily for the 
> following reasons:
> - for applications having multiple files with the same name, you may want to 
> distinguish them by host - if many of those are of zero length, you cannot 
> extract this information from the output. Note that this is a common case for 
> stderr and prelaunch.err.
> - you may want to see the modification time (which corresponds to the 
> creation time of the zero-length file)
> - it would explicitly display the "LogLength:0" line, which would avoid any 
> confusion on the end user's side.






[jira] [Commented] (YARN-9808) Zero length files in container log output haven't got a header

2019-09-13 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929020#comment-16929020
 ] 

Adam Antal commented on YARN-9808:
--

Fixed checkstyle issues in  [^YARN-9808.003.patch].

> Zero length files in container log output haven't got a header
> --
>
> Key: YARN-9808
> URL: https://issues.apache.org/jira/browse/YARN-9808
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: log-aggregation, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9808.001.patch, YARN-9808.002.patch, 
> YARN-9808.003.patch
>
>
> Using the Yarn logs CLI for containers that have zero length files produces 
> output similar to this:
> {noformat}
> End of LogType:stderr
> ***
> End of LogType:prelaunch.err
> **
> Container: container_e25_1567431105510_0001_01_02 on host-1
> LogAggregationType: AGGREGATED
> ===
> LogType:container.log
> LogLastModifiedTime:Mon Sep 02 06:34:48 -0700 2019
> LogLength:5442
> LogContents:
> ...
> ...
> {noformat}
> Note that stderr and prelaunch.err are both zero-length files. Though the 
> output is not misleading, the header is missing.
> I suggest adding the header for zero-length files as well, primarily for the 
> following reasons:
> - for applications having multiple files with the same name, you may want to 
> distinguish them by host - if many of those are of zero length, you cannot 
> extract this information from the output. Note that this is a common case for 
> stderr and prelaunch.err.
> - you may want to see the modification time (which corresponds to the 
> creation time of the zero-length file)
> - it would explicitly display the "LogLength:0" line, which would avoid any 
> confusion on the end user's side.


