[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310909#comment-16310909
 ] 

lujie edited comment on YARN-7663 at 1/4/18 7:39 AM:
-

Following Jason Lowe's useful suggestion, I changed the unit test and attached a 
new patch.
In this patch I do three things: 1. add an empty protected method, 
onInvalidStateTransition, and call it from the code block where RMAppImpl handles 
InvalidStateTransitionException; 2. create a new final class RMAppImplForTest 
which overrides onInvalidStateTransition, and in createNewTestApp create an 
RMAppImplForTest object instead of an RMAppImpl; 3. fix this bug by ignoring the 
event.
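
A rough sketch of steps 1-2, as a simplified stand-alone model rather than the 
actual RMAppImpl/TestRMAppTransitions code (class and method names only mirror the 
idea):
{code:java}
// Simplified model: an empty protected hook is called from the block that
// currently only logs the InvalidStateTransitionException, and the test
// subclasses the app implementation to observe the callback.
class AppModel {
  void handle(String event, String state) {
    try {
      doTransition(event, state);
    } catch (IllegalStateException e) {
      // previously only logged; the patch adds this overridable hook
      onInvalidStateTransition(event, state);
    }
  }

  void doTransition(String event, String state) {
    // stand-in for StateMachineFactory rejecting the event
    throw new IllegalStateException("Invalid event: " + event + " at " + state);
  }

  /** Empty in production; the test subclass (RMAppImplForTest) overrides it. */
  protected void onInvalidStateTransition(String event, String state) {
  }
}

final class AppModelForTest extends AppModel {
  boolean sawInvalidTransition;

  @Override
  protected void onInvalidStateTransition(String event, String state) {
    sawInvalidTransition = true;
  }
}
{code}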

But two more invalid state transitions show up while testing:
1. testAppAcceptedFailed: APP_ACCEPTED at state 
2. testAppRunningFailed: ACCEPTED, APP_UPDATE_SAVED at state KILLED.
These two invalid transitions may be bugs in RMAppImpl, or may be bugs in 
TestRMAppTransitions. Would it be better to defer them to another JIRA?

Or can we just ignore the event without a test, as was done in 
[YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?



was (Author: xiaoheipangzi):
Following Jason Lowe's useful suggestion, I changed the unit test and attached a 
new patch.
In this patch I do three things: 1. add an empty protected method, 
onInvalidStateTransition, and call it from the code block where RMAppImpl handles 
InvalidStateTransitionException; 2. create a new final class RMAppImplForTest 
which overrides onInvalidStateTransition, and in createNewTestApp create an 
RMAppImplForTest object instead of an RMAppImpl; 3. fix this bug by ignoring the 
event.

But two more invalid state transitions show up while testing:
1. testAppAcceptedFailed: APP_ACCEPTED at state 
2. testAppRunningFailed: ACCEPTED, APP_UPDATE_SAVED at state KILLED.
These two invalid transitions may be bugs in RMAppImpl, or may be bugs in 
TestRMAppTransitions. Would it be better to defer them to another JIRA?

Or can we just ignore the event without a test, as was done in 
[YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?


> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch
>
>
> After sending a kill to the application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted before the point where the START event is created, this 
> bug will deterministically reproduce.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7663:

Attachment: YARN-7663_3.patch

Following Jason Lowe's useful suggestion, I changed the unit test and attached a 
new patch.
In this patch I do three things: 1. add an empty protected method, 
onInvalidStateTransition, and call it from the code block where RMAppImpl handles 
InvalidStateTransitionException; 2. create a new final class RMAppImplForTest 
which overrides onInvalidStateTransition, and in createNewTestApp create an 
RMAppImplForTest object instead of an RMAppImpl; 3. fix this bug by ignoring the 
event.

But two more invalid state transitions show up while testing:
1. testAppAcceptedFailed: APP_ACCEPTED at state 
2. testAppRunningFailed: ACCEPTED, APP_UPDATE_SAVED at state KILLED.
These two invalid transitions may be bugs in RMAppImpl, or may be bugs in 
TestRMAppTransitions. Would it be better to defer them to another JIRA?

Or can we just ignore the event without a test, as was done in 
[YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?


> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch
>
>
> After sending a kill to the application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted before the point where the START event is created, this 
> bug will deterministically reproduce.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7346) Fix compilation errors against hbase2 alpha release

2018-01-03 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310819#comment-16310819
 ] 

Vrushali C commented on YARN-7346:
--

Looks like HBase beta1 RC is out. We should try this with 2.0 beta1 and see how 
it goes. 

> Fix compilation errors against hbase2 alpha release
> ---
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Vrushali C
> Attachments: YARN-7346.00.patch, YARN-7346.prelim1.patch, 
> YARN-7346.prelim2.patch, YARN-7581.prelim.patch
>
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> https://pastebin.com/Ms4jYEVB
> This issue is to fix the compilation errors.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6894) RM Apps API returns only active apps when query parameter queue used

2018-01-03 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310763#comment-16310763
 ] 

Miklos Szegedi commented on YARN-6894:
--

Thank you, [~gsohn]!

> RM Apps API returns only active apps when query parameter queue used
> 
>
> Key: YARN-6894
> URL: https://issues.apache.org/jira/browse/YARN-6894
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, restapi
>Reporter: Grant Sohn
>Assignee: Gergely Novák
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: YARN-6894.001.patch, YARN-6894.002.patch, 
> YARN-6894.003.patch
>
>
> If you run RM's Cluster Applications API with no query parameters, you get a 
> list of apps.
> If you run RM's Cluster Applications API with any query parameters other than 
> "queue" you get the list of apps with the parameter filters being applied.
> However, when you use the "queue" query parameter, you only see the 
> applications that are active in the cluster (NEW, NEW_SAVING, SUBMITTED, 
> ACCEPTED, RUNNING).  This behavior is inconsistent with the API.  If there is 
> a sound reason behind this, it should be documented; it seems like there 
> might be, as the mapred queue CLI behaves similarly.
> http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7516) Security check for untrusted docker image

2018-01-03 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310726#comment-16310726
 ] 

Miklos Szegedi commented on YARN-7516:
--

[~eyang], [~ebadger], [~vinodkv], thank you for the patch and the reviews.
[~ebadger], to answer your question, I have the following opinion about this 
jira:
1. This jira is a simple measure to block untrusted registries, as [~vinodkv] 
suggested above.
2. If I wanted to secure my cluster and not just the registry but the 
individual images in Docker on Hadoop, I would probably rely on the content 
signing feature of Docker. Ref: 
https://docs.docker.com/engine/security/trust/content_trust/#content-trust-operations-and-keys
 It signs the actual content. Even if someone gets access to the registry or 
the channel but not the signing key, they cannot compromise the system. Its 
advantage is that it does not require any change in Hadoop, AFAIK. Hadoop 
remains simple. Setting that up is non-trivial though.
3. I would never allow privileged containers in my own cluster. Native 
(non-Docker) Hadoop did not allow that; it was not part of the design. If 
something really needs root privileges, I would probably create a SUID script 
or executable as simple as possible, just like container executor. I would make 
sure it does only what it needs to do and run it from a YARN container without 
a Docker container. This reduces the attack surface.
4. If the question is how to allow images that are able to run as privileged 
and provide a simple and secure interface, I would probably list the allowed 
Docker images with digest SHA256 hash values (image@6bff...) in 
container-executor.cfg. Hadoop remains simple. It protects the node local HDFS 
data by disallowing root escalation, even if the yarn user is compromised.
5. If we were at the initial design step and we needed a little bit more 
complex but secure and scalable solution, I would just stick a single public 
key into container-executor.cfg maybe together with a flag whether privileged 
containers are allowed at all (ref. 3. above). The client would send the whole 
Docker JSON including the image, volumes, privileged flag etc. in the launch 
context. The RM would verify, if the user has permission to do the Docker 
command, so not all users would be allowed to run privileged for example. It 
would then sign the JSON with the corresponding private key into a delegation 
token or JWT and pass the signature with the JSON to the node in the launch 
context. The node manager would then forward the JSON and the signature to the 
container executor or another executable as suggested in YARN-7506. The 
container executor would verify the JWT or delegation token signature with the 
public key in container-executor.cfg and forward the JSON to the Docker 
command, if the verification is successful. The signature means that the RM 
allowed the request. Hadoop remains simple, security is centralized in the RM. 
RM can even have a REST API to dynamically adjust privileges. The privilege 
check might even be programmed as patterns in the JSON, minimizing the changes 
to Hadoop. I admit, this is probably a suggestion too late. (A rough sketch of the 
node-side verification step follows this list.)
6. Just a side note on how much we need to worry about buffer overflows: I am more 
concerned about actual security design problems, which affect both C and Java. 
Most of the 2.5 million lines of Hadoop is Java. Ref: 
https://www.openhub.net/p/Hadoop.
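
For what it is worth, here is a minimal, purely hypothetical sketch of the 
verification step in point 5, assuming an RSA key pair and a Base64-encoded 
signature carried next to the Docker JSON; none of these class or method names 
exist in Hadoop today:
{code:java}
import java.nio.charset.StandardCharsets;
import java.security.KeyFactory;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.X509EncodedKeySpec;
import java.util.Base64;

/**
 * Hypothetical node-side check: the RM signs the Docker launch JSON with its
 * private key, and the executor verifies the signature against the public key
 * it was configured with before handing the JSON to the docker command.
 */
public class LaunchJsonVerifier {

  private final PublicKey rmPublicKey;

  public LaunchJsonVerifier(byte[] x509EncodedPublicKey) throws Exception {
    // Assumes the RM public key is distributed in X.509/DER form.
    rmPublicKey = KeyFactory.getInstance("RSA")
        .generatePublic(new X509EncodedKeySpec(x509EncodedPublicKey));
  }

  /** Returns true only if the JSON was signed by the RM's private key. */
  public boolean allowedByRM(String dockerJson, String base64Signature)
      throws Exception {
    Signature verifier = Signature.getInstance("SHA256withRSA");
    verifier.initVerify(rmPublicKey);
    verifier.update(dockerJson.getBytes(StandardCharsets.UTF_8));
    return verifier.verify(Base64.getDecoder().decode(base64Signature));
  }
}
{code}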




> Security check for untrusted docker image
> -
>
> Key: YARN-7516
> URL: https://issues.apache.org/jira/browse/YARN-7516
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7516.001.patch, YARN-7516.002.patch, 
> YARN-7516.003.patch, YARN-7516.004.patch
>
>
> Hadoop YARN Services can support using a private Docker registry image or a 
> Docker image from Docker Hub.  In the current implementation, Hadoop security is 
> enforced through username and group membership, and enforces uid:gid 
> consistency between the Docker container and the distributed file system.  There 
> is a cloud use case for the ability to run untrusted Docker images on the same 
> cluster for testing.  
> The basic requirement for an untrusted container is to ensure all kernel and 
> root privileges are dropped, and that there is no interaction with the 
> distributed file system, to avoid contamination.  We can probably enforce 
> detection of an untrusted Docker image by checking the following:
> # If the Docker image is from the public Docker Hub repository, the container is 
> automatically flagged as insecure, disk volume mounts are disabled 
> automatically, and all kernel capabilities are dropped.
> # If the Docker image is from a private repository in Docker Hub, and there is a 
> white list to allow the private repository, disk volume mounts are allowed, 
> kernel 

[jira] [Updated] (YARN-7697) NM goes down with OOM due to leak in log-aggregation

2018-01-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-7697:
--
Priority: Critical  (was: Major)
Target Version/s: 3.1.0

This will only happen with continuous log-aggregation with long running 
applications, but is nevertheless critical. Marking it so.

> NM goes down with OOM due to leak in log-aggregation
> 
>
> Key: YARN-7697
> URL: https://issues.apache.org/jira/browse/YARN-7697
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Santhosh B Gowda
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-7697.1.patch
>
>
> 2017-12-29 01:43:50,601 FATAL yarn.YarnUncaughtExceptionHandler 
> (YarnUncaughtExceptionHandler.java:uncaughtException(51)) - Thread 
> Thread[LogAggregationService #0,5,main] threw an Error.  Shutting down now...
> java.lang.OutOfMemoryError: Java heap space
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.loadIndexedLogsMeta(LogAggregationIndexedFileController.java:823)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.loadIndexedLogsMeta(LogAggregationIndexedFileController.java:840)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriterInRolling(LogAggregationIndexedFileController.java:293)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.access$600(LogAggregationIndexedFileController.java:98)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:216)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:197)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:205)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:312)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:284)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2017-12-29 01:43:50,601 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(464)) - Application ap



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7655) avoid AM preemption caused by RRs for specific nodes or racks

2018-01-03 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310553#comment-16310553
 ] 

Steven Rand commented on YARN-7655:
---

Tagging [~yufeigu] and [~templedf] for thoughts. I can work through the above 
weirdness with the test case, but interested to hear what people think of the 
proposed change.

> avoid AM preemption caused by RRs for specific nodes or racks
> -
>
> Key: YARN-7655
> URL: https://issues.apache.org/jira/browse/YARN-7655
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.0
>Reporter: Steven Rand
>Assignee: Steven Rand
> Attachments: YARN-7655-001.patch
>
>
> We frequently see AM preemptions when 
> {{starvedApp.getStarvedResourceRequests()}} in 
> {{FSPreemptionThread#identifyContainersToPreempt}} includes one or more RRs 
> that request containers on a specific node. Since this causes us to only 
> consider one node to preempt containers on, the really good work that was 
> done in YARN-5830 doesn't save us from AM preemption. Even though there might 
> be multiple nodes on which we could preempt enough non-AM containers to 
> satisfy the app's starvation, we often wind up preempting one or more AM 
> containers on the single node that we're considering.
> A proposed solution is that if we're going to preempt one or more AM 
> containers for an RR that specifies a node or rack, then we should instead 
> expand the search space to consider all nodes. That way we take advantage of 
> YARN-5830, and only preempt AMs if there's no alternative. I've attached a 
> patch with an initial implementation of this. We've been running it on a few 
> clusters, and have seen AM preemptions drop from double-digit occurrences on 
> many days to zero.
> Of course, the tradeoff is some loss of locality, since the starved app is 
> less likely to be allocated resources at the most specific locality level 
> that it asked for. My opinion is that this tradeoff is worth it, but 
> interested to hear what others think as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7590) Improve container-executor validation check

2018-01-03 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310534#comment-16310534
 ] 

Miklos Szegedi commented on YARN-7590:
--

Thank you for the patch, [~eyang]. I have two more style issues. I also verified 
the patch: it runs a basic MapReduce job and, as expected, does not allow the 
scenario in the description.
{code}
fprintf(LOGFILE, "Error checking file stats for %s.\n", nm_root);
{code}
It would be very useful to have a meaningful error message like 
{{fprintf(LOGFILE, "Error checking file stats for %s %d %s.\n", nm_root, err, 
strerror(err));}}. It helps a lot to support the feature.
{code}
  if (check != 0 || strstr(container_log_dir, "..") != 0) {
{code}
Like I mentioned before, I would separate the two checks with a meaningful 
error message in the second case. The first one already prints inside the call. 
This one also helps to support the feature.


> Improve container-executor validation check
> ---
>
> Key: YARN-7590
> URL: https://issues.apache.org/jira/browse/YARN-7590
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, yarn
>Affects Versions: 2.0.1-alpha, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0, 
> 2.8.0, 2.8.1, 3.0.0-beta1
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7590.001.patch, YARN-7590.002.patch, 
> YARN-7590.003.patch, YARN-7590.004.patch, YARN-7590.005.patch, 
> YARN-7590.006.patch
>
>
> There is only a minimal check of the prefix path in container-executor.  If YARN 
> is compromised, an attacker can use container-executor to change the ownership 
> of system files:
> {code}
> /usr/local/hadoop/bin/container-executor spark yarn 0 etc /home/yarn/tokens 
> /home/spark / ls
> {code}
> This will change /etc to be owned by spark user:
> {code}
> # ls -ld /etc
> drwxr-s---. 110 spark hadoop 8192 Nov 21 20:00 /etc
> {code}
> The spark user can then rewrite files in /etc to gain more access.  We can 
> improve this with additional checks in container-executor:
> # Make sure the prefix path is owned by the same user as the caller to 
> container-executor.
> # Make sure the log directory prefix is owned by the same user as the caller.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7605) Implement doAs for Api Service REST API

2018-01-03 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310526#comment-16310526
 ] 

Jian He commented on YARN-7605:
---

- the addInternalServlet has an unused parameter clazz

> Implement doAs for Api Service REST API
> ---
>
> Key: YARN-7605
> URL: https://issues.apache.org/jira/browse/YARN-7605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
> Fix For: yarn-native-services
>
> Attachments: YARN-7605.001.patch, YARN-7605.004.patch, 
> YARN-7605.005.patch, YARN-7605.006.patch, YARN-7605.007.patch, 
> YARN-7605.008.patch, YARN-7605.009.patch
>
>
> In YARN-7540, all client entry points for the API service were centralized to 
> use the REST API instead of making direct file system and resource manager RPC 
> calls.  This change helped to centralize YARN metadata so that it is owned by the 
> yarn user instead of crawling through every user's home directory to find 
> metadata.  The next step is to make sure "doAs" calls work properly for the API 
> service.  The metadata is stored by the yarn user, but the actual workload still 
> needs to be performed as the end user, hence the API service must authenticate 
> the end user's Kerberos credentials and perform a doAs call when requesting 
> containers via ServiceClient.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7605) Implement doAs for Api Service REST API

2018-01-03 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310519#comment-16310519
 ] 

Jian He commented on YARN-7605:
---

- we can't assume that if result != 0 it is an ApplicationNotFoundException; also, 
instead of "result.intValue() == 0", it can simply be "result == 0"
{code}
  if (result.intValue() == 0) {
ServiceStatus serviceStatus = new ServiceStatus();
serviceStatus.setDiagnostics("Successfully stopped service " +
appName);
return Response.status(Status.OK).entity(serviceStatus).build();
  } else {
throw new ApplicationNotFoundException("Service " + appName +
" is not found in YARN.");
  }
{code}
- what is this NullPointerException for? Shouldn't we fix the NPE instead?
{code}
} catch (NullPointerException | IOException e) {
  LOG.error("Fail to stop service:", e);
{code}
- similarly, I think it is inappropriate to assume the cause of all these 
exceptions is service not found
{code}
} catch (InterruptedException | ApplicationNotFoundException |
UndeclaredThrowableException e) {
  ServiceStatus serviceStatus = new ServiceStatus();
  serviceStatus.setDiagnostics("Service " + appName +
  " is not found in YARN.");
{code}
- it is fine to just return the result 
{code}
return Integer.valueOf(result);
{code}
- create a common method for these? (a possible shape is sketched after these 
comments)
{code}
  UserGroupInformation proxyUser = UserGroupInformation.getLoginUser();
  UserGroupInformation ugi = UserGroupInformation
  .createProxyUser(request.getRemoteUser(), proxyUser);
{code}
- Again, it is inappropriate to assume all exceptions are caused by "Permission 
denied"; this will confuse users. And why not just use LOG.warn("...", e) instead 
of a separate log statement for the exception?
{code}
try {
  proxy.flexComponents(requestBuilder.build());
} catch (YarnException | IOException e) {
  LOG.warn("Exception caught during flex operation.");
  LOG.warn(ExceptionUtils.getFullStackTrace(e));
  throw new YarnException("Permission denied to perform flex operation.");
}
{code}
- why is it necessary to add InterruptedException to the method signature and make 
callers catch this exception and then convert it to a YarnException?
{code}
  private ApplicationId submitApp(Service app)
  throws IOException, YarnException, InterruptedException {
{code}
- unnecessary formatting change in createAMProxy, actionSave
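
A possible shape for the shared helper mentioned above (illustrative only; the 
class and method names are not part of the patch under review):
{code:java}
import java.io.IOException;

import javax.servlet.http.HttpServletRequest;

import org.apache.hadoop.security.UserGroupInformation;

/**
 * Builds the proxy-user UGI for the remote caller in one place instead of
 * repeating the getLoginUser()/createProxyUser() pair in every REST handler.
 */
public final class ProxyUserUtil {

  private ProxyUserUtil() {
  }

  /** Returns a UGI acting as the remote user on top of the service's login user. */
  public static UserGroupInformation proxyUserFor(HttpServletRequest request)
      throws IOException {
    UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
    return UserGroupInformation.createProxyUser(request.getRemoteUser(), loginUser);
  }
}
{code}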

> Implement doAs for Api Service REST API
> ---
>
> Key: YARN-7605
> URL: https://issues.apache.org/jira/browse/YARN-7605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
> Fix For: yarn-native-services
>
> Attachments: YARN-7605.001.patch, YARN-7605.004.patch, 
> YARN-7605.005.patch, YARN-7605.006.patch, YARN-7605.007.patch, 
> YARN-7605.008.patch, YARN-7605.009.patch
>
>
> In YARN-7540, all client entry points for the API service were centralized to 
> use the REST API instead of making direct file system and resource manager RPC 
> calls.  This change helped to centralize YARN metadata so that it is owned by the 
> yarn user instead of crawling through every user's home directory to find 
> metadata.  The next step is to make sure "doAs" calls work properly for the API 
> service.  The metadata is stored by the yarn user, but the actual workload still 
> needs to be performed as the end user, hence the API service must authenticate 
> the end user's Kerberos credentials and perform a doAs call when requesting 
> containers via ServiceClient.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7622) Allow fair-scheduler configuration on HDFS

2018-01-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310492#comment-16310492
 ] 

Hudson commented on YARN-7622:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13446 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13446/])
YARN-7622. Allow fair-scheduler configuration on HDFS (gphillips via rkanter: 
rev 7a550448036c9d140d2c35c684cc8023ceb8880e)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java


> Allow fair-scheduler configuration on HDFS
> --
>
> Key: YARN-7622
> URL: https://issues.apache.org/jira/browse/YARN-7622
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, resourcemanager
>Reporter: Greg Phillips
>Assignee: Greg Phillips
>Priority: Minor
> Attachments: YARN-7622.001.patch, YARN-7622.002.patch, 
> YARN-7622.003.patch, YARN-7622.004.patch, YARN-7622.005.patch, 
> YARN-7622.006.patch
>
>
> The FairScheduler requires the allocation file to be hosted on the local 
> filesystem on the RM node(s). Allowing HDFS to store the allocation file will 
> provide improved redundancy, more options for scheduler updates, and RM 
> failover consistency in HA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7622) Allow fair-scheduler configuration on HDFS

2018-01-03 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310482#comment-16310482
 ] 

Robert Kanter commented on YARN-7622:
-

Thanks [~gphillips] and [~wilfreds] for review.  Committed to trunk!

[~gphillips], were you also wanting this to go into branch-2?  If so, we'll 
need a new patch - it doesn't apply cleanly (imports) or compile (lambdas).

> Allow fair-scheduler configuration on HDFS
> --
>
> Key: YARN-7622
> URL: https://issues.apache.org/jira/browse/YARN-7622
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, resourcemanager
>Reporter: Greg Phillips
>Assignee: Greg Phillips
>Priority: Minor
> Attachments: YARN-7622.001.patch, YARN-7622.002.patch, 
> YARN-7622.003.patch, YARN-7622.004.patch, YARN-7622.005.patch, 
> YARN-7622.006.patch
>
>
> The FairScheduler requires the allocation file to be hosted on the local 
> filesystem on the RM node(s). Allowing HDFS to store the allocation file will 
> provide improved redundancy, more options for scheduler updates, and RM 
> failover consistency in HA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7678) Ability to enable logging of container memory stats

2018-01-03 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310467#comment-16310467
 ] 

genericqa commented on YARN-7678:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 
26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
 1s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
1s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 15m 11s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 46m 44s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:17213a0 |
| JIRA Issue | YARN-7678 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12904479/YARN-7678-branch-2.001.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c1643b999646 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 
12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / f2a1d5f |
| maven | version: Apache Maven 3.3.9 
(bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) |
| Default Java | 1.7.0_151 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/19099/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19099/testReport/ |
| Max. process+thread count | 137 (vs. ulimit of 5000) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/19099/console |
| Powered by | 

[jira] [Commented] (YARN-7622) Allow fair-scheduler configuration on HDFS

2018-01-03 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310459#comment-16310459
 ] 

Robert Kanter commented on YARN-7622:
-

+1 LGTM

> Allow fair-scheduler configuration on HDFS
> --
>
> Key: YARN-7622
> URL: https://issues.apache.org/jira/browse/YARN-7622
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, resourcemanager
>Reporter: Greg Phillips
>Assignee: Greg Phillips
>Priority: Minor
> Attachments: YARN-7622.001.patch, YARN-7622.002.patch, 
> YARN-7622.003.patch, YARN-7622.004.patch, YARN-7622.005.patch, 
> YARN-7622.006.patch
>
>
> The FairScheduler requires the allocation file to be hosted on the local 
> filesystem on the RM node(s). Allowing HDFS to store the allocation file will 
> provide improved redundancy, more options for scheduler updates, and RM 
> failover consistency in HA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7516) Security check for untrusted docker image

2018-01-03 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310455#comment-16310455
 ] 

Eric Yang commented on YARN-7516:
-

[~ebadger] The same question was asked by Vinod on Dec 11.  It might be 
technically correct to use 
{{yarn.nodemanager.runtime.linux.docker.trusted-registry}}, but it is difficult 
to find.  My thinking was part of a bigger plan for a {{yarn.docker.}} namespace 
for end-user-friendly configuration.  I am open to reusing 
{{DOCKER_CONTAINER_RUNTIME_PREFIX}} if the majority of people prefer it.

I am not in favor of shifting all validation logic towards 
container-executor.cfg.  Java is still better than C at preventing buffer 
overflows.  There is no guarantee that we will not introduce buffer overflow bugs 
in the root user validation, and those could open new potential for 
spear-phishing attacks.  As history has shown (YARN-7590), container-executor is 
not as hardened as we thought.  If the YARN user is compromised, it can rewrite 
any file as any other user on HDFS.  We get more coverage by setting up defensive 
mechanisms in the YARN code logic to extend our security perimeter, rather than 
focusing solely on the root-user defensive mechanism.

> Security check for untrusted docker image
> -
>
> Key: YARN-7516
> URL: https://issues.apache.org/jira/browse/YARN-7516
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7516.001.patch, YARN-7516.002.patch, 
> YARN-7516.003.patch, YARN-7516.004.patch
>
>
> Hadoop YARN Services can support using a private Docker registry image or a 
> Docker image from Docker Hub.  In the current implementation, Hadoop security is 
> enforced through username and group membership, and enforces uid:gid 
> consistency between the Docker container and the distributed file system.  There 
> is a cloud use case for the ability to run untrusted Docker images on the same 
> cluster for testing.  
> The basic requirement for an untrusted container is to ensure all kernel and 
> root privileges are dropped, and that there is no interaction with the 
> distributed file system, to avoid contamination.  We can probably enforce 
> detection of an untrusted Docker image by checking the following:
> # If the Docker image is from the public Docker Hub repository, the container is 
> automatically flagged as insecure, disk volume mounts are disabled 
> automatically, and all kernel capabilities are dropped.
> # If the Docker image is from a private repository in Docker Hub, and there is a 
> white list to allow the private repository, disk volume mounts are allowed and 
> kernel capabilities follow the allowed list.
> # If the Docker image is from a private trusted registry, with an image name 
> like "private.registry.local:5000/centos", and the white list allows this 
> private trusted repository, disk volume mounts are allowed and kernel 
> capabilities follow the allowed list.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7516) Security check for untrusted docker image

2018-01-03 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310440#comment-16310440
 ] 

genericqa commented on YARN-7516:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  
9s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
8s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 22s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
5s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
42s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
14s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 17m 
53s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
21s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | 

[jira] [Commented] (YARN-7102) NM heartbeat stuck when responseId overflows MAX_INT

2018-01-03 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310426#comment-16310426
 ] 

genericqa commented on YARN-7102:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 42s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 
30s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 13s{color} | {color:orange} root: The patch generated 9 new + 328 unchanged 
- 16 fixed = 337 total (was 344) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  3s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 62m 
57s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  7m 
58s{color} | {color:green} hadoop-sls in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}151m 38s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7102 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12904462/YARN-7102.v16.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 47176fcf9635 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 
12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2f6c038 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Commented] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310417#comment-16310417
 ] 

Jason Lowe commented on YARN-7663:
--

Thanks for updating the patch!

The unit test passes even without the fix, so it does not work well as a 
regression test.  The problem is that the test needs some way to recognize an 
invalid transition occurred.  Currently it simply logs a message which is not 
detectable by a unit test.  One way to work around this is to have RMAppImpl 
call a new, empty protected method, e.g.: onInvalidStateTransition, that the 
unit test can override to detect when invalid transitions occur.
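
As a toy illustration of that detection pattern (not the real RMAppImpl or 
TestRMAppTransitions code; the inner class below is only a stand-in):
{code:java}
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class InvalidTransitionHookExample {

  /** Stand-in for RMAppImpl with the proposed empty protected hook. */
  static class App {
    /** Called where the invalid transition is caught; a no-op in production. */
    protected void onInvalidStateTransition(String event, String state) {
    }

    void handle(String event, String state) {
      // pretend the state machine just rejected the event
      onInvalidStateTransition(event, state);
    }
  }

  @Test
  public void invalidTransitionIsVisibleToTheTest() {
    final boolean[] sawInvalid = {false};
    App app = new App() {
      @Override
      protected void onInvalidStateTransition(String event, String state) {
        sawInvalid[0] = true;
      }
    };
    app.handle("START", "KILLED");
    assertTrue("the test can now observe the invalid transition", sawInvalid[0]);
  }
}
{code}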

Speaking of invalid transitions, there's another one happening in the unit 
test.  In the test's output:
{noformat}
2018-01-03 15:41:13,044 ERROR [Thread-96] rmapp.RMAppImpl 
(RMAppImpl.java:handle(886)) - App: application_1515015672238_0022 can't handle 
this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
APP_UPDATE_SAVED at KILLED
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:884)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:119)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.sendAppUpdateSavedEvent(TestRMAppTransitions.java:492)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppStartAfterKilled(TestRMAppTransitions.java:1168)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}

When an app is killed in the NEW state, the RM skips storing it to the state 
store and simply moves it to the KILLED state.  That's why we get an invalid 
state transition when we try to send a state store event -- the state store 
should not be involved anymore once we're in the KILLED state.  Actually that's 
probably a bug -- I suspect apps that are killed from the NEW state don't end 
up getting recovered on RM restart.  That could be fixed as part of this JIRA, 
but it may be better to defer it to another JIRA.

It would be good to clean up the whitespace nits identified by checkstyle as 
well.

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch
>
>
> Send kill to application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> 

[jira] [Commented] (YARN-7693) ContainersMonitor support configurable

2018-01-03 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310402#comment-16310402
 ] 

Arun Suresh commented on YARN-7693:
---

I agree with [~miklos.szeg...@cloudera.com]: there are better ways to do this, 
as he has listed, than plugging in a new monitor. Also, do take a look at the 
{{ResourceUtilizationTracker}} interface, which was actually designed to be 
pluggable (although we haven't externalized a config variable for it). It is 
used by the ContainerScheduler to decide if there are resources available to 
start a container.
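
As a side note, externalizing a component like that behind a config variable 
usually follows the standard Hadoop reflection pattern.  A minimal sketch, 
using a hypothetical config key and stand-in types rather than the real 
ContainersMonitor/ResourceUtilizationTracker interfaces:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

// Illustration only: the config key and the Monitor interface are made up,
// and the pattern assumes implementations with a no-argument constructor.
public class PluggableMonitorExample {

  public interface Monitor {
    void start();
  }

  public static class DefaultMonitor implements Monitor {
    @Override
    public void start() {
      // default behaviour goes here
    }
  }

  public static Monitor create(Configuration conf) {
    // Resolve the implementation class from configuration, falling back to
    // the default when the (hypothetical) key is not set.
    Class<? extends Monitor> clazz = conf.getClass(
        "yarn.nodemanager.containers-monitor.class", // hypothetical key
        DefaultMonitor.class, Monitor.class);
    // ReflectionUtils also injects the Configuration if the class is Configurable.
    return ReflectionUtils.newInstance(clazz, conf);
  }
}
{code}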

> ContainersMonitor support configurable
> --
>
> Key: YARN-7693
> URL: https://issues.apache.org/jira/browse/YARN-7693
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Minor
> Attachments: YARN-7693.001.patch, YARN-7693.002.patch
>
>
> Currently ContainersMonitor has only one default implementation, 
> ContainersMonitorImpl.
> After introducing Opportunistic Containers, ContainersMonitor needs to monitor 
> system metrics and even dynamically adjust Opportunistic and Guaranteed 
> resources in the cgroup, so another ContainersMonitor may need to be 
> implemented. 
> The current ContainerManagerImpl directly instantiates ContainersMonitorImpl, 
> so ContainersMonitor needs to be made configurable.






[jira] [Updated] (YARN-7678) Ability to enable logging of container memory stats

2018-01-03 Thread Jim Brennan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated YARN-7678:
--
Attachment: YARN-7678-branch-2.001.patch

Providing a patch for branch-2.
I repeated the manual tests described above for this patch.


> Ability to enable logging of container memory stats
> ---
>
> Key: YARN-7678
> URL: https://issues.apache.org/jira/browse/YARN-7678
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0, 3.0.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
> Attachments: YARN-7678-branch-2.001.patch, YARN-7678.001.patch
>
>
> YARN-3424 changed logging of memory stats in ContainersMonitorImpl from INFO 
> to DEBUG.
> We have found these log messages to be useful information in Out-of-Memory 
> situations - they provide detail that helps show the memory profile of the 
> container over time, which can be helpful in determining root cause.
> Here's an example message from YARN-3424:
> {noformat}
> 2015-03-27 09:32:48,905 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Memory usage of ProcessTree 9215 for container-id 
> container_1427462602546_0002_01_08: 189.8 MB of 1 GB physical memory 
> used; 2.6 GB of 2.1 GB virtual memory used
> {noformat}
> Propose to change this to use a separate logger for this message, so that we 
> can enable debug logging for this without enabling all of the other debug 
> logging for ContainersMonitorImpl.






[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment

2018-01-03 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310385#comment-16310385
 ] 

Eric Yang commented on YARN-7677:
-

[~ebadger] I think I understand the goal now.  Thank you.  In the current 
implementation, a partial declaration of the bind-mount may allow 
HADOOP_CONF_DIR to be sourced implicitly without proper user consent.  This 
change will make the yarnfile content more consistent: both the environment 
variable and the mounted directories need to be present in the yarnfile to 
show that {{HADOOP_CONF_DIR}} is exposed to the docker container.  

> HADOOP_CONF_DIR should not be automatically put in task environment
> ---
>
> Key: YARN-7677
> URL: https://issues.apache.org/jira/browse/YARN-7677
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>
> Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether 
> it's set by the user or not. It completely bypasses the whitelist and so 
> there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes 
> problems in the Docker use case where Docker containers will set up their own 
> environment and have their own {{HADOOP_CONF_DIR}} preset in the image 
> itself. 






[jira] [Commented] (YARN-5366) Improve handling of the Docker container life cycle

2018-01-03 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310369#comment-16310369
 ] 

Eric Badger commented on YARN-5366:
---

Hey [~shaneku...@gmail.com], sorry for the delay. I plan to start reviewing 
this soon

> Improve handling of the Docker container life cycle
> ---
>
> Key: YARN-5366
> URL: https://issues.apache.org/jira/browse/YARN-5366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>  Labels: oct16-medium
> Attachments: YARN-5366.001.patch, YARN-5366.002.patch, 
> YARN-5366.003.patch, YARN-5366.004.patch, YARN-5366.005.patch, 
> YARN-5366.006.patch, YARN-5366.007.patch, YARN-5366.008.patch, 
> YARN-5366.009.patch, YARN-5366.010.patch
>
>
> There are several paths that need to be improved with regard to the Docker 
> container lifecycle when running Docker containers on YARN.
> 1) Provide the ability to keep a container on the NodeManager for a set 
> period of time for debugging purposes.
> 2) Support sending signals to the process in the container to allow for 
> triggering stack traces, heap dumps, etc.
> 3) Support for Docker's live restore, which means moving away from the use of 
> {{docker wait}}. (YARN-5818)
> 4) Improve the resiliency of liveliness checks (kill -0) by adding retries.
> 5) Improve the resiliency of container removal by adding retries.
> 6) Only attempt to stop, kill, and remove containers if the current container 
> state allows for it.
> 7) Better handling of short lived containers when the container is stopped 
> before the PID can be retrieved. (YARN-6305)






[jira] [Commented] (YARN-7622) Allow fair-scheduler configuration on HDFS

2018-01-03 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310354#comment-16310354
 ] 

genericqa commented on YARN-7622:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 27s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 0 new + 39 unchanged - 2 fixed = 39 total (was 41) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 63m 
45s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}107m 10s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7622 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12904463/YARN-7622.006.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1d5e6f89de62 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2f6c038 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19097/testReport/ |
| Max. process+thread count | 809 (vs. ulimit of 5000) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/19097/console |
| Powered by | Apache Yetus 

[jira] [Commented] (YARN-7516) Security check for untrusted docker image

2018-01-03 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310344#comment-16310344
 ] 

Eric Badger commented on YARN-7516:
---

{quote}
+  public static final String TRUSTED_DOCKER_REGISTRY = YARN_PREFIX +
+  "docker.trusted.registry";
{quote}
Is there a reason this doesn't use {{DOCKER_CONTAINER_RUNTIME_PREFIX}}?

I'm also wondering if this approach is consistent with what we went through 
with YARN-6623. In that JIRA we moved all sensitive configs into 
container-executor.cfg so that they could only be modified by root. By putting 
this property in yarn-site.xml, it is modifiable if the yarn user is 
compromised. So, if we assume a compromised yarn user, then this property won't 
stop an attacker from running arbitrary untrusted images. I think we should add 
this config to container-executor.cfg so that the property is guaranteed to be 
safe unless the attacker has gained root access. 

[~miklos.szeg...@cloudera.com], [~vvasudev], [~shaneku...@gmail.com], thoughts 
on this?
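
For reference, the kind of check being discussed boils down to comparing the 
registry portion of the image name against a whitelist, wherever that 
whitelist ends up living.  A hypothetical, simplified sketch (not the actual 
patch under review):

{code:java}
import java.util.Collections;
import java.util.Set;

// Hypothetical illustration of a trusted-registry check.  Real image-reference
// parsing needs to be smarter: "myorg/centos" is a Docker Hub namespace, not a
// registry, yet this naive split would treat it as one.
public class TrustedRegistryCheck {

  static boolean isTrusted(String image, Set<String> trustedRegistries) {
    int slash = image.indexOf('/');
    if (slash < 0) {
      // Bare names such as "centos" resolve to the public Docker Hub library.
      return false;
    }
    // "private.registry.local:5000/centos" -> "private.registry.local:5000"
    String registry = image.substring(0, slash);
    return trustedRegistries.contains(registry);
  }

  public static void main(String[] args) {
    Set<String> trusted = Collections.singleton("private.registry.local:5000");
    System.out.println(isTrusted("private.registry.local:5000/centos", trusted)); // true
    System.out.println(isTrusted("centos", trusted)); // false
  }
}
{code}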

> Security check for untrusted docker image
> -
>
> Key: YARN-7516
> URL: https://issues.apache.org/jira/browse/YARN-7516
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7516.001.patch, YARN-7516.002.patch, 
> YARN-7516.003.patch, YARN-7516.004.patch
>
>
> Hadoop YARN Services can support using private docker registry image or 
> docker image from docker hub.  In current implementation, Hadoop security is 
> enforced through username and group membership, and enforce uid:gid 
> consistency in docker container and distributed file system.  There is cloud 
> use case for having ability to run untrusted docker image on the same cluster 
> for testing.  
> The basic requirement for untrusted container is to ensure all kernel and 
> root privileges are dropped, and there is no interaction with distributed 
> file system to avoid contamination.  We can probably enforce detection of 
> untrusted docker image by checking the following:
> # If docker image is from public docker hub repository, the container is 
> automatically flagged as insecure, and disk volume mount are disabled 
> automatically, and drop all kernel capabilities.
> # If docker image is from private repository in docker hub, and there is a 
> white list to allow the private repository, disk volume mount is allowed, 
> kernel capabilities follows the allowed list.
> # If docker image is from private trusted registry with image name like 
> "private.registry.local:5000/centos", and white list allows this private 
> trusted repository.  Disk volume mount is allowed, kernel capabilities 
> follows the allowed list.






[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment

2018-01-03 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310327#comment-16310327
 ] 

Eric Badger commented on YARN-7677:
---

bq. Some Hadoop features, e.g. short-circuit reads, will not work if the host 
and docker containers do not match.
This is true. I would like to work towards a solution where we use something 
similar to {{dfs.domain.socket.path}}, since it already defines the 
short-circuit socket. However, I'm not sure how to do that without copying the 
config, since this is a dfs property that will be used by the datanode (i.e. 
not the container-executor).

bq. If we handle security properly with whitelisted mounts (YARN-5534), 
container-executor validation (YARN-7590), and a sudo-privilege check before 
launching privileged containers (YARN-7221), is there any particular reason 
that we shouldn't allow a read-only bind-mount of HADOOP_CONF_DIR?
Nope, I don't think there is any problem with bind-mounting 
{{HADOOP_CONF_DIR}}. However, I don't think it should be a requirement. For 
example, you should be able to use an older version of hadoop as the client 
(task), while the server (NM) uses a newer version. If we pass in 
{{HADOOP_CONF_DIR}} then this is not possible. If we are constantly 
bind-mounting hadoop into all of the containers, then we lose some of the 
wonder of docker, which is that the container stays constant and consistent 
over time. Some may choose to bind-mount hadoop, but it should be a choice, 
not a requirement.

bq. The whitelist is used by container-executor, which resides on the host, 
not in the docker container. How does the bypass happen?
This happens because of a call in {{ContainerLaunch.java}} that automatically 
adds {{HADOOP_CONF_DIR}} to the environment. This environment is parsed in 
{{launch_container.sh}}, which is the script that the docker container is 
started with.

{noformat:title=ContainerLaunch.java:1388}
putEnvIfAbsent(environment, Environment.HADOOP_CONF_DIR.name());
{noformat}
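
To make the mechanism concrete, here is a simplified stand-in for what a 
{{putEnvIfAbsent}}-style helper does (an illustration, not the actual 
ContainerLaunch implementation): if the user did not set the variable, the 
NodeManager's own value is injected, so it reaches the container regardless of 
the whitelist.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the behaviour described above, not real YARN code.
public class EnvInjectionExample {

  static void putEnvIfAbsent(Map<String, String> environment, String name) {
    if (environment.get(name) == null) {
      // Fall back to the NodeManager's own process environment.
      environment.put(name, System.getenv(name));
    }
  }

  public static void main(String[] args) {
    Map<String, String> containerEnv = new HashMap<>();
    // The user never asked for HADOOP_CONF_DIR, yet it gets injected anyway.
    putEnvIfAbsent(containerEnv, "HADOOP_CONF_DIR");
    System.out.println(containerEnv);
  }
}
{code}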

> HADOOP_CONF_DIR should not be automatically put in task environment
> ---
>
> Key: YARN-7677
> URL: https://issues.apache.org/jira/browse/YARN-7677
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>
> Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether 
> it's set by the user or not. It completely bypasses the whitelist and so 
> there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes 
> problems in the Docker use case where Docker containers will set up their own 
> environment and have their own {{HADOOP_CONF_DIR}} preset in the image 
> itself. 






[jira] [Comment Edited] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment

2018-01-03 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310288#comment-16310288
 ] 

Eric Yang edited comment on YARN-7677 at 1/3/18 9:33 PM:
-

I think I am stuck on understanding:

{quote}
It completely bypasses the whitelist and so there is no way for a task to not 
have HADOOP_CONF_DIR set. 
{quote}

The whitelist is used by container-executor, which resides on the host, not in 
the docker container.  How does the bypass happen?


was (Author: eyang):
I think I am stuck on understanding:

{quote}
It completely bypasses the whitelist and so there is no way for a task to not 
have HADOOP_CONF_DIR set. 
{quote}

White list is used by container-executor, which resides in host, and not docker 
container.  How is the by pass happens?

> HADOOP_CONF_DIR should not be automatically put in task environment
> ---
>
> Key: YARN-7677
> URL: https://issues.apache.org/jira/browse/YARN-7677
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>
> Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether 
> it's set by the user or not. It completely bypasses the whitelist and so 
> there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes 
> problems in the Docker use case where Docker containers will set up their own 
> environment and have their own {{HADOOP_CONF_DIR}} preset in the image 
> itself. 






[jira] [Commented] (YARN-7666) Introduce scheduler specific environment variable support in ASC for better scheduling placement configurations

2018-01-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310312#comment-16310312
 ] 

Wangda Tan commented on YARN-7666:
--

Failed tests look related; beyond that, the patch LGTM.  Thanks [~sunilg]

> Introduce scheduler specific environment variable support in ASC for better 
> scheduling placement configurations
> ---
>
> Key: YARN-7666
> URL: https://issues.apache.org/jira/browse/YARN-7666
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-7666.001.patch, YARN-7666.002.patch, 
> YARN-7666.003.patch, YARN-7666.004.patch
>
>
> Introduce a scheduler-specific key-value map to hold env variables in ASC.
> Also convert AppPlacementAllocator initialization to be per-app, based on the 
> policy configured for each app.






[jira] [Updated] (YARN-7516) Security check for untrusted docker image

2018-01-03 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7516:

Attachment: YARN-7516.004.patch

[~ebadger] Patch applies.  The rebased patch is attached in 004.


> Security check for untrusted docker image
> -
>
> Key: YARN-7516
> URL: https://issues.apache.org/jira/browse/YARN-7516
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7516.001.patch, YARN-7516.002.patch, 
> YARN-7516.003.patch, YARN-7516.004.patch
>
>
> Hadoop YARN Services can support using private docker registry image or 
> docker image from docker hub.  In current implementation, Hadoop security is 
> enforced through username and group membership, and enforce uid:gid 
> consistency in docker container and distributed file system.  There is cloud 
> use case for having ability to run untrusted docker image on the same cluster 
> for testing.  
> The basic requirement for untrusted container is to ensure all kernel and 
> root privileges are dropped, and there is no interaction with distributed 
> file system to avoid contamination.  We can probably enforce detection of 
> untrusted docker image by checking the following:
> # If docker image is from public docker hub repository, the container is 
> automatically flagged as insecure, and disk volume mount are disabled 
> automatically, and drop all kernel capabilities.
> # If docker image is from private repository in docker hub, and there is a 
> white list to allow the private repository, disk volume mount is allowed, 
> kernel capabilities follows the allowed list.
> # If docker image is from private trusted registry with image name like 
> "private.registry.local:5000/centos", and white list allows this private 
> trusted repository.  Disk volume mount is allowed, kernel capabilities 
> follows the allowed list.






[jira] [Commented] (YARN-7678) Ability to enable logging of container memory stats

2018-01-03 Thread Jim Brennan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310302#comment-16310302
 ] 

Jim Brennan commented on YARN-7678:
---

Will do.  Thanks!

> Ability to enable logging of container memory stats
> ---
>
> Key: YARN-7678
> URL: https://issues.apache.org/jira/browse/YARN-7678
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0, 3.0.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
> Attachments: YARN-7678.001.patch
>
>
> YARN-3424 changed logging of memory stats in ContainersMonitorImpl from INFO 
> to DEBUG.
> We have found these log messages to be useful information in Out-of-Memory 
> situations - they provide detail that helps show the memory profile of the 
> container over time, which can be helpful in determining root cause.
> Here's an example message from YARN-3424:
> {noformat}
> 2015-03-27 09:32:48,905 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Memory usage of ProcessTree 9215 for container-id 
> container_1427462602546_0002_01_08: 189.8 MB of 1 GB physical memory 
> used; 2.6 GB of 2.1 GB virtual memory used
> {noformat}
> Propose to change this to use a separate logger for this message, so that we 
> can enable debug logging for this without enabling all of the other debug 
> logging for ContainersMonitorImpl.






[jira] [Updated] (YARN-7678) Ability to enable logging of container memory stats

2018-01-03 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-7678:
-
Summary: Ability to enable logging of container memory stats  (was: Logging 
of container memory stats is missing in 2.8)

Thanks for the patch!  +1 lgtm.

The patch does not apply to branch-2.  Would you mind providing a patch for 2.x?

> Ability to enable logging of container memory stats
> ---
>
> Key: YARN-7678
> URL: https://issues.apache.org/jira/browse/YARN-7678
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0, 3.0.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
> Attachments: YARN-7678.001.patch
>
>
> YARN-3424 changed logging of memory stats in ContainersMonitorImpl from INFO 
> to DEBUG.
> We have found these log messages to be useful information in Out-of-Memory 
> situations - they provide detail that helps show the memory profile of the 
> container over time, which can be helpful in determining root cause.
> Here's an example message from YARN-3424:
> {noformat}
> 2015-03-27 09:32:48,905 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Memory usage of ProcessTree 9215 for container-id 
> container_1427462602546_0002_01_08: 189.8 MB of 1 GB physical memory 
> used; 2.6 GB of 2.1 GB virtual memory used
> {noformat}
> Propose to change this to use a separate logger for this message, so that we 
> can enable debug logging for this without enabling all of the other debug 
> logging for ContainersMonitorImpl.






[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment

2018-01-03 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310288#comment-16310288
 ] 

Eric Yang commented on YARN-7677:
-

I think I am stuck on understanding:

{quote}
It completely bypasses the whitelist and so there is no way for a task to not 
have HADOOP_CONF_DIR set. 
{quote}

The whitelist is used by container-executor, which resides on the host, not in 
the docker container.  How does the bypass happen?

> HADOOP_CONF_DIR should not be automatically put in task environment
> ---
>
> Key: YARN-7677
> URL: https://issues.apache.org/jira/browse/YARN-7677
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>
> Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether 
> it's set by the user or not. It completely bypasses the whitelist and so 
> there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes 
> problems in the Docker use case where Docker containers will set up their own 
> environment and have their own {{HADOOP_CONF_DIR}} preset in the image 
> itself. 






[jira] [Commented] (YARN-7693) ContainersMonitor support configurable

2018-01-03 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310276#comment-16310276
 ] 

Miklos Szegedi commented on YARN-7693:
--

Thank you for the reply [~yangjiandan].
+0 on the approach of adding a separate monitor class for this. I think it is 
useful to be able to change the monitor.
In terms of the feature you described, I have some suggestions you may want to 
consider.
First of all, please consider filing this under an existing JIRA feature and 
making it a sub-task. How about doing this as part of YARN-1747, or even 
better YARN-1011?
You may want to leverage the option to simply turn off the current cgroups 
memory enforcement using the configuration added in YARN-7064. It also handles 
monitoring resource utilization using cgroups.
bq. 1) Separate containers into two different group Opportunistic_Group and 
Guaranteed_Group under hadoop-yarn
The reason why it is useful to have a single hadoop-yarn cgroup for all 
containers is that you can apply a single policy and control the OOM killer 
for all of them. I would be happy to look at the actual code, but adjusting 
two different cgroups may add too much complexity. It is especially 
problematic in the case of promotion: when an opportunistic container is 
promoted to guaranteed, you need to move it to the other cgroup, but this 
requires heavy lifting from the kernel that takes significant time. See 
https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt for details.
bq. 2) Monitor system resource utilization and dynamically adjust resource of 
Opportunistic_Group
The concern here is that dynamic adjustment does not work in the current 
implementation either, because it is too slow to respond in extreme cases. 
Please check out YARN-6677, YARN-4599 and YARN-1014. The idea there is to 
disable the OOM killer on hadoop-yarn, as you also suggested, so that we get 
notified by the kernel when the system is running low on resources. YARN can 
then decide which container to preempt, or adjust the soft limit, while the 
containers are paused. The preemption unblocks the containers. Please let us 
know if you have time and would like to contribute.
bq. 3) Kill container only when adjust resource fail for given times
I absolutely agree with this. A sudden spike in cpu usage should not trigger 
immediate preemption. In the case of memory, I am not sure how much you can 
adjust, though. My understanding is that the basic design of opportunistic 
containers is that they never affect the performance of guaranteed ones, but 
using IO for swapping would do exactly that. How would you reduce memory usage 
without preempting?
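
For anyone following along, the cgroup v1 knobs involved look roughly like 
this at the filesystem level.  This is a hedged sketch, not YARN code: the 
mount point and the hadoop-yarn hierarchy name are assumptions that vary with 
the distro and NM configuration, and YARN drives them through its own cgroups 
handling rather than raw file writes.

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Rough illustration of "disable the OOM killer and adjust the soft limit"
// on a cgroup v1 memory hierarchy.  The path below is an assumption.
public class CgroupMemoryExample {

  private static final String YARN_MEMORY_CGROUP =
      "/sys/fs/cgroup/memory/hadoop-yarn";

  static void write(String controlFile, String value) throws IOException {
    Files.write(Paths.get(YARN_MEMORY_CGROUP, controlFile),
        value.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws IOException {
    // Disable the kernel OOM killer for the hierarchy; under memory pressure
    // tasks are paused instead of killed, and user space decides what to do.
    write("memory.oom_control", "1");
    // Lower the soft limit so the kernel reclaims from this hierarchy first.
    write("memory.soft_limit_in_bytes", String.valueOf(512L * 1024 * 1024));
  }
}
{code}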

> ContainersMonitor support configurable
> --
>
> Key: YARN-7693
> URL: https://issues.apache.org/jira/browse/YARN-7693
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Minor
> Attachments: YARN-7693.001.patch, YARN-7693.002.patch
>
>
> Currently ContainersMonitor has only one default implementation, 
> ContainersMonitorImpl.
> After introducing Opportunistic Containers, ContainersMonitor needs to monitor 
> system metrics and even dynamically adjust Opportunistic and Guaranteed 
> resources in the cgroup, so another ContainersMonitor may need to be 
> implemented. 
> The current ContainerManagerImpl directly instantiates ContainersMonitorImpl, 
> so ContainersMonitor needs to be made configurable.






[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment

2018-01-03 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310273#comment-16310273
 ] 

Eric Yang commented on YARN-7677:
-

[~ebadger] My initial reaction would be to make the docker container follow 
the host layout for a single-cluster setup.  Some Hadoop features, e.g. 
short-circuit reads, will not work if the host and docker containers do not 
match.  For a docker container to regenerate a fine-tuned Hadoop configuration 
that matches the host, it could take some effort from the docker container 
developer.  There is a high probability that developers end up using a 
bind-mount to transport the Hadoop configuration.

If we handle security properly with whitelisted mounts (YARN-5534), 
container-executor validation (YARN-7590), and a sudo-privilege check before 
launching privileged containers (YARN-7221), is there any particular reason 
that we shouldn't allow a read-only bind-mount of {{HADOOP_CONF_DIR}}?  

> HADOOP_CONF_DIR should not be automatically put in task environment
> ---
>
> Key: YARN-7677
> URL: https://issues.apache.org/jira/browse/YARN-7677
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>
> Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether 
> it's set by the user or not. It completely bypasses the whitelist and so 
> there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes 
> problems in the Docker use case where Docker containers will set up their own 
> environment and have their own {{HADOOP_CONF_DIR}} preset in the image 
> itself. 






[jira] [Commented] (YARN-7516) Security check for untrusted docker image

2018-01-03 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310268#comment-16310268
 ] 

Eric Badger commented on YARN-7516:
---

Hey [~eyang], I tried to apply the patch and it doesn't apply (at least via git 
apply, probably works via {{patch}}). Looks like YARN-7662 messed up the 
context. I'll review the patch, but in the meantime could you upload a patch 
with updated context? 

> Security check for untrusted docker image
> -
>
> Key: YARN-7516
> URL: https://issues.apache.org/jira/browse/YARN-7516
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7516.001.patch, YARN-7516.002.patch, 
> YARN-7516.003.patch
>
>
> Hadoop YARN Services can support using private docker registry image or 
> docker image from docker hub.  In current implementation, Hadoop security is 
> enforced through username and group membership, and enforce uid:gid 
> consistency in docker container and distributed file system.  There is cloud 
> use case for having ability to run untrusted docker image on the same cluster 
> for testing.  
> The basic requirement for untrusted container is to ensure all kernel and 
> root privileges are dropped, and there is no interaction with distributed 
> file system to avoid contamination.  We can probably enforce detection of 
> untrusted docker image by checking the following:
> # If docker image is from public docker hub repository, the container is 
> automatically flagged as insecure, and disk volume mount are disabled 
> automatically, and drop all kernel capabilities.
> # If docker image is from private repository in docker hub, and there is a 
> white list to allow the private repository, disk volume mount is allowed, 
> kernel capabilities follows the allowed list.
> # If docker image is from private trusted registry with image name like 
> "private.registry.local:5000/centos", and white list allows this private 
> trusted repository.  Disk volume mount is allowed, kernel capabilities 
> follows the allowed list.






[jira] [Commented] (YARN-6894) RM Apps API returns only active apps when query parameter queue used

2018-01-03 Thread Grant Sohn (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310250#comment-16310250
 ] 

Grant Sohn commented on YARN-6894:
--

Thanks [~miklos.szeg...@cloudera.com] for committing this, and congratulations 
on your committership!

> RM Apps API returns only active apps when query parameter queue used
> 
>
> Key: YARN-6894
> URL: https://issues.apache.org/jira/browse/YARN-6894
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, restapi
>Reporter: Grant Sohn
>Assignee: Gergely Novák
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: YARN-6894.001.patch, YARN-6894.002.patch, 
> YARN-6894.003.patch
>
>
> If you run RM's Cluster Applications API with no query parameters, you get a 
> list of apps.
> If you run RM's Cluster Applications API with any query parameters other than 
> "queue" you get the list of apps with the parameter filters being applied.
> However, when you use the "queue" query parameter, you only see the 
> applications that are active in the cluster (NEW, NEW_SAVING, SUBMITTED, 
> ACCEPTED, RUNNING).  This behavior is inconsistent with the API.  If there is 
> a sound reason behind this, it should be documented and it seems like there 
> might be as the mapred queue CLI behaves similarly.
> http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API






[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment

2018-01-03 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310247#comment-16310247
 ] 

Eric Badger commented on YARN-7677:
---

[~eyang], it doesn't necessarily need to be separate hadoop clusters. It could 
just be a node where the NM runs on the bare metal host and the tasks run in 
docker containers. In that case, they would need to know where 
{{HADOOP_CONF_DIR}} is. Since the docker image is completely separate from the 
host layout, we can't assume that hadoop is going to be put in the same place. 
{{HADOOP_CONF_DIR}} isn't getting bind-mounted into the container, so the only 
way this would even work is by a happy coincidence and/or planning the layout 
of the image to match that of the host. But that coupling is certainly not 
necessary and the docker image is the one that actually knows where 
{{HADOOP_CONF_DIR}} is located. The nodemanager knows where its 
{{HADOOP_CONF_DIR}} is located, but that is on the host, not in the docker 
container. 

And again, since {{HADOOP_CONF_DIR}} is in the default env whitelist, the 
behavior here will only change if you explicitly change the env whitelist and 
remove it. So I believe the impact here to be fairly low. Regardless, I don't 
think it's correct for the NM to be defining the layout of the docker image 
(i.e. where {{HADOOP_CONF_DIR}} has to be located). 

> HADOOP_CONF_DIR should not be automatically put in task environment
> ---
>
> Key: YARN-7677
> URL: https://issues.apache.org/jira/browse/YARN-7677
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>
> Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether 
> it's set by the user or not. It completely bypasses the whitelist and so 
> there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes 
> problems in the Docker use case where Docker containers will set up their own 
> environment and have their own {{HADOOP_CONF_DIR}} preset in the image 
> itself. 






[jira] [Commented] (YARN-7697) NM goes down with OOM due to leak in log-aggregation

2018-01-03 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310246#comment-16310246
 ] 

genericqa commented on YARN-7697:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 26s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
57s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 46m 54s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7697 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12904450/YARN-7697.1.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 06e9d5374757 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2f6c038 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19094/testReport/ |
| Max. process+thread count | 407 (vs. ulimit of 5000) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/19094/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> NM goes down with 

[jira] [Updated] (YARN-7622) Allow fair-scheduler configuration on HDFS

2018-01-03 Thread Greg Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Phillips updated YARN-7622:

Attachment: (was: YARN-7622.006.patch)

> Allow fair-scheduler configuration on HDFS
> --
>
> Key: YARN-7622
> URL: https://issues.apache.org/jira/browse/YARN-7622
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, resourcemanager
>Reporter: Greg Phillips
>Assignee: Greg Phillips
>Priority: Minor
> Attachments: YARN-7622.001.patch, YARN-7622.002.patch, 
> YARN-7622.003.patch, YARN-7622.004.patch, YARN-7622.005.patch, 
> YARN-7622.006.patch
>
>
> The FairScheduler requires the allocation file to be hosted on the local 
> filesystem on the RM node(s). Allowing HDFS to store the allocation file will 
> provide improved redundancy, more options for scheduler updates, and RM 
> failover consistency in HA.






[jira] [Updated] (YARN-7622) Allow fair-scheduler configuration on HDFS

2018-01-03 Thread Greg Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Phillips updated YARN-7622:

Attachment: YARN-7622.006.patch

> Allow fair-scheduler configuration on HDFS
> --
>
> Key: YARN-7622
> URL: https://issues.apache.org/jira/browse/YARN-7622
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, resourcemanager
>Reporter: Greg Phillips
>Assignee: Greg Phillips
>Priority: Minor
> Attachments: YARN-7622.001.patch, YARN-7622.002.patch, 
> YARN-7622.003.patch, YARN-7622.004.patch, YARN-7622.005.patch, 
> YARN-7622.006.patch
>
>
> The FairScheduler requires the allocation file to be hosted on the local 
> filesystem on the RM node(s). Allowing HDFS to store the allocation file will 
> provide improved redundancy, more options for scheduler updates, and RM 
> failover consistency in HA.






[jira] [Updated] (YARN-7102) NM heartbeat stuck when responseId overflows MAX_INT

2018-01-03 Thread Botong Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-7102:
---
Attachment: YARN-7102.v16.patch

> NM heartbeat stuck when responseId overflows MAX_INT
> 
>
> Key: YARN-7102
> URL: https://issues.apache.org/jira/browse/YARN-7102
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Critical
> Attachments: YARN-7102-branch-2.8.v10.patch, 
> YARN-7102-branch-2.8.v11.patch, YARN-7102-branch-2.8.v14.patch, 
> YARN-7102-branch-2.8.v14.patch, YARN-7102-branch-2.8.v9.patch, 
> YARN-7102-branch-2.v14.patch, YARN-7102-branch-2.v14.patch, 
> YARN-7102-branch-2.v9.patch, YARN-7102-branch-2.v9.patch, 
> YARN-7102-branch-2.v9.patch, YARN-7102.v1.patch, YARN-7102.v12.patch, 
> YARN-7102.v13.patch, YARN-7102.v14.patch, YARN-7102.v15.patch, 
> YARN-7102.v16.patch, YARN-7102.v2.patch, YARN-7102.v3.patch, 
> YARN-7102.v4.patch, YARN-7102.v5.patch, YARN-7102.v6.patch, 
> YARN-7102.v7.patch, YARN-7102.v8.patch, YARN-7102.v9.patch
>
>
> ResponseId overflow problem in NM-RM heartbeat. This is same as AM-RM 
> heartbeat in YARN-6640, please refer to YARN-6640 for details. 






[jira] [Commented] (YARN-7622) Allow fair-scheduler configuration on HDFS

2018-01-03 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310224#comment-16310224
 ] 

genericqa commented on YARN-7622:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  4s{color} 
| {color:red} YARN-7622 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-7622 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12904458/YARN-7622.006.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/19095/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Allow fair-scheduler configuration on HDFS
> --
>
> Key: YARN-7622
> URL: https://issues.apache.org/jira/browse/YARN-7622
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, resourcemanager
>Reporter: Greg Phillips
>Assignee: Greg Phillips
>Priority: Minor
> Attachments: YARN-7622.001.patch, YARN-7622.002.patch, 
> YARN-7622.003.patch, YARN-7622.004.patch, YARN-7622.005.patch, 
> YARN-7622.006.patch
>
>
> The FairScheduler requires the allocation file to be hosted on the local 
> filesystem on the RM node(s). Allowing HDFS to store the allocation file will 
> provide improved redundancy, more options for scheduler updates, and RM 
> failover consistency in HA.






[jira] [Comment Edited] (YARN-7622) Allow fair-scheduler configuration on HDFS

2018-01-03 Thread Greg Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310216#comment-16310216
 ] 

Greg Phillips edited comment on YARN-7622 at 1/3/18 8:09 PM:
-

[~rkanter] - s3 & viewfs have been added to allowed filesystems in 006, and I 
did not encounter any issues when testing against those filesystems.


was (Author: gphillips):
[~rkanter] - s3 & viewfs have been added to allowed filesystems in 008, and I 
did not encounter any issues when testing against those filesystems.

> Allow fair-scheduler configuration on HDFS
> --
>
> Key: YARN-7622
> URL: https://issues.apache.org/jira/browse/YARN-7622
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, resourcemanager
>Reporter: Greg Phillips
>Assignee: Greg Phillips
>Priority: Minor
> Attachments: YARN-7622.001.patch, YARN-7622.002.patch, 
> YARN-7622.003.patch, YARN-7622.004.patch, YARN-7622.005.patch, 
> YARN-7622.006.patch
>
>
> The FairScheduler requires the allocation file to be hosted on the local 
> filesystem on the RM node(s). Allowing HDFS to store the allocation file will 
> provide improved redundancy, more options for scheduler updates, and RM 
> failover consistency in HA.






[jira] [Commented] (YARN-7622) Allow fair-scheduler configuration on HDFS

2018-01-03 Thread Greg Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310216#comment-16310216
 ] 

Greg Phillips commented on YARN-7622:
-

[~rkanter] - s3 & viewfs have been added to allowed filesystems in 008, and I 
did not encounter any issues when testing against those filesystems.

> Allow fair-scheduler configuration on HDFS
> --
>
> Key: YARN-7622
> URL: https://issues.apache.org/jira/browse/YARN-7622
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, resourcemanager
>Reporter: Greg Phillips
>Assignee: Greg Phillips
>Priority: Minor
> Attachments: YARN-7622.001.patch, YARN-7622.002.patch, 
> YARN-7622.003.patch, YARN-7622.004.patch, YARN-7622.005.patch, 
> YARN-7622.006.patch
>
>
> The FairScheduler requires the allocation file to be hosted on the local 
> filesystem on the RM node(s). Allowing HDFS to store the allocation file will 
> provide improved redundancy, more options for scheduler updates, and RM 
> failover consistency in HA.






[jira] [Updated] (YARN-7622) Allow fair-scheduler configuration on HDFS

2018-01-03 Thread Greg Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Phillips updated YARN-7622:

Attachment: YARN-7622.006.patch

> Allow fair-scheduler configuration on HDFS
> --
>
> Key: YARN-7622
> URL: https://issues.apache.org/jira/browse/YARN-7622
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, resourcemanager
>Reporter: Greg Phillips
>Assignee: Greg Phillips
>Priority: Minor
> Attachments: YARN-7622.001.patch, YARN-7622.002.patch, 
> YARN-7622.003.patch, YARN-7622.004.patch, YARN-7622.005.patch, 
> YARN-7622.006.patch
>
>
> The FairScheduler requires the allocation file to be hosted on the local 
> filesystem on the RM node(s). Allowing HDFS to store the allocation file will 
> provide improved redundancy, more options for scheduler updates, and RM 
> failover consistency in HA.






[jira] [Commented] (YARN-6971) Clean up different ways to create resources

2018-01-03 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310210#comment-16310210
 ] 

Yufei Gu commented on YARN-6971:


[~lovekesh.bansal], I am not working on this. Feel free to take it.

> Clean up different ways to create resources
> ---
>
> Key: YARN-6971
> URL: https://issues.apache.org/jira/browse/YARN-6971
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>Priority: Minor
>  Labels: newbie
>
> There are several ways to create a {{resource}} object, e.g., 
> BuilderUtils.newResource() and Resources.createResource(). These methods not 
> only cause confusion but also performance issues; for example, 
> BuilderUtils.newResource() is significantly slower than 
> Resources.createResource(). 
> We could merge them somehow, and replace most BuilderUtils.newResource() 
> calls with Resources.createResource().
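
For context, a minimal sketch of the two creation paths mentioned above (the 
values are arbitrary and the usual Resource/BuilderUtils/Resources imports are 
assumed):
{code:java}
// Two ways to build the same 4 GB / 2 vcore Resource today; the proposal is to
// consolidate most callers on Resources.createResource().
Resource viaBuilderUtils = BuilderUtils.newResource(4096, 2);
Resource viaResources = Resources.createResource(4096, 2);
{code}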






[jira] [Commented] (YARN-7102) NM heartbeat stuck when responseId overflows MAX_INT

2018-01-03 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310176#comment-16310176
 ] 

genericqa commented on YARN-7102:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
31s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 33m 
 8s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  5m 
23s{color} | {color:red} root in trunk failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
30s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 17m 
10s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
39s{color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed. 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
36s{color} | {color:red} hadoop-sls in trunk failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
27s{color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed. 
{color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
29s{color} | {color:red} hadoop-sls in trunk failed. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
29s{color} | {color:red} hadoop-sls in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
30s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
28s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 28s{color} 
| {color:red} root in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 18s{color} | {color:orange} root: The patch generated 9 new + 328 unchanged 
- 16 fixed = 337 total (was 344) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
33s{color} | {color:red} hadoop-sls in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
34s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  1m  
4s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
30s{color} | {color:red} hadoop-sls in the patch failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
32s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m  
9s{color} | {color:red} hadoop-sls in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
10s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 12s{color} 
| {color:red} hadoop-sls in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 10s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:blue}0{color} | {color:blue} asflicense {color} | {color:blue}  0m 
10s{color} | {color:blue} 

[jira] [Updated] (YARN-7697) NM goes down with OOM due to leak in log-aggregation

2018-01-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-7697:

Attachment: YARN-7697.1.patch

> NM goes down with OOM due to leak in log-aggregation
> 
>
> Key: YARN-7697
> URL: https://issues.apache.org/jira/browse/YARN-7697
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Santhosh B Gowda
>Assignee: Xuan Gong
> Attachments: YARN-7697.1.patch
>
>
> 2017-12-29 01:43:50,601 FATAL yarn.YarnUncaughtExceptionHandler 
> (YarnUncaughtExceptionHandler.java:uncaughtException(51)) - Thread 
> Thread[LogAggregationService #0,5,main] threw an Error.  Shutting down now...
> java.lang.OutOfMemoryError: Java heap space
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.loadIndexedLogsMeta(LogAggregationIndexedFileController.java:823)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.loadIndexedLogsMeta(LogAggregationIndexedFileController.java:840)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriterInRolling(LogAggregationIndexedFileController.java:293)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.access$600(LogAggregationIndexedFileController.java:98)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:216)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:197)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:205)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:312)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:284)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2017-12-29 01:43:50,601 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(464)) - Application ap






[jira] [Commented] (YARN-7697) NM goes down with OOM due to leak in log-aggregation

2018-01-03 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310159#comment-16310159
 ] 

Xuan Gong commented on YARN-7697:
-

Uploaded a fix for this issue which includes two parts:
1) Do not use the truncate API. Instead, we directly read the endIndex from the 
checksum file.
2) Check the UUID before we allocate the byte array. Because the UUID is fixed 
(both in length and value) and is always appended at the end, if the loaded 
UUID is the same as the original UUID, we can guarantee that everything before 
the UUID is correct.

> NM goes down with OOM due to leak in log-aggregation
> 
>
> Key: YARN-7697
> URL: https://issues.apache.org/jira/browse/YARN-7697
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Santhosh B Gowda
>Assignee: Xuan Gong
>
> 2017-12-29 01:43:50,601 FATAL yarn.YarnUncaughtExceptionHandler 
> (YarnUncaughtExceptionHandler.java:uncaughtException(51)) - Thread 
> Thread[LogAggregationService #0,5,main] threw an Error.  Shutting down now...
> java.lang.OutOfMemoryError: Java heap space
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.loadIndexedLogsMeta(LogAggregationIndexedFileController.java:823)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.loadIndexedLogsMeta(LogAggregationIndexedFileController.java:840)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriterInRolling(LogAggregationIndexedFileController.java:293)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.access$600(LogAggregationIndexedFileController.java:98)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:216)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:197)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:205)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:312)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:284)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2017-12-29 01:43:50,601 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(464)) - Application ap






[jira] [Commented] (YARN-7697) NM goes down with OOM due to leak in log-aggregation

2018-01-03 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310156#comment-16310156
 ] 

Xuan Gong commented on YARN-7697:
-

The issue happens after the file truncate process. It looks like the truncate 
API returned false instead of throwing an exception, so we still read the 
corrupted aggregated log.

In the process of reading the logs, we allocate a byte array:
{code}
byte[] array = new byte[offset]; // this line throws OOM
fsDataIStream.seek(
  fileLength - offset - Integer.SIZE / Byte.SIZE - UUID_LENGTH);
{code}
So the offset is incorrect, probably an invalid, very large value, and we can 
get an OOM in the NM.
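
Roughly, the UUID check described in the fix comment above would look like the 
following (expectedUUID is an illustrative name; this is a sketch, not the 
actual patch):
{code:java}
// Read the UUID that is always appended at the end of the file and compare it
// with the expected one before trusting the decoded offset.
byte[] diskUUID = new byte[UUID_LENGTH];
fsDataIStream.seek(fileLength - UUID_LENGTH);
fsDataIStream.readFully(diskUUID);
if (!Arrays.equals(diskUUID, expectedUUID)) {
  throw new IOException("Log meta data appears corrupted, skipping this file");
}
byte[] array = new byte[offset];  // allocate only after the UUID check passes
{code}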

> NM goes down with OOM due to leak in log-aggregation
> 
>
> Key: YARN-7697
> URL: https://issues.apache.org/jira/browse/YARN-7697
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Santhosh B Gowda
>Assignee: Xuan Gong
>
> 2017-12-29 01:43:50,601 FATAL yarn.YarnUncaughtExceptionHandler 
> (YarnUncaughtExceptionHandler.java:uncaughtException(51)) - Thread 
> Thread[LogAggregationService #0,5,main] threw an Error.  Shutting down now...
> java.lang.OutOfMemoryError: Java heap space
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.loadIndexedLogsMeta(LogAggregationIndexedFileController.java:823)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.loadIndexedLogsMeta(LogAggregationIndexedFileController.java:840)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriterInRolling(LogAggregationIndexedFileController.java:293)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.access$600(LogAggregationIndexedFileController.java:98)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:216)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:197)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:205)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:312)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:284)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2017-12-29 01:43:50,601 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(464)) - Application ap






[jira] [Updated] (YARN-7697) NM goes down with OOM due to leak in log-aggregation

2018-01-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-7697:

Description: 
2017-12-29 01:43:50,601 FATAL yarn.YarnUncaughtExceptionHandler 
(YarnUncaughtExceptionHandler.java:uncaughtException(51)) - Thread 
Thread[LogAggregationService #0,5,main] threw an Error.  Shutting down now...
java.lang.OutOfMemoryError: Java heap space
at 
org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.loadIndexedLogsMeta(LogAggregationIndexedFileController.java:823)
at 
org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.loadIndexedLogsMeta(LogAggregationIndexedFileController.java:840)
at 
org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriterInRolling(LogAggregationIndexedFileController.java:293)
at 
org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.access$600(LogAggregationIndexedFileController.java:98)
at 
org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:216)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at 
org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:197)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:205)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:312)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:284)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2017-12-29 01:43:50,601 INFO  application.ApplicationImpl 
(ApplicationImpl.java:handle(464)) - Application ap

> NM goes down with OOM due to leak in log-aggregation
> 
>
> Key: YARN-7697
> URL: https://issues.apache.org/jira/browse/YARN-7697
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Santhosh B Gowda
>Assignee: Xuan Gong
>
> 2017-12-29 01:43:50,601 FATAL yarn.YarnUncaughtExceptionHandler 
> (YarnUncaughtExceptionHandler.java:uncaughtException(51)) - Thread 
> Thread[LogAggregationService #0,5,main] threw an Error.  Shutting down now...
> java.lang.OutOfMemoryError: Java heap space
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.loadIndexedLogsMeta(LogAggregationIndexedFileController.java:823)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.loadIndexedLogsMeta(LogAggregationIndexedFileController.java:840)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriterInRolling(LogAggregationIndexedFileController.java:293)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.access$600(LogAggregationIndexedFileController.java:98)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:216)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:197)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:205)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:312)
> at 
> 

[jira] [Created] (YARN-7697) NM goes down with OOM due to leak in log-aggregation

2018-01-03 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-7697:
---

 Summary: NM goes down with OOM due to leak in log-aggregation
 Key: YARN-7697
 URL: https://issues.apache.org/jira/browse/YARN-7697
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Santhosh B Gowda
Assignee: Xuan Gong









[jira] [Commented] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310135#comment-16310135
 ] 

genericqa commented on YARN-7663:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 
55s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  0s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 14s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 58m 
30s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}119m 28s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7663 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12904435/YARN-7663_1.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 26e8e382aed2 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / fe35103 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19092/testReport/ |
| Max. process+thread count | 819 (vs. ulimit of 5000) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/19092/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   

[jira] [Updated] (YARN-7561) Why hasContainerForNode() return false directly when there is no request of ANY locality without considering NODE_LOCAL and RACK_LOCAL?

2018-01-03 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-7561:
---
Affects Version/s: 3.1.0

> Why hasContainerForNode() return false directly when there is no request of 
> ANY locality without considering NODE_LOCAL and RACK_LOCAL?
> ---
>
> Key: YARN-7561
> URL: https://issues.apache.org/jira/browse/YARN-7561
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.3, 3.1.0
>Reporter: wuchang
>
> I am studying the FairScheduler source code of YARN 2.7.3.
> From the code of the FSAppAttempt class:
> {code}
>   public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {
> ResourceRequest anyRequest = getResourceRequest(prio, 
> ResourceRequest.ANY);  
> ResourceRequest rackRequest = getResourceRequest(prio, 
> node.getRackName()); 
> ResourceRequest nodeRequest = getResourceRequest(prio, 
> node.getNodeName()); 
> 
> return
> // There must be outstanding requests at the given priority:
> anyRequest != null && anyRequest.getNumContainers() > 0 &&
> // If locality relaxation is turned off at *-level, there must be 
> a
> // non-zero request for the node's rack:
> (anyRequest.getRelaxLocality() ||
> (rackRequest != null && rackRequest.getNumContainers() > 0)) 
> &&
> // If locality relaxation is turned off at rack-level, there must 
> be a
> // non-zero request at the node:
> (rackRequest == null || rackRequest.getRelaxLocality() ||
> (nodeRequest != null && nodeRequest.getNumContainers() > 0)) 
> &&
> // The requested container must be able to fit on the node:
> Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
> anyRequest.getCapability(), 
> node.getRMNode().getTotalCapability());
> }
> {code}
> I really cannot understand why, when there is no anyRequest, 
> *hasContainerForNode()* returns false directly without considering whether 
> there are NODE_LOCAL or RACK_LOCAL requests.
> Also, *AppSchedulingInfo.allocateNodeLocal()* and 
> *AppSchedulingInfo.allocateRackLocal()* will also decrease the number of 
> containers for *ResourceRequest.ANY*, which is another place where I feel 
> confused.
> Thanks in advance for any pointers.






[jira] [Commented] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310093#comment-16310093
 ] 

genericqa commented on YARN-7663:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 54s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 12s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 63m 
52s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}106m  9s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7663 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12904433/YARN-7663_2.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux bd3f57cd36cb 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / fe35103 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/19091/artifact/out/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19091/testReport/ |
| Max. process+thread count | 888 (vs. ulimit of 5000) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 

[jira] [Commented] (YARN-7102) NM heartbeat stuck when responseId overflows MAX_INT

2018-01-03 Thread Botong Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310045#comment-16310045
 ] 

Botong Huang commented on YARN-7102:


Hi [~jlowe], in the v15 patch I moved part of the reconnect logic into 
ResourceTrackerService.registerNodeManager as you suggested. The specific 
condition under which we replace the node with a new RMNode is a bit weird, 
though. Reading the original code in ReconnectNodeTransition, the various 
conditions about when to swap the node and when to send a 
NodeResourceUpdateSchedulerEvent seem inconsistent. I am wondering whether it 
is possible to never swap and always reuse the existing RMNode. Please let me 
know what you think, thanks!

> NM heartbeat stuck when responseId overflows MAX_INT
> 
>
> Key: YARN-7102
> URL: https://issues.apache.org/jira/browse/YARN-7102
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Critical
> Attachments: YARN-7102-branch-2.8.v10.patch, 
> YARN-7102-branch-2.8.v11.patch, YARN-7102-branch-2.8.v14.patch, 
> YARN-7102-branch-2.8.v14.patch, YARN-7102-branch-2.8.v9.patch, 
> YARN-7102-branch-2.v14.patch, YARN-7102-branch-2.v14.patch, 
> YARN-7102-branch-2.v9.patch, YARN-7102-branch-2.v9.patch, 
> YARN-7102-branch-2.v9.patch, YARN-7102.v1.patch, YARN-7102.v12.patch, 
> YARN-7102.v13.patch, YARN-7102.v14.patch, YARN-7102.v15.patch, 
> YARN-7102.v2.patch, YARN-7102.v3.patch, YARN-7102.v4.patch, 
> YARN-7102.v5.patch, YARN-7102.v6.patch, YARN-7102.v7.patch, 
> YARN-7102.v8.patch, YARN-7102.v9.patch
>
>
> ResponseId overflow problem in NM-RM heartbeat. This is same as AM-RM 
> heartbeat in YARN-6640, please refer to YARN-6640 for details. 
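
For context, a minimal sketch of the kind of overflow-safe responseId increment 
such a fix needs (illustrative only; this is not the actual YARN-6640/YARN-7102 
code):
{code:java}
// Advance the heartbeat responseId so that it wraps to 0 instead of going
// negative past Integer.MAX_VALUE; both RM and NM must apply the same rule.
static int nextResponseId(int current) {
  return (current + 1) & Integer.MAX_VALUE;  // ..., MAX_VALUE - 1, MAX_VALUE, 0, 1, ...
}
{code}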






[jira] [Updated] (YARN-7102) NM heartbeat stuck when responseId overflows MAX_INT

2018-01-03 Thread Botong Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-7102:
---
Attachment: YARN-7102.v15.patch

> NM heartbeat stuck when responseId overflows MAX_INT
> 
>
> Key: YARN-7102
> URL: https://issues.apache.org/jira/browse/YARN-7102
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Critical
> Attachments: YARN-7102-branch-2.8.v10.patch, 
> YARN-7102-branch-2.8.v11.patch, YARN-7102-branch-2.8.v14.patch, 
> YARN-7102-branch-2.8.v14.patch, YARN-7102-branch-2.8.v9.patch, 
> YARN-7102-branch-2.v14.patch, YARN-7102-branch-2.v14.patch, 
> YARN-7102-branch-2.v9.patch, YARN-7102-branch-2.v9.patch, 
> YARN-7102-branch-2.v9.patch, YARN-7102.v1.patch, YARN-7102.v12.patch, 
> YARN-7102.v13.patch, YARN-7102.v14.patch, YARN-7102.v15.patch, 
> YARN-7102.v2.patch, YARN-7102.v3.patch, YARN-7102.v4.patch, 
> YARN-7102.v5.patch, YARN-7102.v6.patch, YARN-7102.v7.patch, 
> YARN-7102.v8.patch, YARN-7102.v9.patch
>
>
> ResponseId overflow problem in NM-RM heartbeat. This is same as AM-RM 
> heartbeat in YARN-6640, please refer to YARN-6640 for details. 






[jira] [Commented] (YARN-7666) Introduce scheduler specific environment variable support in ASC for better scheduling placement configurations

2018-01-03 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310026#comment-16310026
 ] 

genericqa commented on YARN-7666:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 40s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
12s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
49s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
19s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m  0s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 15 new + 244 unchanged - 0 fixed = 259 total (was 244) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 58s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
35s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
2s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 15s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}137m  5s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMService |
|   | hadoop.yarn.server.resourcemanager.scheduler.TestAppSchedulingInfo |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7666 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12904421/YARN-7666.004.patch |
| Optional Tests |  

[jira] [Commented] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310018#comment-16310018
 ] 

lujie commented on YARN-7663:
-

YARN-7663_2.patch is the patch that contains the unit test; I have deleted 
YARN-7663_trunk.patch. Again, sorry about generating massive messages due to 
deleting the old files.

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch
>
>
> Send kill to application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> if insert sleep before where the START event was created, this bug will 
> deterministically reproduce. 






[jira] [Updated] (YARN-7696) Add container tags to ContainerTokenIdentifier, api.Container and NMContainerStatus to handle all recovery cases

2018-01-03 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7696:
--
Description: 
The NM needs to persist the Container tags so that on RM recovery, it is sent 
back to the RM via the NMContainerStatus. The RM would then recover the 
AllocationTagsManager using this information.
The api.Container also requires the allocationTags since after AM recovery, we 
need to provide the AM with previously allocated containers.

> Add container tags to ContainerTokenIdentifier, api.Container and 
> NMContainerStatus to handle all recovery cases
> 
>
> Key: YARN-7696
> URL: https://issues.apache.org/jira/browse/YARN-7696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>
> The NM needs to persist the Container tags so that on RM recovery, it is sent 
> back to the RM via the NMContainerStatus. The RM would then recover the 
> AllocationTagsManager using this information.
> The api.Container also requires the allocationTags since after AM recovery, 
> we need to provide the AM with previously allocated containers.






[jira] [Commented] (YARN-7516) Security check for untrusted docker image

2018-01-03 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310005#comment-16310005
 ] 

Eric Yang commented on YARN-7516:
-

Ping, can someone review the 003 patch? Thanks.

> Security check for untrusted docker image
> -
>
> Key: YARN-7516
> URL: https://issues.apache.org/jira/browse/YARN-7516
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7516.001.patch, YARN-7516.002.patch, 
> YARN-7516.003.patch
>
>
> Hadoop YARN Services can support using a private docker registry image or a 
> docker image from Docker Hub.  In the current implementation, Hadoop security 
> is enforced through username and group membership, and enforces uid:gid 
> consistency between the docker container and the distributed file system.  
> There is a cloud use case for having the ability to run untrusted docker 
> images on the same cluster for testing.  
> The basic requirement for an untrusted container is to ensure that all kernel 
> and root privileges are dropped, and that there is no interaction with the 
> distributed file system, to avoid contamination.  We can probably enforce 
> detection of an untrusted docker image by checking the following:
> # If the docker image is from the public Docker Hub repository, the container 
> is automatically flagged as insecure, disk volume mounts are disabled 
> automatically, and all kernel capabilities are dropped.
> # If the docker image is from a private repository in Docker Hub, and there 
> is a white list that allows the private repository, disk volume mounts are 
> allowed and kernel capabilities follow the allowed list.
> # If the docker image is from a private trusted registry with an image name 
> like "private.registry.local:5000/centos", and the white list allows this 
> private trusted registry, disk volume mounts are allowed and kernel 
> capabilities follow the allowed list.
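
A rough sketch of the kind of registry white-list check described above (the 
helper name and the trusted-registry handling are illustrative, not the 003 
patch):
{code:java}
// Treat an image as trusted only when its name starts with a white-listed
// registry prefix such as "private.registry.local:5000/"; plain Docker Hub
// names like "centos" do not match and are handled as untrusted.
static boolean isTrustedImage(String image, String[] trustedRegistries) {
  for (String registry : trustedRegistries) {
    if (image.startsWith(registry.trim() + "/")) {
      return true;
    }
  }
  return false;
}
{code}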






[jira] [Created] (YARN-7696) Add container tags to ContainerTokenIdentifier, api.Container and NMContainerStatus to handle all recovery cases

2018-01-03 Thread Arun Suresh (JIRA)
Arun Suresh created YARN-7696:
-

 Summary: Add container tags to ContainerTokenIdentifier, 
api.Container and NMContainerStatus to handle all recovery cases
 Key: YARN-7696
 URL: https://issues.apache.org/jira/browse/YARN-7696
 Project: Hadoop YARN
  Issue Type: Sub-task
 Environment: The NM needs to persist the Container tags so that on RM 
recovery, it is sent back to the RM via the NMContainerStatus. The RM would 
then recover the AllocationTagsManager using this information.
The api.Container also requires the allocationTags since after AM recovery, we 
need to provide the AM with previously allocated containers.
Reporter: Arun Suresh









[jira] [Updated] (YARN-7696) Add container tags to ContainerTokenIdentifier, api.Container and NMContainerStatus to handle all recovery cases

2018-01-03 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7696:
--
Environment: (was: The NM needs to persist the Container tags so that 
on RM recovery, it is sent back to the RM via the NMContainerStatus. The RM 
would then recover the AllocationTagsManager using this information.
The api.Container also requires the allocationTags since after AM recovery, we 
need to provide the AM with previously allocated containers.)

> Add container tags to ContainerTokenIdentifier, api.Container and 
> NMContainerStatus to handle all recovery cases
> 
>
> Key: YARN-7696
> URL: https://issues.apache.org/jira/browse/YARN-7696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>







[jira] [Commented] (YARN-7602) NM should reference the singleton JvmMetrics instance

2018-01-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310003#comment-16310003
 ] 

Hudson commented on YARN-7602:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13443 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13443/])
YARN-7602. NM should reference the singleton JvmMetrics instance. (haibochen: 
rev 2f6c038be6c23522ea64fc4e415910fb72493eb2)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/metrics/TestNodeManagerMetrics.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/metrics/NodeManagerMetrics.java
* (edit) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/source/TestJvmMetrics.java


> NM should reference the singleton JvmMetrics instance
> -
>
> Key: YARN-7602
> URL: https://issues.apache.org/jira/browse/YARN-7602
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-beta1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Fix For: 3.1.0, 3.0.1
>
> Attachments: YARN-7602.00.patch, YARN-7602.01.patch, 
> YARN-7602.02.patch, YARN-7602.03.patch, YARN-7602.04.patch
>
>
> NM does not reference the singleton JvmMetrics instance in its 
> NodeManagerMetrics. This will easily cause NM to crash if any of the node 
> manager components tries to register JvmMetrics. An example of this is 
> TimelineCollectorManager that hosts a HBaseClient that registers JvmMetrics 
> again. See HBASE-19409 for details.






[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment

2018-01-03 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309995#comment-16309995
 ] 

Eric Yang commented on YARN-7677:
-

[~ebadger] I think I understand your scenario better now.  When the host and 
docker environments run two separate Hadoop clusters, we do not want the host 
Hadoop configuration to be exposed to docker, because the disk settings and 
file system layout do not apply.  In other scenarios, such as HDFS living 
outside of the docker container and a Spark Python application running in the 
docker container to access host-level HDFS, the Hadoop configuration should be 
inherited from the host to make sure well-optimized timing settings are 
exposed.  For huge clusters, the first scenario may be used to isolate virtual 
clusters.

For smaller clusters, it is most likely to run mixed workloads and use docker 
to isolate programming libraries.  The host-level node manager white list 
cannot be overwritten by the container.  I think both cases can be supported, 
and the default is probably to inherit {{HADOOP_CONF_DIR}} for smaller clusters 
to boost efficient utilization of system resources.  It would be better if we 
built an env_reset switch as part of the job submission flags to disable 
inheritance of the Hadoop system environment variables.
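
Roughly, such a per-job switch might look like this on the submission side (the 
YARN_ENV_RESET key is hypothetical and only named for illustration; 
ContainerLaunchContext and setEnvironment() are existing API, and the usual 
imports are assumed):
{code:java}
// Hypothetical per-job flag asking the NM not to inject the host HADOOP_CONF_DIR.
Map<String, String> env = new HashMap<>();
env.put("YARN_ENV_RESET", "true");
ContainerLaunchContext clc = Records.newRecord(ContainerLaunchContext.class);
clc.setEnvironment(env);
{code}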

> HADOOP_CONF_DIR should not be automatically put in task environment
> ---
>
> Key: YARN-7677
> URL: https://issues.apache.org/jira/browse/YARN-7677
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>
> Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether 
> it's set by the user or not. It completely bypasses the whitelist and so 
> there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes 
> problems in the Docker use case where Docker containers will set up their own 
> environment and have their own {{HADOOP_CONF_DIR}} preset in the image 
> itself. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7602) NM should reference the singleton JvmMetrics instance

2018-01-03 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309964#comment-16309964
 ] 

Haibo Chen commented on YARN-7602:
--

Checking this in.

> NM should reference the singleton JvmMetrics instance
> -
>
> Key: YARN-7602
> URL: https://issues.apache.org/jira/browse/YARN-7602
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-beta1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7602.00.patch, YARN-7602.01.patch, 
> YARN-7602.02.patch, YARN-7602.03.patch, YARN-7602.04.patch
>
>
> NM does not reference the singleton JvmMetrics instance in its 
> NodeManagerMetrics. This will easily cause NM to crash if any of the node 
> manager components tries to register JvmMetrics. An example of this is 
> TimelineCollectorManager that hosts a HBaseClient that registers JvmMetrics 
> again. See HBASE-19409 for details.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309961#comment-16309961
 ] 

genericqa commented on YARN-7663:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 22s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 34s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 40s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}116m 43s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7663 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12904418/YARN-7663_trunk.patch 
|
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 045ad0ea3616 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c9bf813 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/19089/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19089/testReport/ |
| Max. process+thread count | 861 (vs. ulimit of 5000) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7663:

Attachment: YARN-7663_1.patch

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch
>
>
> Send kill to application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted before the point where the START event is created, this 
> bug reproduces deterministically. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309931#comment-16309931
 ] 

lujie edited comment on YARN-7663 at 1/3/18 5:15 PM:
-

Sorry about deleting the old files; I have reattached the old patch. Thanks for 
Jason Lowe's advice. I have tried adding a unit test in the previous patch, 
please check it.


was (Author: xiaoheipangzi):
Sorry about deleting the old files; thanks for Jason Lowe's advice. I have tried 
adding a unit test in the previous patch, please check it.

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch
>
>
> Send kill to application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted before the point where the START event is created, this 
> bug reproduces deterministically. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309931#comment-16309931
 ] 

lujie commented on YARN-7663:
-

Sorry about deleting the old files; thanks for Jason Lowe's advice. I have tried 
adding a unit test in the previous patch, please check it.

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_2.patch
>
>
> Send kill to application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted before the point where the START event is created, this 
> bug reproduces deterministically. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7663:

Attachment: YARN-7663_2.patch

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_2.patch
>
>
> Send kill to application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted before the point where the START event is created, this 
> bug reproduces deterministically. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment

2018-01-03 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309906#comment-16309906
 ] 

Eric Badger commented on YARN-7677:
---

[~eyang], the docker container file system layout doesn't have to be the same 
as the host. It's entirely possible for an image to have {{HADOOP_CONF_DIR}} 
set to something like {{/home/conf}}, while the host has it in {{/tmp/conf}}. 
In the current implementation, there is no way for {{HADOOP_CONF_DIR}} to be 
set correctly in this case without the user explicitly setting it with their 
job. Even if you set up {{HADOOP_CONF_DIR}} in the image as an environment 
variable, it will be overridden by the container startup script, since 
{{HADOOP_CONF_DIR}} is being passed in by default regardless. In the case where 
a user doesn't set {{HADOOP_CONF_DIR}} explicitly, it makes sense that the 
docker image will know where to set it (via ENV), while the NM will not, since 
their layouts are not necessarily the same.
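
To make the clobbering behaviour concrete, here is a toy Java sketch (not the 
NodeManager's ContainerLaunch code) contrasting the current unconditional export 
with a whitelist-style export that lets an image-provided {{HADOOP_CONF_DIR}} survive.

{code:java}
// Toy illustration of the two behaviours discussed above; not YARN code.
import java.util.LinkedHashMap;
import java.util.Map;

public class ConfDirWhitelistDemo {

  /** Current behaviour: the NM-side value is exported no matter what. */
  static Map<String, String> alwaysExport(Map<String, String> userEnv, String nmConfDir) {
    Map<String, String> env = new LinkedHashMap<>(userEnv);
    env.put("HADOOP_CONF_DIR", nmConfDir); // clobbers the image's ENV at launch time
    return env;
  }

  /** Whitelist behaviour: only a user-supplied value is exported. */
  static Map<String, String> whitelisted(Map<String, String> userEnv, String nmConfDir) {
    // If the user said nothing, nothing is exported (nmConfDir is intentionally
    // unused), so the image's own ENV HADOOP_CONF_DIR (e.g. /home/conf) stays
    // in effect inside the container.
    return new LinkedHashMap<>(userEnv);
  }

  public static void main(String[] args) {
    Map<String, String> userEnv = new LinkedHashMap<>(); // user did not set it
    System.out.println(alwaysExport(userEnv, "/tmp/conf")); // {HADOOP_CONF_DIR=/tmp/conf}
    System.out.println(whitelisted(userEnv, "/tmp/conf"));  // {} -> image ENV wins
  }
}
{code}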

> HADOOP_CONF_DIR should not be automatically put in task environment
> ---
>
> Key: YARN-7677
> URL: https://issues.apache.org/jira/browse/YARN-7677
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>
> Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether 
> it's set by the user or not. It completely bypasses the whitelist and so 
> there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes 
> problems in the Docker use case where Docker containers will set up their own 
> environment and have their own {{HADOOP_CONF_DIR}} preset in the image 
> itself. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7590) Improve container-executor validation check

2018-01-03 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309902#comment-16309902
 ] 

Eric Yang commented on YARN-7590:
-

Happy New Year [~miklos.szeg...@cloudera.com]! Can you review the 006 patch?  
Thank you.

> Improve container-executor validation check
> ---
>
> Key: YARN-7590
> URL: https://issues.apache.org/jira/browse/YARN-7590
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, yarn
>Affects Versions: 2.0.1-alpha, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0, 
> 2.8.0, 2.8.1, 3.0.0-beta1
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7590.001.patch, YARN-7590.002.patch, 
> YARN-7590.003.patch, YARN-7590.004.patch, YARN-7590.005.patch, 
> YARN-7590.006.patch
>
>
> There is only a minimal check on the prefix path for container-executor.  If YARN 
> is compromised, an attacker can use container-executor to change system file 
> ownership:
> {code}
> /usr/local/hadoop/bin/container-executor spark yarn 0 etc /home/yarn/tokens 
> /home/spark / ls
> {code}
> This will change /etc to be owned by the spark user:
> {code}
> # ls -ld /etc
> drwxr-s---. 110 spark hadoop 8192 Nov 21 20:00 /etc
> {code}
> The spark user can then rewrite files under /etc to gain more access.  We can 
> improve this with additional checks in container-executor:
> # Make sure the prefix path is owned by the same user as the caller to 
> container-executor.
> # Make sure the log directory prefix is owned by the same user as the caller.
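
The proposed checks amount to comparing the owner of the supplied prefix directories 
against the real caller. Purely as a readability aid, a Java rendering of that idea is 
below; the actual container-executor would perform the equivalent stat()-based check in 
native code, and the paths in main() are placeholders.

{code:java}
// Conceptual sketch of the ownership check described above; illustrative only.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PrefixOwnershipCheck {

  /** True only if 'prefix' is an existing directory owned by 'callerUser'. */
  static boolean ownedByCaller(Path prefix, String callerUser) throws IOException {
    if (!Files.isDirectory(prefix)) {
      return false; // reject bogus prefixes such as "etc" resolving outside NM dirs
    }
    String owner = Files.getOwner(prefix).getName();
    return owner.equals(callerUser);
  }

  public static void main(String[] args) throws IOException {
    // Placeholder values; a real check would use the NM local/log dir prefixes
    // and the user invoking container-executor.
    Path logPrefix = Paths.get("/tmp");
    System.out.println(ownedByCaller(logPrefix, System.getProperty("user.name")));
  }
}
{code}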



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309884#comment-16309884
 ] 

Jason Lowe commented on YARN-7663:
--

Please do not remove old patches that were commented on.  In addition to 
generating a bunch of JIRA update email traffic, it makes reviews of subsequent 
patches more difficult.  There's no context to see what changed since the last 
review.


> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
>
> Send kill to application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted before the point where the START event is created, this 
> bug reproduces deterministically. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7663:

Attachment: (was: YARN-7663_trunk.patch)

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
>
> Send kill to application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted before the point where the START event is created, this 
> bug reproduces deterministically. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7663:

Attachment: (was: YARN-7663_2.8.4.patch)

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
>
> Send kill to application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted before the point where the START event is created, this 
> bug reproduces deterministically. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7666) Introduce scheduler specific environment variable support in ASC for better scheduling placement configurations

2018-01-03 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-7666:
--
Attachment: YARN-7666.004.patch

Fixing test case issue.

> Introduce scheduler specific environment variable support in ASC for better 
> scheduling placement configurations
> ---
>
> Key: YARN-7666
> URL: https://issues.apache.org/jira/browse/YARN-7666
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-7666.001.patch, YARN-7666.002.patch, 
> YARN-7666.003.patch, YARN-7666.004.patch
>
>
> Introduce a scheduler-specific key-value map to hold env variables in the ASC.
> Also, convert AppPlacementAllocator initialization to be per-application, based 
> on the policy configured for each app.
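
For a rough idea of the mechanism, the sketch below populates such a per-application 
map; both the key name and the allocator class referenced in it are assumptions for 
illustration, not the API actually added by the patch.

{code:java}
// Illustrative only; key names and plumbing are assumptions, not the YARN-7666 API.
import java.util.HashMap;
import java.util.Map;

public class SchedulingEnvSketch {
  public static void main(String[] args) {
    Map<String, String> schedulingEnvs = new HashMap<>();
    // Hypothetical key telling the scheduler which AppPlacementAllocator
    // implementation this application wants.
    schedulingEnvs.put("APPLICATION_PLACEMENT_TYPE_CLASS",
        "org.example.placement.MyAppPlacementAllocator"); // placeholder class name
    // In the real flow this map would travel inside the ApplicationSubmissionContext
    // so each application can be initialized with its own placement policy.
    System.out.println(schedulingEnvs);
  }
}
{code}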



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7681) Scheduler should double-check placement constraint before actual allocation is made

2018-01-03 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309837#comment-16309837
 ] 

Arun Suresh commented on YARN-7681:
---

[~cheersyang], now that YARN-7682 is done, maybe you want to give this a shot?
I think something along the lines of 
[this|https://issues.apache.org/jira/browse/YARN-7613?focusedCommentId=16303343=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16303343]
 should be fine.

> Scheduler should double-check placement constraint before actual allocation 
> is made
> ---
>
> Key: YARN-7681
> URL: https://issues.apache.org/jira/browse/YARN-7681
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: RM, scheduler
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>
> This JIRA is created based on the discussions under YARN-7612, see comments 
> after [this 
> comment|https://issues.apache.org/jira/browse/YARN-7612?focusedCommentId=16303051=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16303051].
>  AllocationTagsManager maintains tag info that helps make placement decisions 
> during the placement phase; however, tags change along with the container 
> lifecycle, so a placement that was valid when it was decided may violate the 
> constraints by the time the allocation is made. Propose adding an extra check in 
> the scheduler to make sure constraints are still satisfied during the actual 
> allocation.
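
A bare-bones sketch of the proposed guard, assuming a boolean constraint-check 
utility along the lines of YARN-7682 is available to the scheduler; all type and 
method names here are stand-ins, not the actual scheduler code path.

{code:java}
// Illustrative pseudo-flow of the double-check; types are stand-ins, not YARN classes.
public class AllocateWithRecheck {

  interface ConstraintChecker {      // stand-in for a YARN-7682-style utility
    boolean canSatisfy(String appId, String[] sourceTags, String nodeId);
  }

  interface Node { String getId(); }

  /** Returns true if the allocation can be committed, false if it must be re-placed. */
  static boolean tryAllocate(ConstraintChecker checker, String appId,
      String[] sourceTags, Node node) {
    // The placement decision was made earlier, but allocation tags may have
    // changed since (containers started/finished), so re-validate right before
    // committing the allocation.
    if (!checker.canSatisfy(appId, sourceTags, node.getId())) {
      return false; // send the request back for another placement round
    }
    // ... proceed with the actual allocation on 'node'
    return true;
  }

  public static void main(String[] args) {
    ConstraintChecker alwaysOk = (app, tags, nodeId) -> true;
    Node node = () -> "host-1:45454";
    System.out.println(tryAllocate(alwaysOk, "application_1514764800000_0001",
        new String[] {"hbase-rs"}, node));
  }
}
{code}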



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7682) Expose canSatisfyConstraints utility function to validate a placement against a constraint

2018-01-03 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7682:
--
Summary: Expose canSatisfyConstraints utility function to validate a 
placement against a constraint  (was: Expose canSatisfyConstraints utility 
function to validate a placement with a constraint)

> Expose canSatisfyConstraints utility function to validate a placement against 
> a constraint
> --
>
> Key: YARN-7682
> URL: https://issues.apache.org/jira/browse/YARN-7682
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Panagiotis Garefalakis
> Attachments: YARN-7682-YARN-6592.001.patch, 
> YARN-7682-YARN-6592.002.patch, YARN-7682-YARN-6592.003.patch, 
> YARN-7682-YARN-6592.004.patch, YARN-7682-YARN-6592.005.patch, 
> YARN-7682-YARN-6592.006.patch, YARN-7682.wip.patch
>
>
> As per the discussion in YARN-7613, let's expose a {{canAssign}} method in the 
> PlacementConstraintManager that takes sourceTags, an applicationId, a 
> SchedulerNode and an AllocationTagsManager, and returns true if the constraints 
> are not violated by placing the container on the node.
> I prefer not passing in the SchedulingRequest, since it can have > 1 
> numAllocations. We want this API to be called for single allocations.
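
Roughly, the utility described above would have a shape like the sketch below; the 
parameter list follows the description, but the class, method and helper names are 
assumptions rather than the API that was actually committed.

{code:java}
// Shape sketch only; stand-in types so the example is self-contained.
import java.util.Collections;
import java.util.Set;

public final class ConstraintCheckSketch {

  interface SchedulerNode { String getNodeName(); }
  interface PlacementConstraint { }
  interface AllocationTagsManager {
    long getNodeCardinality(String nodeName, String appId, Set<String> tags);
  }

  /**
   * True if placing ONE container with 'sourceTags' for 'applicationId' on
   * 'node' would not violate 'constraint'. Single allocation on purpose,
   * matching the numAllocations caveat in the description above.
   */
  static boolean canSatisfyConstraints(String applicationId, Set<String> sourceTags,
      SchedulerNode node, PlacementConstraint constraint,
      AllocationTagsManager tagsManager) {
    // A real implementation would walk the constraint expression; for brevity
    // this sketch hard-wires a simple anti-affinity decision: the node must not
    // already host a container carrying the same tags.
    return tagsManager.getNodeCardinality(node.getNodeName(), applicationId, sourceTags) == 0;
  }

  public static void main(String[] args) {
    SchedulerNode node = () -> "host-42";
    AllocationTagsManager atm = (n, app, tags) -> 0L; // pretend no matching tags yet
    PlacementConstraint antiAffinity = new PlacementConstraint() { };
    System.out.println(canSatisfyConstraints("application_1514764800000_0002",
        Collections.singleton("hbase-master"), node, antiAffinity, atm));
  }
}
{code}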



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7602) NM should reference the singleton JvmMetrics instance

2018-01-03 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309827#comment-16309827
 ] 

genericqa commented on YARN-7602:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
29s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 57s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 
35s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 54s{color} | {color:orange} root: The patch generated 1 new + 7 unchanged - 
0 fixed = 8 total (was 7) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 31s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  7m 
58s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 17m 48s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}106m 18s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.webapp.TestContainerLogsPage |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7602 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12904408/YARN-7602.04.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d2c75d5cb731 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c9bf813 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| 

[jira] [Commented] (YARN-7682) Expose canSatisfyConstraints utility function to validate a placement with a constraint

2018-01-03 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309821#comment-16309821
 ] 

Arun Suresh commented on YARN-7682:
---

+1 for the latest patch. Committing shortly

> Expose canSatisfyConstraints utility function to validate a placement with a 
> constraint
> ---
>
> Key: YARN-7682
> URL: https://issues.apache.org/jira/browse/YARN-7682
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Panagiotis Garefalakis
> Attachments: YARN-7682-YARN-6592.001.patch, 
> YARN-7682-YARN-6592.002.patch, YARN-7682-YARN-6592.003.patch, 
> YARN-7682-YARN-6592.004.patch, YARN-7682-YARN-6592.005.patch, 
> YARN-7682-YARN-6592.006.patch, YARN-7682.wip.patch
>
>
> As per the discussion in YARN-7613, let's expose a {{canAssign}} method in the 
> PlacementConstraintManager that takes sourceTags, an applicationId, a 
> SchedulerNode and an AllocationTagsManager, and returns true if the constraints 
> are not violated by placing the container on the node.
> I prefer not passing in the SchedulingRequest, since it can have > 1 
> numAllocations. We want this API to be called for single allocations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7682) Expose canSatisfyConstraints utility function to validate a placement with a constraint

2018-01-03 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7682:
--
Summary: Expose canSatisfyConstraints utility function to validate a 
placement with a constraint  (was: Expose canAssign method in the 
PlacementConstraintManager)

> Expose canSatisfyConstraints utility function to validate a placement with a 
> constraint
> ---
>
> Key: YARN-7682
> URL: https://issues.apache.org/jira/browse/YARN-7682
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Panagiotis Garefalakis
> Attachments: YARN-7682-YARN-6592.001.patch, 
> YARN-7682-YARN-6592.002.patch, YARN-7682-YARN-6592.003.patch, 
> YARN-7682-YARN-6592.004.patch, YARN-7682-YARN-6592.005.patch, 
> YARN-7682-YARN-6592.006.patch, YARN-7682.wip.patch
>
>
> As per the discussion in YARN-7613, let's expose a {{canAssign}} method in the 
> PlacementConstraintManager that takes sourceTags, an applicationId, a 
> SchedulerNode and an AllocationTagsManager, and returns true if the constraints 
> are not violated by placing the container on the node.
> I prefer not passing in the SchedulingRequest, since it can have > 1 
> numAllocations. We want this API to be called for single allocations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7663:

Attachment: YARN-7663_trunk.patch

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_2.8.4.patch, YARN-7663_trunk.patch
>
>
> Send kill to application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted before the point where the START event is created, this 
> bug reproduces deterministically. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7663:

Attachment: (was: YARN-7663_1.patch)

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_2.8.4.patch
>
>
> Send kill to application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted before the point where the START event is created, this 
> bug reproduces deterministically. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7663:

Attachment: YARN-7663_2.8.4.patch

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_2.8.4.patch
>
>
> Send kill to application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted before the point where the START event is created, this 
> bug reproduces deterministically. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7682) Expose canAssign method in the PlacementConstraintManager

2018-01-03 Thread Panagiotis Garefalakis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309790#comment-16309790
 ] 

Panagiotis Garefalakis commented on YARN-7682:
--

A temporary Maven dependency error made the Jenkins runs for patches v004 and v005 
fail. Resolved with v006 (even though the patches are identical).

> Expose canAssign method in the PlacementConstraintManager
> -
>
> Key: YARN-7682
> URL: https://issues.apache.org/jira/browse/YARN-7682
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Panagiotis Garefalakis
> Attachments: YARN-7682-YARN-6592.001.patch, 
> YARN-7682-YARN-6592.002.patch, YARN-7682-YARN-6592.003.patch, 
> YARN-7682-YARN-6592.004.patch, YARN-7682-YARN-6592.005.patch, 
> YARN-7682-YARN-6592.006.patch, YARN-7682.wip.patch
>
>
> As per the discussion in YARN-7613, let's expose a {{canAssign}} method in the 
> PlacementConstraintManager that takes sourceTags, an applicationId, a 
> SchedulerNode and an AllocationTagsManager, and returns true if the constraints 
> are not violated by placing the container on the node.
> I prefer not passing in the SchedulingRequest, since it can have > 1 
> numAllocations. We want this API to be called for single allocations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7557) It should be possible to specify resource types in the fair scheduler increment value

2018-01-03 Thread Gergo Repas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309779#comment-16309779
 ] 

Gergo Repas commented on YARN-7557:
---

Regarding the deprecation warnings: I do not want to annotate methods using 
{{FairSchedulerConfiguration.RM_SCHEDULER_INCREMENT_ALLOCATION_VCORES}} or 
{{FairSchedulerConfiguration.RM_SCHEDULER_INCREMENT_ALLOCATION_MB}} with 
{{@SuppressWarning}} because it may mask other issues in the future, I also 
don't want to remove the {{@Deprecated}} annotations from those constants. I 
think it's best to accept these warnings.
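
For anyone not following the trade-off, a toy example of the two options is below; 
the constant name mirrors the one in FairSchedulerConfiguration but the class here is 
a stand-in, so the value shown is illustrative.

{code:java}
// Toy illustration of the deprecation-warning trade-off; not Hadoop code.
public class DeprecationWarningDemo {

  static final class LegacyConfig {           // stand-in for FairSchedulerConfiguration
    @Deprecated
    static final String RM_SCHEDULER_INCREMENT_ALLOCATION_MB =
        "yarn.scheduler.increment-allocation-mb";
  }

  // Option taken above: reference the constant and accept the warning.
  // (When the constant lives in a separate class, as it really does, javac
  // emits a deprecation warning for this line.)
  static String incrementKey() {
    return LegacyConfig.RM_SCHEDULER_INCREMENT_ALLOCATION_MB;
  }

  // Rejected option: suppressing here would also silence warnings about any
  // other deprecated API this method happens to use in the future.
  @SuppressWarnings("deprecation")
  static String incrementKeySuppressed() {
    return LegacyConfig.RM_SCHEDULER_INCREMENT_ALLOCATION_MB;
  }

  public static void main(String[] args) {
    System.out.println(incrementKey().equals(incrementKeySuppressed())); // true
  }
}
{code}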

> It should be possible to specify resource types in the fair scheduler 
> increment value
> -
>
> Key: YARN-7557
> URL: https://issues.apache.org/jira/browse/YARN-7557
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.0.0-beta1
>Reporter: Daniel Templeton
>Assignee: Gergo Repas
>Priority: Critical
> Attachments: YARN-7557.000.patch, YARN-7557.001.patch, 
> YARN-7557.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309762#comment-16309762
 ] 

genericqa commented on YARN-7663:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 12s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 28s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 63m 
28s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}106m 59s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7663 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12904403/YARN-7663_1.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux dc45de5c2b24 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c9bf813 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19087/testReport/ |
| Max. process+thread count | 829 (vs. ulimit of 5000) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/19087/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   

[jira] [Commented] (YARN-7666) Introduce scheduler specific environment variable support in ASC for better scheduling placement configurations

2018-01-03 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309735#comment-16309735
 ] 

genericqa commented on YARN-7666:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 19s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  
6s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
23s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 57s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 16 new + 243 unchanged - 0 fixed = 259 total (was 243) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 12s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
35s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
6s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}282m 59s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}354m 37s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimitsByPartition
 |
|   | hadoop.yarn.server.resourcemanager.TestRM |
|   | hadoop.yarn.server.resourcemanager.TestDecommissioningNodesWatcher |
|   | hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
|   | 

[jira] [Commented] (YARN-7682) Expose canAssign method in the PlacementConstraintManager

2018-01-03 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309729#comment-16309729
 ] 

genericqa commented on YARN-7682:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-6592 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
53s{color} | {color:green} YARN-6592 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} YARN-6592 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} YARN-6592 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} YARN-6592 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
6s{color} | {color:green} YARN-6592 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} YARN-6592 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 26s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 25 new + 0 unchanged - 3 fixed = 25 total (was 3) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 64m 
17s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}111m 34s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7682 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12904393/YARN-7682-YARN-6592.006.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux cab593e06b0f 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 
12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | YARN-6592 / 1c5fa65 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/19086/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19086/testReport/ |
| Max. process+thread count | 799 (vs. ulimit of 5000) |
| modules | C: 

[jira] [Assigned] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned YARN-7663:


Assignee: lujie

Thanks for the report and the patch!

Ignoring the START event seems appropriate here.  Could you add a unit test of 
the new start-after-killed transition logic to TestRMAppTransitions?
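
To make that concrete, here is a hedged, self-contained sketch (illustrative 
App/AppState/AppEventType stand-ins and an assumed class name, not the 
attached patch or the real RMAppImpl transition table) of how a missing arc 
produces the InvalidStateTransitionException shown in the stack trace above, 
and how a KILLED-to-KILLED self-transition on START quietly absorbs the late 
event:

{code:java}
import org.apache.hadoop.yarn.state.InvalidStateTransitionException;
import org.apache.hadoop.yarn.state.StateMachine;
import org.apache.hadoop.yarn.state.StateMachineFactory;

public class IgnoreStartAtKilledSketch {

  // Toy stand-ins for RMAppState / RMAppEventType / RMAppEvent / RMAppImpl.
  enum AppState { NEW, KILLED }
  enum AppEventType { START, KILL }

  static class AppEvent {
    final AppEventType type;
    AppEvent(AppEventType type) { this.type = type; }
  }

  static class App { }

  public static void main(String[] args) throws Exception {
    // Without a START arc out of KILLED, a late START event throws, which is
    // the "Invalid event: START at KILLED" reported in this JIRA.
    StateMachineFactory<App, AppState, AppEventType, AppEvent> withoutArc =
        new StateMachineFactory<App, AppState, AppEventType, AppEvent>(AppState.NEW)
            .addTransition(AppState.NEW, AppState.KILLED, AppEventType.KILL)
            .installTopology();
    StateMachine<AppState, AppEventType, AppEvent> sm = withoutArc.make(new App());
    sm.doTransition(AppEventType.KILL, new AppEvent(AppEventType.KILL));
    try {
      sm.doTransition(AppEventType.START, new AppEvent(AppEventType.START));
    } catch (InvalidStateTransitionException e) {
      System.out.println("Invalid event: START at " + sm.getCurrentState());
    }

    // With a KILLED -> KILLED self-transition on START, the late event is a
    // no-op and the application simply stays in KILLED.
    StateMachineFactory<App, AppState, AppEventType, AppEvent> withArc =
        new StateMachineFactory<App, AppState, AppEventType, AppEvent>(AppState.NEW)
            .addTransition(AppState.NEW, AppState.KILLED, AppEventType.KILL)
            .addTransition(AppState.KILLED, AppState.KILLED, AppEventType.START)
            .installTopology();
    StateMachine<AppState, AppEventType, AppEvent> ignoring = withArc.make(new App());
    ignoring.doTransition(AppEventType.KILL, new AppEvent(AppEventType.KILL));
    ignoring.doTransition(AppEventType.START, new AppEvent(AppEventType.START));
    System.out.println("State after ignored START: " + ignoring.getCurrentState());
  }
}
{code}

A unit test along the lines requested would presumably drive a killed app with 
a START event in TestRMAppTransitions and assert that the state stays KILLED; 
the exact assertions depend on the attached patch.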


> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch
>
>
> Send kill to application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted before the point where the START event is created, 
> this bug reproduces deterministically.






[jira] [Commented] (YARN-7602) NM should reference the singleton JvmMetrics instance

2018-01-03 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309692#comment-16309692
 ] 

Haibo Chen commented on YARN-7602:
--

v4 of the patch reverts making NodeManagerMetrics final.

> NM should reference the singleton JvmMetrics instance
> -
>
> Key: YARN-7602
> URL: https://issues.apache.org/jira/browse/YARN-7602
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-beta1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7602.00.patch, YARN-7602.01.patch, 
> YARN-7602.02.patch, YARN-7602.03.patch, YARN-7602.04.patch
>
>
> NM does not reference the singleton JvmMetrics instance in its 
> NodeManagerMetrics. This will easily cause the NM to crash if any of the 
> node manager components tries to register JvmMetrics. An example of this is 
> TimelineCollectorManager, which hosts an HBaseClient that registers 
> JvmMetrics again. See HBASE-19409 for details.
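
As a hedged, standalone illustration of the failure mode described above (the 
process name and class name below are made up, the duplicate-source rejection 
assumes the metrics system is not in mini-cluster mode, and this is not the 
NodeManager code path):

{code:java}
import org.apache.hadoop.metrics2.MetricsSystem;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.source.JvmMetrics;

public class DuplicateJvmMetricsSketch {
  public static void main(String[] args) {
    // One metrics system for the whole process, as in a daemon like the NM.
    MetricsSystem ms = DefaultMetricsSystem.initialize("SketchProcess");

    // First JvmMetrics registration, e.g. done by the process at startup.
    JvmMetrics.create("SketchProcess", "", ms);

    try {
      // A second component doing its own create() registers a source with the
      // same name and is rejected at runtime, which is the kind of crash the
      // description above refers to.
      JvmMetrics.create("SketchProcess", "", ms);
    } catch (RuntimeException e) {
      System.out.println("second JvmMetrics registration rejected: " + e);
    }
  }
}
{code}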






[jira] [Updated] (YARN-7602) NM should reference the singleton JvmMetrics instance

2018-01-03 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-7602:
-
Attachment: YARN-7602.04.patch

> NM should reference the singleton JvmMetrics instance
> -
>
> Key: YARN-7602
> URL: https://issues.apache.org/jira/browse/YARN-7602
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-beta1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7602.00.patch, YARN-7602.01.patch, 
> YARN-7602.02.patch, YARN-7602.03.patch, YARN-7602.04.patch
>
>
> NM does not reference the singleton JvmMetrics instance in its 
> NodeManagerMetrics. This will easily cause the NM to crash if any of the 
> node manager components tries to register JvmMetrics. An example of this is 
> TimelineCollectorManager, which hosts an HBaseClient that registers 
> JvmMetrics again. See HBASE-19409 for details.






[jira] [Commented] (YARN-7602) NM should reference the singleton JvmMetrics instance

2018-01-03 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309688#comment-16309688
 ] 

Haibo Chen commented on YARN-7602:
--

Thanks [~mdrob] [~rkanter] and [~rohithsharma] for the reviews!
Looks like the new failures with Mockito-related error messages are caused by 
the change that made the NodeManagerMetrics class final to address the 
checkstyle warning.
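
For context, a hedged illustration of why that change trips Mockito (the class 
and method names here are made up, not the NM test code): with Mockito's 
default mock maker, mocking or spying a final class is rejected outright, so 
any test that mocks NodeManagerMetrics would start failing once the class is 
final.

{code:java}
import org.mockito.Mockito;
import org.mockito.exceptions.base.MockitoException;

public class FinalClassMockSketch {

  // Illustrative stand-in for a metrics class that was made final.
  static final class MetricsLikeFinalClass {
    int containersLaunched() { return 0; }
  }

  public static void main(String[] args) {
    try {
      // The default (non-inline) mock maker cannot subclass a final class,
      // so this call fails with a Mockito error similar to the ones seen in
      // the failing tests.
      MetricsLikeFinalClass mock = Mockito.mock(MetricsLikeFinalClass.class);
      System.out.println("mocked value: " + mock.containersLaunched());
    } catch (MockitoException e) {
      System.out.println("Mockito rejected the final class: " + e.getMessage());
    }
  }
}
{code}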

> NM should reference the singleton JvmMetrics instance
> -
>
> Key: YARN-7602
> URL: https://issues.apache.org/jira/browse/YARN-7602
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-beta1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7602.00.patch, YARN-7602.01.patch, 
> YARN-7602.02.patch, YARN-7602.03.patch
>
>
> NM does not reference the singleton JvmMetrics instance in its 
> NodeManagerMetrics. This will easily cause the NM to crash if any of the 
> node manager components tries to register JvmMetrics. An example of this is 
> TimelineCollectorManager, which hosts an HBaseClient that registers 
> JvmMetrics again. See HBASE-19409 for details.






[jira] [Commented] (YARN-7692) Skip validating priority acls while recovering applications

2018-01-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309627#comment-16309627
 ] 

Hudson commented on YARN-7692:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13441 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13441/])
YARN-7692. Skip validating priority acls while recovering applications. 
(rohithsharmaks: rev c9bf813c9a6c018d14f2bef49ba086ec0e60c761)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java


> Skip validating priority acls while recovering applications
> ---
>
> Key: YARN-7692
> URL: https://issues.apache.org/jira/browse/YARN-7692
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Charan Hebri
>Assignee: Sunil G
>Priority: Blocker
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1
>
> Attachments: YARN-7692.001.patch
>
>
> Test scenario
> --
> 1. A cluster is created, no ACLs are included
> 2. Submit jobs with an existing user say 'user_a'
> 3. Enable ACLs and create a priority ACL entry via the property 
> yarn.scheduler.capacity.priority-acls. Do not include the user, 'user_a' in 
> this ACL.
> 4. Submit a job with the 'user_a'
> The observed behavior in this case is that the job is rejected because 
> 'user_a' does not have permission to run the job, which is the expected 
> behavior. However, the Resource Manager also goes down when it tries to 
> recover previous applications and fails to recover them.
> Below is the exception seen,
> {noformat}
> 2017-12-27 10:52:30,064 INFO  conf.Configuration 
> (Configuration.java:getConfResourceAsInputStream(2659)) - found resource 
> yarn-site.xml at file:/etc/hadoop/3.0.0.0-636/0/yarn-site.xml
> 2017-12-27 10:52:30,065 INFO  scheduler.AbstractYarnScheduler 
> (AbstractYarnScheduler.java:setClusterMaxPriority(911)) - Updated the cluste 
> max priority to maxClusterLevelAppPriority = 10
> 2017-12-27 10:52:30,066 INFO  resourcemanager.ResourceManager 
> (ResourceManager.java:transitionToActive(1177)) - Transitioning to active 
> state
> 2017-12-27 10:52:30,097 INFO  resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(765)) - Recovery started
> 2017-12-27 10:52:30,102 INFO  recovery.RMStateStore 
> (RMStateStore.java:checkVersion(747)) - Loaded RM state version info 1.5
> 2017-12-27 10:52:30,375 INFO  security.RMDelegationTokenSecretManager 
> (RMDelegationTokenSecretManager.java:recover(196)) - recovering 
> RMDelegationTokenSecretManager.
> 2017-12-27 10:52:30,380 INFO  resourcemanager.RMAppManager 
> (RMAppManager.java:recover(561)) - Recovering 51 applications
> 2017-12-27 10:52:30,432 INFO  resourcemanager.RMAppManager 
> (RMAppManager.java:recover(571)) - Successfully recovered 0 out of 51 
> applications
> 2017-12-27 10:52:30,432 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(776)) - Failed to load/recover state
> org.apache.hadoop.yarn.exceptions.YarnException: 
> org.apache.hadoop.security.AccessControlException: User hrt_qa (auth:SIMPLE) 
> does not have permission to submit/update application_1514268754125_0001 for 0
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2348)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:396)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:358)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:567)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1390)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1183)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1179)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
> at 
> 

[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7663:

Attachment: YARN-7663_1.patch

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch
>
>
> Send kill to application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted before the point where the START event is created, 
> this bug reproduces deterministically.






[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7663:

Attachment: (was: YARN-7663.patch)

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch
>
>
> Send kill to application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted before the point where the START event is created, 
> this bug reproduces deterministically.






[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7663:

Attachment: YARN-7663.patch

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663.patch
>
>
> Send kill to application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted before the point where the START event is created, 
> this bug reproduces deterministically.






[jira] [Commented] (YARN-7692) Skip validating priority acls while recovering applications

2018-01-03 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309613#comment-16309613
 ] 

Rohith Sharma K S commented on YARN-7692:
-

committing shortly

> Skip validating priority acls while recovering applications
> ---
>
> Key: YARN-7692
> URL: https://issues.apache.org/jira/browse/YARN-7692
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Charan Hebri
>Assignee: Sunil G
>Priority: Blocker
> Attachments: YARN-7692.001.patch
>
>
> Test scenario
> --
> 1. A cluster is created, no ACLs are included
> 2. Submit jobs with an existing user say 'user_a'
> 3. Enable ACLs and create a priority ACL entry via the property 
> yarn.scheduler.capacity.priority-acls. Do not include the user, 'user_a' in 
> this ACL.
> 4. Submit a job with the 'user_a'
> The observed behavior in this case is that the job is rejected because 
> 'user_a' does not have permission to run the job, which is the expected 
> behavior. However, the Resource Manager also goes down when it tries to 
> recover previous applications and fails to recover them.
> Below is the exception seen,
> {noformat}
> 2017-12-27 10:52:30,064 INFO  conf.Configuration 
> (Configuration.java:getConfResourceAsInputStream(2659)) - found resource 
> yarn-site.xml at file:/etc/hadoop/3.0.0.0-636/0/yarn-site.xml
> 2017-12-27 10:52:30,065 INFO  scheduler.AbstractYarnScheduler 
> (AbstractYarnScheduler.java:setClusterMaxPriority(911)) - Updated the cluste 
> max priority to maxClusterLevelAppPriority = 10
> 2017-12-27 10:52:30,066 INFO  resourcemanager.ResourceManager 
> (ResourceManager.java:transitionToActive(1177)) - Transitioning to active 
> state
> 2017-12-27 10:52:30,097 INFO  resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(765)) - Recovery started
> 2017-12-27 10:52:30,102 INFO  recovery.RMStateStore 
> (RMStateStore.java:checkVersion(747)) - Loaded RM state version info 1.5
> 2017-12-27 10:52:30,375 INFO  security.RMDelegationTokenSecretManager 
> (RMDelegationTokenSecretManager.java:recover(196)) - recovering 
> RMDelegationTokenSecretManager.
> 2017-12-27 10:52:30,380 INFO  resourcemanager.RMAppManager 
> (RMAppManager.java:recover(561)) - Recovering 51 applications
> 2017-12-27 10:52:30,432 INFO  resourcemanager.RMAppManager 
> (RMAppManager.java:recover(571)) - Successfully recovered 0 out of 51 
> applications
> 2017-12-27 10:52:30,432 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(776)) - Failed to load/recover state
> org.apache.hadoop.yarn.exceptions.YarnException: 
> org.apache.hadoop.security.AccessControlException: User hrt_qa (auth:SIMPLE) 
> does not have permission to submit/update application_1514268754125_0001 for 0
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2348)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:396)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:358)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:567)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1390)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1183)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1179)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1179)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> at 
> 

[jira] [Updated] (YARN-7692) Skip validating priority acls while recovering applications

2018-01-03 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-7692:

Summary: Skip validating priority acls while recovering applications  (was: 
Resource Manager goes down when a user not included in a priority acl submits a 
job)

> Skip validating priority acls while recovering applications
> ---
>
> Key: YARN-7692
> URL: https://issues.apache.org/jira/browse/YARN-7692
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Charan Hebri
>Assignee: Sunil G
>Priority: Blocker
> Attachments: YARN-7692.001.patch
>
>
> Test scenario
> --
> 1. A cluster is created, no ACLs are included
> 2. Submit jobs with an existing user say 'user_a'
> 3. Enable ACLs and create a priority ACL entry via the property 
> yarn.scheduler.capacity.priority-acls. Do not include the user, 'user_a' in 
> this ACL.
> 4. Submit a job with the 'user_a'
> The observed behavior in this case is that the job is rejected because 
> 'user_a' does not have permission to run the job, which is the expected 
> behavior. However, the Resource Manager also goes down when it tries to 
> recover previous applications and fails to recover them.
> Below is the exception seen,
> {noformat}
> 2017-12-27 10:52:30,064 INFO  conf.Configuration 
> (Configuration.java:getConfResourceAsInputStream(2659)) - found resource 
> yarn-site.xml at file:/etc/hadoop/3.0.0.0-636/0/yarn-site.xml
> 2017-12-27 10:52:30,065 INFO  scheduler.AbstractYarnScheduler 
> (AbstractYarnScheduler.java:setClusterMaxPriority(911)) - Updated the cluste 
> max priority to maxClusterLevelAppPriority = 10
> 2017-12-27 10:52:30,066 INFO  resourcemanager.ResourceManager 
> (ResourceManager.java:transitionToActive(1177)) - Transitioning to active 
> state
> 2017-12-27 10:52:30,097 INFO  resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(765)) - Recovery started
> 2017-12-27 10:52:30,102 INFO  recovery.RMStateStore 
> (RMStateStore.java:checkVersion(747)) - Loaded RM state version info 1.5
> 2017-12-27 10:52:30,375 INFO  security.RMDelegationTokenSecretManager 
> (RMDelegationTokenSecretManager.java:recover(196)) - recovering 
> RMDelegationTokenSecretManager.
> 2017-12-27 10:52:30,380 INFO  resourcemanager.RMAppManager 
> (RMAppManager.java:recover(561)) - Recovering 51 applications
> 2017-12-27 10:52:30,432 INFO  resourcemanager.RMAppManager 
> (RMAppManager.java:recover(571)) - Successfully recovered 0 out of 51 
> applications
> 2017-12-27 10:52:30,432 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(776)) - Failed to load/recover state
> org.apache.hadoop.yarn.exceptions.YarnException: 
> org.apache.hadoop.security.AccessControlException: User hrt_qa (auth:SIMPLE) 
> does not have permission to submit/update application_1514268754125_0001 for 0
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2348)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:396)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:358)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:567)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1390)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1143)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1183)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1179)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1179)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> 
