[jira] [Commented] (YARN-8130) Race condition when container events are published for KILLED applications

2018-04-10 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431808#comment-16431808
 ] 

Rohith Sharma K S commented on YARN-8130:
-

This appears to be due to a delay in the dispatcher thread. Below is the scenario 
where the race condition can happen:
First  -> A CONTAINER_FINISHED event arrives and the entity to publish is put into 
the dispatcher.
Second -> An APP_FINISHED event arrives and the timelineclient instance is removed 
from the map. 
Third -> The event dispatcher processes the event to publish the entity and finds 
that the corresponding application id is no longer present.

I think instead of removing the timelineclient directly, we should schedule its 
removal after a configured delay, similar to 
PerNodeTimelineCollectorsAuxService#removeApplicationCollector. cc: 
[~haibochen] [~vrushalic]
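
A minimal sketch of that approach, assuming a hypothetical delayed-removal helper (the class, field names, and delay parameter below are illustrative assumptions, not the actual NMTimelinePublisher code or patch):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.yarn.api.records.ApplicationId;

public class DelayedTimelineClientRemover {
  // Hypothetical per-application timeline client map kept by the publisher.
  private final Map<ApplicationId, Object> appToClient = new ConcurrentHashMap<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  // Assumed configurable linger, analogous to the delay used by
  // PerNodeTimelineCollectorsAuxService#removeApplicationCollector.
  private final long removalDelayMs;

  public DelayedTimelineClientRemover(long removalDelayMs) {
    this.removalDelayMs = removalDelayMs;
  }

  /** On APP_FINISHED: schedule removal instead of removing immediately, so a
   *  late CONTAINER_FINISHED entity can still find the client. */
  public void scheduleRemoval(final ApplicationId appId) {
    scheduler.schedule(new Runnable() {
      @Override
      public void run() {
        appToClient.remove(appId);
      }
    }, removalDelayMs, TimeUnit.MILLISECONDS);
  }
}
{code}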

> Race condition when container events are published for KILLED applications
> --
>
> Key: YARN-8130
> URL: https://issues.apache.org/jira/browse/YARN-8130
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Reporter: Charan Hebri
>Priority: Major
>
> There seems to be a race condition happening when an application is KILLED 
> and the corresponding container event information is being published. For 
> completed containers, a YARN_CONTAINER_FINISHED event is generated but for 
> some containers in a KILLED application this information is missing. Below is 
> a node manager log snippet,
> {code:java}
> 2018-04-09 08:44:54,474 INFO  shuffle.ExternalShuffleBlockResolver 
> (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application 
> application_1523259757659_0003 removed, cleanupLocalDirs = false
> 2018-04-09 08:44:54,478 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(632)) - Application 
> application_1523259757659_0003 transitioned from 
> APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher 
> (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been 
> removed before the entity could be published for 
> TimelineEntity[type='YARN_CONTAINER', 
> id='container_1523259757659_0003_01_02']
> 2018-04-09 08:44:54,478 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just 
> finished : application_1523259757659_0003
> 2018-04-09 08:44:54,488 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_01. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:54,492 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_02. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:55,470 INFO  collector.TimelineCollectorManager 
> (TimelineCollectorManager.java:remove(192)) - The collector service for 
> application_1523259757659_0003 was removed
> 2018-04-09 08:44:55,472 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:handle(1572)) - couldn't find application 
> application_1523259757659_0003 while processing FINISH_APPS event. The 
> ResourceManager allocated resources for this application to the NodeManager 
> but no active containers were found to process{code}
> The container id specified in the log, 
> *container_1523259757659_0003_01_02* is the one that has the finished 
> event missing.






[jira] [Commented] (YARN-7941) Transitive dependencies for component are not resolved

2018-04-10 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431811#comment-16431811
 ] 

Rohith Sharma K S commented on YARN-7941:
-

+lgtm, verified the patch in a cluster. 
{code}
2018-04-10 12:01:12,982 [pool-7-thread-1] INFO  monitor.ServiceMonitor - 
[COMPONENT regionserver]: Dependencies satisfied, ramping up.
.
2018-04-10 12:01:42,880 [pool-7-thread-1] INFO  monitor.ServiceMonitor - 
[COMPONENT hbaseclient]: Dependencies satisfied, ramping up.
{code}

> Transitive dependencies for component are not resolved 
> ---
>
> Key: YARN-7941
> URL: https://issues.apache.org/jira/browse/YARN-7941
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-7941.1.patch
>
>
> It is observed that transitive dependencies are not resolved; as a result, one 
> of the components is started earlier than its dependencies. 
> Ex: In the HBase app, 
> master is an independent component, 
> regionserver depends on master, 
> hbaseclient depends on regionserver, 
> but I always see that HBaseClient is launched before regionserver.






[jira] [Updated] (YARN-8104) Add API to fetch node to attribute mapping

2018-04-10 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8104:
---
Attachment: YARN-8104-YARN-3409.002.patch

> Add API to fetch node to attribute mapping
> --
>
> Key: YARN-8104
> URL: https://issues.apache.org/jira/browse/YARN-8104
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-8104-YARN-3409.001.patch, 
> YARN-8104-YARN-3409.002.patch
>
>
> Add node/host-to-attribute mapping to the YARN client API.






[jira] [Updated] (YARN-8104) Add API to fetch node to attribute mapping

2018-04-10 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8104:
---
Attachment: YARN-8104-YARN-3409.003.patch

> Add API to fetch node to attribute mapping
> --
>
> Key: YARN-8104
> URL: https://issues.apache.org/jira/browse/YARN-8104
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-8104-YARN-3409.001.patch, 
> YARN-8104-YARN-3409.002.patch, YARN-8104-YARN-3409.003.patch
>
>
> Add node/host-to-attribute mapping to the YARN client API.






[jira] [Commented] (YARN-8104) Add API to fetch node to attribute mapping

2018-04-10 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431863#comment-16431863
 ] 

Bibin A Chundatt commented on YARN-8104:


Thank you [~Naganarasimha] for the review.

Attached a rebased patch that also addresses your comments.

> Add API to fetch node to attribute mapping
> --
>
> Key: YARN-8104
> URL: https://issues.apache.org/jira/browse/YARN-8104
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-8104-YARN-3409.001.patch, 
> YARN-8104-YARN-3409.002.patch, YARN-8104-YARN-3409.003.patch
>
>
> Add node/host-to-attribute mapping to the YARN client API.






[jira] [Commented] (YARN-7088) Fix application start time and add submit time to UIs

2018-04-10 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431932#comment-16431932
 ] 

genericqa commented on YARN-7088:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 15 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 39m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  6m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
22m 30s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  8m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m 
50s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 36m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 36m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 36m 
42s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 53s{color} | {color:orange} root: The patch generated 1 new + 1115 unchanged 
- 2 fixed = 1116 total (was 1117) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  5m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 26s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  9m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m 
10s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
1s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  3m 18s{color} 
| {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
19s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m  
6s{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 66m 
57s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 26m 
58s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}129m 25s{color} 
| {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
 4s{color} | {color:green} The patch does not generate ASF Licens

[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling

2018-04-10 Thread Chen Qingcha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qingcha updated YARN-7481:
---
Attachment: hadoop-2.7.2.port-gpu.patch

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Fix For: 2.7.2
>
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2-gpu-port.patch, hadoop-2.7.2-gpu.patch, 
> hadoop-2.7.2.port-gpu.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, which treats GPUs as a 
> countable resource. 
> However, GPU placement is also very important to deep learning jobs for better 
> efficiency.
>  For example, a 2-GPU job running on GPUs {0, 1} could be faster than one 
> running on GPUs {0, 7}, if GPUs 0 and 1 are under the same PCI-E switch while 
> 0 and 7 are not.
>  We add support to Hadoop 2.7.2 to enable GPU locality scheduling, which 
> supports fine-grained GPU placement. 
> A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage 
> and locality information in a node (up to 64 GPUs per node): a '1' in a bit 
> position means the corresponding GPU is available, '0' otherwise.
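
A minimal illustration of how such a 64-bit availability bitmap could be interpreted (the helper below and the grouping of 8 GPUs per PCI-E switch are illustrative assumptions, not the attached patch):
{code:java}
public final class GpuBitmapExample {
  /** True if bit 'gpuIndex' (0-63) is set, i.e. that GPU is available. */
  static boolean isAvailable(long gpuBitmap, int gpuIndex) {
    return ((gpuBitmap >>> gpuIndex) & 1L) == 1L;
  }

  /** Counts available GPUs in an assumed PCI-E switch group
   *  covering indices [groupStart, groupStart + groupSize). */
  static int availableInGroup(long gpuBitmap, int groupStart, int groupSize) {
    int count = 0;
    for (int i = groupStart; i < groupStart + groupSize; i++) {
      if (isAvailable(gpuBitmap, i)) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    long bitmap = 0b11L;                                 // GPUs 0 and 1 available
    System.out.println(isAvailable(bitmap, 0));          // true
    System.out.println(availableInGroup(bitmap, 0, 8));  // 2
  }
}
{code}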






[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling

2018-04-10 Thread Chen Qingcha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qingcha updated YARN-7481:
---
Attachment: (was: hadoop-2.7.2.port-gpu.patch)

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Fix For: 2.7.2
>
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2-gpu-port.patch, hadoop-2.7.2-gpu.patch, 
> hadoop-2.7.2.port-gpu.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, which treats GPUs as a 
> countable resource. 
> However, GPU placement is also very important to deep learning jobs for better 
> efficiency.
>  For example, a 2-GPU job running on GPUs {0, 1} could be faster than one 
> running on GPUs {0, 7}, if GPUs 0 and 1 are under the same PCI-E switch while 
> 0 and 7 are not.
>  We add support to Hadoop 2.7.2 to enable GPU locality scheduling, which 
> supports fine-grained GPU placement. 
> A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage 
> and locality information in a node (up to 64 GPUs per node): a '1' in a bit 
> position means the corresponding GPU is available, '0' otherwise.






[jira] [Updated] (YARN-7930) Add configuration to initialize RM with configured labels.

2018-04-10 Thread Abhishek Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-7930:

Attachment: YARN-7930.005.patch

> Add configuration to initialize RM with configured labels.
> --
>
> Key: YARN-7930
> URL: https://issues.apache.org/jira/browse/YARN-7930
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-7930.001.patch, YARN-7930.002.patch, 
> YARN-7930.003.patch, YARN-7930.004.patch, YARN-7930.005.patch
>
>
> At present, the only way to create labels is via the admin API. Sometimes 
> there is a requirement to start the cluster with pre-configured node labels. 
> This Jira introduces YARN configuration to start the RM with predefined node 
> labels.
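
A minimal sketch of consuming such a configuration (the property name yarn.node-labels.configured-node-labels and this helper are assumptions for illustration; the actual key and wiring are defined by the patch):
{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.NodeLabel;

public class ConfiguredLabelsExample {
  // Hypothetical configuration key holding a comma-separated label list.
  static final String CONFIGURED_LABELS_KEY =
      "yarn.node-labels.configured-node-labels";

  /** Parses pre-configured labels that the RM could register at startup. */
  static List<NodeLabel> parseConfiguredLabels(Configuration conf) {
    List<NodeLabel> labels = new ArrayList<>();
    for (String name : conf.getTrimmedStrings(CONFIGURED_LABELS_KEY)) {
      labels.add(NodeLabel.newInstance(name));
    }
    return labels;
  }
}
{code}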






[jira] [Updated] (YARN-8134) Support specifying node resources in SLS

2018-04-10 Thread Abhishek Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-8134:

Attachment: YARN-8134.002.patch

> Support specifying node resources in SLS
> 
>
> Key: YARN-8134
> URL: https://issues.apache.org/jira/browse/YARN-8134
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-8134.002.patch, YARN-8134.patch
>
>
> At present, all nodes have the same resources in SLS. We need to add the 
> capability to assign different resources to different nodes in SLS.






[jira] [Commented] (YARN-6629) NPE occurred when container allocation proposal is applied but its resource requests are removed before

2018-04-10 Thread can_he (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432034#comment-16432034
 ] 

can_he commented on YARN-6629:
--

[~Tao Yang] Thank you so much for your rapid reply!

One more question: does the new patch need to pass the tests before it is applied 
to the branch?

> NPE occurred when container allocation proposal is applied but its resource 
> requests are removed before
> ---
>
> Key: YARN-6629
> URL: https://issues.apache.org/jira/browse/YARN-6629
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Fix For: 3.1.0
>
> Attachments: YARN-6629.001.patch, YARN-6629.002.patch, 
> YARN-6629.003.patch, YARN-6629.004.patch, YARN-6629.005.patch, 
> YARN-6629.006.patch, YARN-6629.branch-2.001.patch
>
>
> I wrote a test case to reproduce another problem for branch-2 and found a new 
> NPE error. Log: 
> {code}
> FATAL event.EventDispatcher (EventDispatcher.java:run(75)) - Error in 
> handling event type NODE_UPDATE to the Event Dispatcher
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:446)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:516)
> at 
> org.apache.hadoop.yarn.client.TestNegativePendingResource$1.answer(TestNegativePendingResource.java:225)
> at 
> org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:31)
> at org.mockito.internal.MockHandler.handle(MockHandler.java:97)
> at 
> org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp$$EnhancerByMockitoWithCGLIB$$29eb8afc.apply()
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2396)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.submitResourceCommitRequest(CapacityScheduler.java:2281)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1247)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1236)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1325)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:987)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1367)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:143)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Reproduce this error in chronological order:
> 1. AM started and requested 1 container with schedulerRequestKey#1 : 
> ApplicationMasterService#allocate -->  CapacityScheduler#allocate --> 
> SchedulerApplicationAttempt#updateResourceRequests --> 
> AppSchedulingInfo#updateResourceRequests 
> Added schedulerRequestKey#1 into schedulerKeyToPlacementSets
> 2. Scheduler allocated 1 container for this request and accepted the proposal
> 3. AM removed this request
> ApplicationMasterService#allocate -->  CapacityScheduler#allocate --> 
> SchedulerApplicationAttempt#updateResourceRequests --> 
> AppSchedulingInfo#updateResourceRequests --> 
> AppSchedulingInfo#addToPlacementSets --> 
> AppSchedulingInfo#updatePendingResources
> Removed schedulerRequestKey#1 from schedulerKeyToPlacementSets
> 4. Scheduler applied this proposal
> CapacityScheduler#tryCommit --> FiCaSchedulerApp#apply --> 
> AppSchedulingInfo#allocate 
> An NPE is thrown when calling 
> schedulerKeyToPlacementSets.get(schedulerRequestKey).allocate(schedulerKey, 
> type, node);
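
A minimal, generic sketch of the kind of guard that avoids this NPE (illustrative only; the actual fix in the attached patches may differ):
{code:java}
import java.util.Map;

public final class AllocateGuardSketch {
  /** Callback standing in for the placement set's allocate call. */
  interface AllocateAction<P> {
    void apply(P placementSet);
  }

  /** If the request's placement set was already removed (e.g. the AM withdrew
   *  the request before the proposal was applied), skip the allocation instead
   *  of dereferencing null. */
  static <K, P> boolean allocateIfPresent(Map<K, P> schedulerKeyToPlacementSets,
      K schedulerRequestKey, AllocateAction<P> action) {
    P placementSet = schedulerKeyToPlacementSets.get(schedulerRequestKey);
    if (placementSet == null) {
      return false; // request withdrawn concurrently; nothing to allocate against
    }
    action.apply(placementSet);
    return true;
  }
}
{code}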




[jira] [Commented] (YARN-7930) Add configuration to initialize RM with configured labels.

2018-04-10 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432045#comment-16432045
 ] 

genericqa commented on YARN-7930:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 26s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 14s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
45s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
13s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 88m 59s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-7930 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918343/YARN-7930.005.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 9e8338317668 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e87be8a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test 

[jira] [Created] (YARN-8139) Skip node hostname resolution when running SLS.

2018-04-10 Thread Abhishek Modi (JIRA)
Abhishek Modi created YARN-8139:
---

 Summary: Skip node hostname resolution when running SLS.
 Key: YARN-8139
 URL: https://issues.apache.org/jira/browse/YARN-8139
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Abhishek Modi
Assignee: Abhishek Modi


Currently, depending on the time taken to resolve hostnames, SLS metrics get 
skewed. To avoid this, this fix introduces a flag which can be used to disable 
hostname resolution.
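
A minimal sketch of the idea, with a hypothetical flag name (yarn.sls.skip-hostname-resolution is an assumption, not necessarily the property this fix introduces):
{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;

import org.apache.hadoop.conf.Configuration;

public class SlsHostnameResolutionExample {
  // Hypothetical switch to bypass DNS lookups during simulation.
  static final String SKIP_RESOLUTION_KEY = "yarn.sls.skip-hostname-resolution";

  /** Returns the node name to register, resolving it only when allowed. */
  static String nodeName(Configuration conf, String host)
      throws UnknownHostException {
    if (conf.getBoolean(SKIP_RESOLUTION_KEY, false)) {
      return host; // use the raw host string; no DNS round trip to skew metrics
    }
    return InetAddress.getByName(host).getCanonicalHostName();
  }
}
{code}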






[jira] [Commented] (YARN-7804) Refresh action on Grid view page should not be redirected to graph view

2018-04-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432057#comment-16432057
 ] 

Hudson commented on YARN-7804:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13952 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13952/])
YARN-7804. [UI2] Refresh action on Grid view page should not be (sunilg: rev 
7c1e77dda4cb3ba8952328d142aafcf0366b5903)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/components/timeline-view.js
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/controllers/yarn-app-attempt.js
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/timeline-view.hbs
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/controllers/yarn-app/attempts.js
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-app-attempt.hbs
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-app/attempts.hbs


> Refresh action on Grid view page should not be redirected to graph view
> ---
>
> Key: YARN-7804
> URL: https://issues.apache.org/jira/browse/YARN-7804
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.0.0
>Reporter: Yesha Vora
>Assignee: Gergely Novák
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-7804.001.patch
>
>
> Steps:
> 1) Go to application attempt page
> http://host:8088/ui2/#/yarn-app/application_1516734339938_0020/attempts?service=abc
> 2) click on grid view
> 3) click refresh button of the page
> Actual behavior:
> on refreshing the page, it comes back to graph view.
> Expected behavior:
> on refreshing the page, it should stay on grid view






[jira] [Commented] (YARN-7825) Maintain constant horizontal application info bar for all pages

2018-04-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-7825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432068#comment-16432068
 ] 

Gergely Novák commented on YARN-7825:
-

[~yeshavora] Could you please attach a screenshot?

> Maintain constant horizontal application info bar for all pages
> ---
>
> Key: YARN-7825
> URL: https://issues.apache.org/jira/browse/YARN-7825
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Yesha Vora
>Priority: Major
>
> Steps:
> 1) Enable ATSv2
> 2) Start a YARN service application (Httpd)
> 3) Fix the horizontal info bar for the pages below:
>  * component page
>  * Component Instance info page 
>  * Application attempt Info 






[jira] [Updated] (YARN-7830) If attempt has selected grid view, attempt info page should be opened with grid view

2018-04-10 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/YARN-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Novák updated YARN-7830:

Attachment: YARN-7830.001.patch

>  If attempt has selected grid view, attempt info page should be opened with 
> grid view
> -
>
> Key: YARN-7830
> URL: https://issues.apache.org/jira/browse/YARN-7830
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.0.0
>Reporter: Yesha Vora
>Assignee: Gergely Novák
>Priority: Major
> Attachments: YARN-7830.001.patch
>
>
> Steps:
> 1) Start Application and visit attempt page
> 2) click on Grid view
>  3) Click on attempt 1
>  
> Current behavior:
> Clicking the attempt redirects to the attempt info page, which opens in graph 
> view.
>  
> Expected behavior:
> In this scenario, it should redirect to grid view.






[jira] [Commented] (YARN-8134) Support specifying node resources in SLS

2018-04-10 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432086#comment-16432086
 ] 

genericqa commented on YARN-8134:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 51s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 37s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  9m 55s{color} 
| {color:red} hadoop-sls in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
18s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 56m 50s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.sls.TestSLSStreamAMSynth |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-8134 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918353/YARN-8134.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux e05796de391a 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7623cc5 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/20287/artifact/out/patch-unit-hadoop-tools_hadoop-sls.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20287/testReport/ |
| asflicense | 
https://builds.apache.org/job/PreCommit-YARN-Build/20287/artifact/out/patch-asflicense-problems.txt
 |
| Max. process+thread count | 467 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20287/console |
| P

[jira] [Commented] (YARN-7830) If attempt has selected grid view, attempt info page should be opened with grid view

2018-04-10 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432143#comment-16432143
 ] 

genericqa commented on YARN-7830:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
36m 16s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 43s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 49m 24s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-7830 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918362/YARN-7830.001.patch |
| Optional Tests |  asflicense  shadedclient  |
| uname | Linux c9fd8e30ec1b 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7c1e77d |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 303 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20288/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



>  If attempt has selected grid view, attempt info page should be opened with 
> grid view
> -
>
> Key: YARN-7830
> URL: https://issues.apache.org/jira/browse/YARN-7830
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.0.0
>Reporter: Yesha Vora
>Assignee: Gergely Novák
>Priority: Major
> Attachments: YARN-7830.001.patch
>
>
> Steps:
> 1) Start Application and visit attempt page
> 2) click on Grid view
>  3) Click on attempt 1
>  
> Current behavior:
> Clicking the attempt redirects to the attempt info page, which opens in graph 
> view.
>  
> Expected behavior:
> In this scenario, it should redirect to grid view.






[jira] [Commented] (YARN-8127) Resource leak when async scheduling is enabled

2018-04-10 Thread Tao Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432214#comment-16432214
 ] 

Tao Yang commented on YARN-8127:


Thanks [~cheersyang]. Some additional details: this problem most often happens 
in async-scheduling mode, where multiple async scheduling threads may allocate 
for the same node that has a reserved container and generate duplicated 
allocate-from-reserved proposals for that reserved container. These duplicated 
proposals can all be successfully applied in the commit phase, so the resources 
of the node/queue are decreased several times for the same container. 

We can add a check of the reserved state for allocate-from-reserved proposals in 
the commit phase to solve this problem.

Attached a patch with a UT for review. 
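
A minimal sketch of such a reserved-state check (the class and field names are illustrative, not the attached patch):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class ReservedProposalCheckSketch {
  // Hypothetical view of the container currently reserved on each node.
  private final Map<String, String> reservedContainerByNode =
      new ConcurrentHashMap<>();

  /** Records a reservation (called when the scheduler reserves a container). */
  synchronized void reserve(String nodeId, String containerId) {
    reservedContainerByNode.put(nodeId, containerId);
  }

  /** Accepts an allocate-from-reserved proposal only if the container is still
   *  reserved on the node. A duplicated proposal for the same reserved
   *  container finds the reservation already cleared and is rejected, so
   *  node/queue resources are not deducted twice. */
  synchronized boolean tryCommitFromReserved(String nodeId, String containerId) {
    String reserved = reservedContainerByNode.get(nodeId);
    if (!containerId.equals(reserved)) {
      return false; // stale or duplicated proposal
    }
    reservedContainerByNode.remove(nodeId);
    return true;
  }
}
{code}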

> Resource leak when async scheduling is enabled
> --
>
> Key: YARN-8127
> URL: https://issues.apache.org/jira/browse/YARN-8127
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Tao Yang
>Priority: Critical
>
> Brief steps to reproduce:
>  # Enable async scheduling, 5 threads
>  # Submit a lot of jobs trying to exhaust the cluster resources
>  # After a while, observe that the NM allocated resource is more than the 
> resource requested by allocated containers
> It looks like the commit phase does not handle reserved containers in a 
> synchronized way, causing some proposals to be incorrectly accepted; 
> subsequently, resources were deducted multiple times for a container.






[jira] [Commented] (YARN-8104) Add API to fetch node to attribute mapping

2018-04-10 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432328#comment-16432328
 ] 

genericqa commented on YARN-8104:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-3409 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
25s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
23s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m  
3s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
15s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  5m  
0s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 15s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
20s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
YARN-3409 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
50s{color} | {color:green} YARN-3409 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 29m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 29m 
34s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 25s{color} | {color:orange} root: The patch generated 6 new + 261 unchanged 
- 0 fixed = 267 total (was 261) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 28s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
34s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
32s{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api 
generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
56s{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common 
generated 1 new + 4189 unchanged - 0 fixed = 4190 total (was 4189) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
50s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
14s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
14s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 66m 
57s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 27m  
3s{color} | {color:green} hado

[jira] [Updated] (YARN-7984) Delete registry entries from ZK on ServiceClient stop and clean up stop/destroy behavior

2018-04-10 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-7984:
-
Description: 
The service records written to the registry are removed by ServiceClient on a 
destroy call, but not on a stop call. The service AM does have some code to 
clean up the registry entries when component instances are stopped, but if the 
AM is killed before it has a chance to perform the cleanup, these entries will 
be left in ZooKeeper. It would be better to clean these up in the stop call, so 
that RegistryDNS does not provide lookups for containers that don't exist.

Additional stop/destroy behavior improvements include:
* destroying a saved (not launched or started) service
* destroying a stopped service
* destroying a destroyed service
* 

  was:The service records written to the registry are removed by ServiceClient 
on a destroy call, but not on a stop call. The service AM does have some code 
to clean up the registry entries when component instances are stopped, but if 
the AM is killed before it has a chance to perform the cleanup, these entries 
will be left in ZooKeeper. It would be better to clean these up in the stop 
call, so that RegistryDNS does not provide lookups for containers that don't 
exist.


> Delete registry entries from ZK on ServiceClient stop and clean up 
> stop/destroy behavior
> 
>
> Key: YARN-7984
> URL: https://issues.apache.org/jira/browse/YARN-7984
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-7984.1.patch, YARN-7984.2.patch
>
>
> The service records written to the registry are removed by ServiceClient on a 
> destroy call, but not on a stop call. The service AM does have some code to 
> clean up the registry entries when component instances are stopped, but if 
> the AM is killed before it has a chance to perform the cleanup, these entries 
> will be left in ZooKeeper. It would be better to clean these up in the stop 
> call, so that RegistryDNS does not provide lookups for containers that don't 
> exist.
> Additional stop/destroy behavior improvements include:
> * destroying a saved (not launched or started) service
> * destroying a stopped service
> * destroying a destroyed service
> * 






[jira] [Updated] (YARN-7984) Delete registry entries from ZK on ServiceClient stop and clean up stop/destroy behavior

2018-04-10 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-7984:
-
Description: 
The service records written to the registry are removed by ServiceClient on a 
destroy call, but not on a stop call. The service AM does have some code to 
clean up the registry entries when component instances are stopped, but if the 
AM is killed before it has a chance to perform the cleanup, these entries will 
be left in ZooKeeper. It would be better to clean these up in the stop call, so 
that RegistryDNS does not provide lookups for containers that don't exist.

Additional stop/destroy behavior improvements include fixing errors / 
unexpected behavior related to:
* destroying a saved (not launched or started) service
* destroying a stopped service
* destroying a destroyed service
* returning proper exit codes for destroy failures
* performing other client operations on saved services (fixing NPEs)

  was:
The service records written to the registry are removed by ServiceClient on a 
destroy call, but not on a stop call. The service AM does have some code to 
clean up the registry entries when component instances are stopped, but if the 
AM is killed before it has a chance to perform the cleanup, these entries will 
be left in ZooKeeper. It would be better to clean these up in the stop call, so 
that RegistryDNS does not provide lookups for containers that don't exist.

Additional stop/destroy behavior improvements include:
* destroying a saved (not launched or started) service
* destroying a stopped service
* destroying a destroyed service
* 


> Delete registry entries from ZK on ServiceClient stop and clean up 
> stop/destroy behavior
> 
>
> Key: YARN-7984
> URL: https://issues.apache.org/jira/browse/YARN-7984
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-7984.1.patch, YARN-7984.2.patch
>
>
> The service records written to the registry are removed by ServiceClient on a 
> destroy call, but not on a stop call. The service AM does have some code to 
> clean up the registry entries when component instances are stopped, but if 
> the AM is killed before it has a chance to perform the cleanup, these entries 
> will be left in ZooKeeper. It would be better to clean these up in the stop 
> call, so that RegistryDNS does not provide lookups for containers that don't 
> exist.
> Additional stop/destroy behavior improvements include fixing errors / 
> unexpected behavior related to:
> * destroying a saved (not launched or started) service
> * destroying a stopped service
> * destroying a destroyed service
> * returning proper exit codes for destroy failures
> * performing other client operations on saved services (fixing NPEs)
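
A minimal sketch of the stop-time cleanup idea described above (the registry client interface here is a hypothetical stand-in, not the actual YARN registry API used by ServiceClient):
{code:java}
import java.io.IOException;

public class StopCleanupSketch {
  /** Hypothetical minimal view of a registry client. */
  interface RegistryClient {
    void deleteRecursive(String path) throws IOException;
  }

  private final RegistryClient registry;
  private final String serviceRegistryPath;

  StopCleanupSketch(RegistryClient registry, String serviceRegistryPath) {
    this.registry = registry;
    this.serviceRegistryPath = serviceRegistryPath;
  }

  /** On stop (not only destroy), remove the service's registry entries so that
   *  RegistryDNS no longer resolves containers that don't exist, even when the
   *  AM was killed before it could clean up per-instance records. */
  void stop() throws IOException {
    registry.deleteRecursive(serviceRegistryPath);
  }
}
{code}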






[jira] [Updated] (YARN-8125) YARNUIV2 does not failover from standby to active endpoint like previous YARN UI

2018-04-10 Thread Phil Zampino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Zampino updated YARN-8125:
---
Description: 
If the YARN UI is accessed via the standby resource manager endpoint, it 
automatically redirects the requests to the active resource manager endpoint. 
YARNUIV2 should behave the same way.

Apache Knox 1.0.0 introduced the ability to dynamically determine proxied 
RESOURCEMANAGER and YARNUI service endpoints based on YARN configuration from 
Ambari. This functionality works for RM and YARNUI because even though the YARN 
config may reference the standby RM endpoint, requests are automatically 
redirected to the active endpoint.

RM and YARNUI requests to the standby endpoint result in an HTTP 307 response. 
YARNUIV2 should respond similarly.

If YARNUIV2 behaves differently, then Knox will not be able to support its own 
dynamic configuration behavior when proxying YARNUIV2.

KNOX-1212 adds the integration with Knox, but KNOX-1236 is blocked by this 
issue.


  was:
If the YARN UI is accessed via the standby resource manager endpoint, it 
automatically redirects the requests to the active resource manager endpoint. 
YARNUIV2 should behave the same way.

Apache Knox 1.0.0 introduced the ability to dynamically determine proxied 
RESOURCEMANAGER and YARNUI service endpoints based on YARN configuration from 
Ambari. This functionality works for RM and YARNUI because even though the YARN 
config may reference the standby RM endpoint, requests are automatically 
redirected to the active endpoint.

If YARNUIV2 behaves differently, then Knox will not be able to support its own 
dynamic configuration behavior when proxying YARNUIV2.

KNOX-1212 adds the integration with Knox, but KNOX-1236 is blocked by this 
issue.



> YARNUIV2 does not failover from standby to active endpoint like previous YARN 
> UI
> 
>
> Key: YARN-8125
> URL: https://issues.apache.org/jira/browse/YARN-8125
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Phil Zampino
>Priority: Major
>
> If the YARN UI is accessed via the standby resource manager endpoint, it 
> automatically redirects the requests to the active resource manager endpoint. 
> YARNUIV2 should behave the same way.
> Apache Knox 1.0.0 introduced the ability to dynamically determine proxied 
> RESOURCEMANAGER and YARNUI service endpoints based on YARN configuration from 
> Ambari. This functionality works for RM and YARNUI because even though the 
> YARN config may reference the standby RM endpoint, requests are automatically 
> redirected to the active endpoint.
> RM and YARNUI requests to the standby endpoint result in an HTTP 307 response. 
> YARNUIV2 should respond similarly.
> If YARNUIV2 behaves differently, then Knox will not be able to support its 
> own dynamic configuration behavior when proxying YARNUIV2.
> KNOX-1212 adds the integration with Knox, but KNOX-1236 is blocked by this 
> issue.
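
As an illustration of the expected contract only (not the actual RM or YARNUIV2 code), a standby endpoint could answer UI requests with an HTTP 307 pointing at the active RM. The filter below is a generic sketch; how the active address and HA state are obtained is an assumption:
{code:java}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/** Hypothetical filter sketch: redirect UI requests that hit the standby RM. */
public class StandbyRedirectFilter implements Filter {
  private final String activeRmWebAddress;  // e.g. "http://active-rm:8088", assumed known
  private final boolean isStandby;          // assumed to come from the RM HA state

  public StandbyRedirectFilter(String activeRmWebAddress, boolean isStandby) {
    this.activeRmWebAddress = activeRmWebAddress;
    this.isStandby = isStandby;
  }

  @Override
  public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequest request = (HttpServletRequest) req;
    HttpServletResponse response = (HttpServletResponse) resp;
    if (isStandby) {
      // 307 preserves the request method and body, matching what RM/YARNUI already return.
      response.setStatus(HttpServletResponse.SC_TEMPORARY_REDIRECT);
      response.setHeader("Location", activeRmWebAddress + request.getRequestURI());
      return;
    }
    chain.doFilter(req, resp);
  }

  @Override
  public void init(FilterConfig filterConfig) {}

  @Override
  public void destroy() {}
}
{code}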



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8134) Support specifying node resources in SLS

2018-04-10 Thread Abhishek Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-8134:

Attachment: YARN-8134.003.patch

> Support specifying node resources in SLS
> 
>
> Key: YARN-8134
> URL: https://issues.apache.org/jira/browse/YARN-8134
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-8134.002.patch, YARN-8134.003.patch, YARN-8134.patch
>
>
> At present, all nodes have the same resources in SLS. We need to add the capability 
> to assign different resources to different nodes in SLS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8125) YARNUIV2 does not failover from standby to active endpoint like previous YARN UI

2018-04-10 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-8125:
-

Assignee: Sunil G

> YARNUIV2 does not failover from standby to active endpoint like previous YARN 
> UI
> 
>
> Key: YARN-8125
> URL: https://issues.apache.org/jira/browse/YARN-8125
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Phil Zampino
>Assignee: Sunil G
>Priority: Major
>
> If the YARN UI is accessed via the standby resource manager endpoint, it 
> automatically redirects the requests to the active resource manager endpoint. 
> YARNUIV2 should behave the same way.
> Apache Knox 1.0.0 introduced the ability to dynamically determine proxied 
> RESOURCEMANAGER and YARNUI service endpoints based on YARN configuration from 
> Ambari. This functionality works for RM and YARNUI because even though the 
> YARN config may reference the standby RM endpoint, requests are automatically 
> redirected to the active endpoint.
> RM and YARNUI requests to the standby endpoint result in an HTTP 307 response. 
> YARNUIV2 should respond similarly.
> If YARNUIV2 behaves differently, then Knox will not be able to support its 
> own dynamic configuration behavior when proxying YARNUIV2.
> KNOX-1212 adds the integration with Knox, but KNOX-1236 is blocked by this 
> issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7221) Add security check for privileged docker container

2018-04-10 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7221:

Attachment: YARN-7221.021.patch

> Add security check for privileged docker container
> --
>
> Key: YARN-7221
> URL: https://issues.apache.org/jira/browse/YARN-7221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-7221.001.patch, YARN-7221.002.patch, 
> YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, 
> YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, 
> YARN-7221.009.patch, YARN-7221.010.patch, YARN-7221.011.patch, 
> YARN-7221.012.patch, YARN-7221.013.patch, YARN-7221.014.patch, 
> YARN-7221.015.patch, YARN-7221.016.patch, YARN-7221.017.patch, 
> YARN-7221.018.patch, YARN-7221.019.patch, YARN-7221.020.patch, 
> YARN-7221.021.patch
>
>
> When a Docker container runs with privileges, the majority use case is to have 
> some program start as root and then drop privileges to another user, e.g. 
> httpd starts privileged to bind to port 80, then drops privileges to the 
> www user.  
> # We should add a security check for submitting users, to verify they have 
> "sudo" access to run privileged containers.  
> # We should remove --user=uid:gid for privileged containers.  
>  
> Docker can be launched with the --privileged=true and --user=uid:gid flags.  With 
> this parameter combination, the user will not have access to become the root user.  
> All docker exec commands will drop to the uid:gid user instead of 
> being granted privileges.  A user can gain root privileges if the container file system 
> contains files that give the user extra power, but this type of image is 
> considered dangerous.  A non-privileged user can launch a container with 
> special bits to acquire the same level of root power.  Hence, we lose control of 
> which images should be run with --privileged, and who has sudo rights to use 
> privileged container images.  As a result, we should check for sudo access 
> and then decide to parameterize --privileged=true OR --user=uid:gid.  This will 
> avoid leading developers down the wrong path.
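
A rough sketch of the decision described above (illustrative only, not the container-executor logic): check whether the submitting user is allowed to run privileged containers, then pass either --privileged=true or --user=uid:gid, never both. The allow-list check stands in for the real sudo check:
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative only: choose docker run flags based on a sudo-style allow list. */
public class DockerRunArgsSketch {
  public static List<String> buildRunArgs(String user, int uid, int gid,
      boolean privilegedRequested, Set<String> privilegedUsers) {
    List<String> args = new ArrayList<>();
    args.add("docker");
    args.add("run");
    if (privilegedRequested) {
      if (!privilegedUsers.contains(user)) {
        // Reject instead of silently downgrading, so the submitter gets a clear error.
        throw new IllegalArgumentException(
            "User " + user + " is not allowed to run privileged containers");
      }
      args.add("--privileged=true");          // no --user: the image is trusted to drop privileges itself
    } else {
      args.add("--user=" + uid + ":" + gid);  // unprivileged: always pin the container user
    }
    return args;
  }
}
{code}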



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7221) Add security check for privileged docker container

2018-04-10 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432393#comment-16432393
 ] 

Eric Yang commented on YARN-7221:
-

Patch 21 removes the checkstyle error and also removes group-add for privileged 
containers.

> Add security check for privileged docker container
> --
>
> Key: YARN-7221
> URL: https://issues.apache.org/jira/browse/YARN-7221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-7221.001.patch, YARN-7221.002.patch, 
> YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, 
> YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, 
> YARN-7221.009.patch, YARN-7221.010.patch, YARN-7221.011.patch, 
> YARN-7221.012.patch, YARN-7221.013.patch, YARN-7221.014.patch, 
> YARN-7221.015.patch, YARN-7221.016.patch, YARN-7221.017.patch, 
> YARN-7221.018.patch, YARN-7221.019.patch, YARN-7221.020.patch, 
> YARN-7221.021.patch
>
>
> When a Docker container runs with privileges, the majority use case is to have 
> some program start as root and then drop privileges to another user, e.g. 
> httpd starts privileged to bind to port 80, then drops privileges to the 
> www user.  
> # We should add a security check for submitting users, to verify they have 
> "sudo" access to run privileged containers.  
> # We should remove --user=uid:gid for privileged containers.  
>  
> Docker can be launched with the --privileged=true and --user=uid:gid flags.  With 
> this parameter combination, the user will not have access to become the root user.  
> All docker exec commands will drop to the uid:gid user instead of 
> being granted privileges.  A user can gain root privileges if the container file system 
> contains files that give the user extra power, but this type of image is 
> considered dangerous.  A non-privileged user can launch a container with 
> special bits to acquire the same level of root power.  Hence, we lose control of 
> which images should be run with --privileged, and who has sudo rights to use 
> privileged container images.  As a result, we should check for sudo access 
> and then decide to parameterize --privileged=true OR --user=uid:gid.  This will 
> avoid leading developers down the wrong path.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8127) Resource leak when async scheduling is enabled

2018-04-10 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-8127:
---
Attachment: YARN-8127.001.patch

> Resource leak when async scheduling is enabled
> --
>
> Key: YARN-8127
> URL: https://issues.apache.org/jira/browse/YARN-8127
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8127.001.patch
>
>
> Brief steps to reproduce:
>  # Enable async scheduling with 5 threads
>  # Submit a lot of jobs trying to exhaust the cluster resources
>  # After a while, observe that the NM allocated resource is more than the resource 
> requested by the allocated containers
> It looks like the commit phase does not handle reserved containers synchronously, causing 
> some proposals to be incorrectly accepted; subsequently resources were deducted 
> multiple times for a container.
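
A deliberately simplified sketch of the idea (class names are illustrative, not the CapacityScheduler code): re-validate a proposal against the node's current reservation state inside the commit lock, so a stale proposal produced by one async thread cannot be applied twice:
{code:java}
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: accept an allocation proposal at most once per reservation. */
public class CommitPhaseSketch {
  // nodeId -> containerId currently reserved on that node (simplified model)
  private final Map<String, String> reservedContainerByNode = new HashMap<>();

  public synchronized void reserve(String nodeId, String containerId) {
    reservedContainerByNode.put(nodeId, containerId);
  }

  /** Called concurrently by async scheduling threads; the commit itself is serialized. */
  public synchronized boolean tryCommit(String nodeId, String proposedContainerId) {
    String reserved = reservedContainerByNode.get(nodeId);
    if (reserved == null || !reserved.equals(proposedContainerId)) {
      // The reservation was already fulfilled or cancelled by another thread:
      // reject the stale proposal instead of deducting resources again.
      return false;
    }
    reservedContainerByNode.remove(nodeId);   // fulfil the reservation exactly once
    return true;
  }
}
{code}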



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8057) Inadequate information for handling catch clauses

2018-04-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432420#comment-16432420
 ] 

ASF GitHub Bot commented on YARN-8057:
--

GitHub user lzh3636 opened a pull request:

https://github.com/apache/hadoop/pull/362

YARN-8057 Inadequate information for handling catch clauses

The description of the problem:
https://issues.apache.org/jira/browse/YARN-8057
I added stack trace information to those two logging statements, so that 
the full exception information is written to the logs.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lzh3636/hadoop YARN-8057

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/362.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #362


commit 002935625784226ba915e8631a0c8298d423677f
Author: lzh3636 
Date:   2018-04-10T15:03:54Z

update stack traces to show exception info




> Inadequate information for handling catch clauses
> -
>
> Key: YARN-8057
> URL: https://issues.apache.org/jira/browse/YARN-8057
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api, yarn
>Affects Versions: 3.0.0
>Reporter: Zhenhao Li
>Priority: Major
>  Labels: easyfix
>
> There are some situations where different exception types are caught, but the 
> handling of those exceptions cannot show the differences between those types. 
> Here are the code snippets we found which have this problem:
> *org/apache/hadoop/yarn/client/api/impl/NMClientImpl.java*
> [https://github.com/apache/hadoop/blob/c02d2ba50db8a355ea03081c3984b2ea0c375a3f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/NMClientImpl.java]
> At Line *125* and Line *129*, two exception types are caught, 
> but the logging statements cannot show the exception type at all. This 
> may confuse the person reading the log, who cannot 
> tell which exception happened here.
>  
> Maybe adding stack trace information to these two logging statements is a 
> simple way to improve it.
>  
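
The proposed fix amounts to passing the caught exception to the logger so the exception type and stack trace reach the log. A minimal before/after sketch, assuming an SLF4J-style logger (not the exact NMClientImpl lines):
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CatchLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(CatchLoggingSketch.class);

  public void stopContainerQuietly(Runnable stopContainer) {
    try {
      stopContainer.run();
    } catch (RuntimeException e) {
      // Before: only the message, so the exception types are indistinguishable in the log.
      // LOG.error("Failed to stop container: " + e.getMessage());

      // After: pass the throwable so the exception class and full stack trace are logged.
      LOG.error("Failed to stop container", e);
    }
  }
}
{code}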



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7221) Add security check for privileged docker container

2018-04-10 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432433#comment-16432433
 ] 

genericqa commented on YARN-7221:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  8s{color} 
| {color:red} YARN-7221 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-7221 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918397/YARN-7221.021.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20289/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Add security check for privileged docker container
> --
>
> Key: YARN-7221
> URL: https://issues.apache.org/jira/browse/YARN-7221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-7221.001.patch, YARN-7221.002.patch, 
> YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, 
> YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, 
> YARN-7221.009.patch, YARN-7221.010.patch, YARN-7221.011.patch, 
> YARN-7221.012.patch, YARN-7221.013.patch, YARN-7221.014.patch, 
> YARN-7221.015.patch, YARN-7221.016.patch, YARN-7221.017.patch, 
> YARN-7221.018.patch, YARN-7221.019.patch, YARN-7221.020.patch, 
> YARN-7221.021.patch
>
>
> When a Docker container runs with privileges, the majority use case is to have 
> some program start as root and then drop privileges to another user, e.g. 
> httpd starts privileged to bind to port 80, then drops privileges to the 
> www user.  
> # We should add a security check for submitting users, to verify they have 
> "sudo" access to run privileged containers.  
> # We should remove --user=uid:gid for privileged containers.  
>  
> Docker can be launched with the --privileged=true and --user=uid:gid flags.  With 
> this parameter combination, the user will not have access to become the root user.  
> All docker exec commands will drop to the uid:gid user instead of 
> being granted privileges.  A user can gain root privileges if the container file system 
> contains files that give the user extra power, but this type of image is 
> considered dangerous.  A non-privileged user can launch a container with 
> special bits to acquire the same level of root power.  Hence, we lose control of 
> which images should be run with --privileged, and who has sudo rights to use 
> privileged container images.  As a result, we should check for sudo access 
> and then decide to parameterize --privileged=true OR --user=uid:gid.  This will 
> avoid leading developers down the wrong path.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-04-10 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432474#comment-16432474
 ] 

Manikandan R commented on YARN-4606:


[~leftnoteasy] [~sunilg] Can you please share your views?

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belonging to the same user in a LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers the user 
> to be an active user. This could lead to starvation of active applications, for 
> example:
> - App1 (belongs to user1)/app2 (belongs to user2) are active, app3 (belongs to 
> user3)/app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new 
> resources, so the computed user-limit-resource could be lower than expected.
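
To make the arithmetic concrete (numbers purely illustrative): if the queue can hand out 100 units, counting the pending-only users makes each user's limit 100/4 = 25, while only user1 and user2 can actually allocate, so each should really be allowed 100/2 = 50:
{code:java}
public class UserLimitSketch {
  /** Simplified user-limit: queue resource divided by the number of active users. */
  static int userLimit(int queueResource, int activeUsers) {
    return queueResource / activeUsers;
  }

  public static void main(String[] args) {
    int queueResource = 100;
    // Counting users whose apps are all pending (user3, user4) as active:
    System.out.println(userLimit(queueResource, 4));   // 25 -> user1/user2 are under-served
    // Counting only users with active (running) applications:
    System.out.println(userLimit(queueResource, 2));   // 50 -> matches who can actually allocate
  }
}
{code}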



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7159) Normalize unit of resource objects in RM and avoid to do unit conversion in critical path

2018-04-10 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432477#comment-16432477
 ] 

Manikandan R commented on YARN-7159:


[~sunilg] Can you please take this forward? Thanks.

> Normalize unit of resource objects in RM and avoid to do unit conversion in 
> critical path
> -
>
> Key: YARN-7159
> URL: https://issues.apache.org/jira/browse/YARN-7159
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-7159.001.patch, YARN-7159.002.patch, 
> YARN-7159.003.patch, YARN-7159.004.patch, YARN-7159.005.patch, 
> YARN-7159.006.patch, YARN-7159.007.patch, YARN-7159.008.patch, 
> YARN-7159.009.patch, YARN-7159.010.patch, YARN-7159.011.patch, 
> YARN-7159.012.patch, YARN-7159.013.patch, YARN-7159.015.patch, 
> YARN-7159.016.patch, YARN-7159.017.patch, YARN-7159.018.patch, 
> YARN-7159.019.patch, YARN-7159.020.patch, YARN-7159.021.patch, 
> YARN-7159.022.patch, YARN-7159.023.patch
>
>
> Currently, resource unit conversion can happen in the critical code path when a 
> different unit is specified by the client. This could significantly impact the performance and 
> throughput of the RM. We should normalize units when resources are passed 
> to the RM and avoid expensive unit conversion every time.
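
A minimal sketch of the idea with a hypothetical helper (not Hadoop's actual ResourceInformation/UnitsConversionUtil code): convert the client-supplied unit to a canonical unit once at the RM boundary, so the scheduler's hot path only compares plain numbers:
{code:java}
public class UnitNormalizationSketch {
  /** Normalize a memory value to mebibytes once, at the RM boundary (illustrative units only). */
  static long toMi(long value, String unit) {
    switch (unit) {
      case "Mi": return value;
      case "Gi": return value * 1024L;
      case "Ti": return value * 1024L * 1024L;
      default: throw new IllegalArgumentException("Unsupported unit: " + unit);
    }
  }

  public static void main(String[] args) {
    // Done once per request when it enters the RM...
    long normalized = toMi(2, "Gi");                  // 2048
    // ...so the scheduler's critical path is a plain comparison, no per-heartbeat conversion.
    long nodeAvailable = 4096;
    System.out.println(normalized <= nodeAvailable);  // true
  }
}
{code}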



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432492#comment-16432492
 ] 

Wangda Tan commented on YARN-8135:
--

[~oliverhuh...@gmail.com], 

There are no technical issues preventing a TF application from accessing HDFS. But it is 
a real overhead to use HDFS if the user has no prior Hadoop experience 
[https://www.tensorflow.org/deploy/hadoop]. We just want to make this step 
easier. 

 

[~asuresh], 

Thanks for your interest in this project. I'm not sure Hadoop-Submarine or 
YARN-Submarine, let's decide once I finish the design. 

> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: image-2018-04-09-14-35-16-778.png, 
> image-2018-04-09-14-44-41-101.png
>
>
> Description:
> *Goals:*
>  - Allow infra engineers / data scientists to run *unmodified* Tensorflow jobs 
> on YARN.
>  - Allow jobs easy access to data/models in HDFS and other storage.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support running distributed Tensorflow jobs with simple configs.
>  - Support running user-specified Docker images.
>  - Support specifying GPU and other resources.
>  - Support launching tensorboard if the user requests it.
>  - Support customized DNS names for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because a submarine is the only vehicle that can let humans explore deep 
> places. B-)
> Compared to other projects:
> !image-2018-04-09-14-44-41-101.png!
> *Notes:*
> *GPU isolation in the XLearning project is achieved by a patched YARN, which is 
> different from the community’s GPU isolation solution.
> **XLearning needs a few modifications to read ClusterSpec from env.
> *References:*
>  - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
>  - TensorFlowOnYARN (Intel): 
> [https://github.com/Intel-bigdata/TensorFlowOnYARN]
>  - Spark Deep Learning (Databricks): 
> [https://github.com/databricks/spark-deep-learning]
>  - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
>  - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-10 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432502#comment-16432502
 ] 

Wei Yan commented on YARN-8135:
---

{quote}Think this should be renamed to YARN-Submarine though.

I'm not sure Hadoop-Submarine or YARN-Submarine, let's decide once I finish the 
design. 
{quote}
Hadoop-Submarine may be better here, as the project may not only involve 
YARN. Also, Hadoop-Submarine may be more attractive than YARN-Submarine.

> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: image-2018-04-09-14-35-16-778.png, 
> image-2018-04-09-14-44-41-101.png
>
>
> Description:
> *Goals:*
>  - Allow infra engineers / data scientists to run *unmodified* Tensorflow jobs 
> on YARN.
>  - Allow jobs easy access to data/models in HDFS and other storage.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support running distributed Tensorflow jobs with simple configs.
>  - Support running user-specified Docker images.
>  - Support specifying GPU and other resources.
>  - Support launching tensorboard if the user requests it.
>  - Support customized DNS names for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because a submarine is the only vehicle that can let humans explore deep 
> places. B-)
> Compared to other projects:
> !image-2018-04-09-14-44-41-101.png!
> *Notes:*
> *GPU isolation in the XLearning project is achieved by a patched YARN, which is 
> different from the community’s GPU isolation solution.
> **XLearning needs a few modifications to read ClusterSpec from env.
> *References:*
>  - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
>  - TensorFlowOnYARN (Intel): 
> [https://github.com/Intel-bigdata/TensorFlowOnYARN]
>  - Spark Deep Learning (Databricks): 
> [https://github.com/databricks/spark-deep-learning]
>  - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
>  - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8134) Support specifying node resources in SLS

2018-04-10 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432542#comment-16432542
 ] 

genericqa commented on YARN-8134:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 54s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 39s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
32s{color} | {color:green} hadoop-sls in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
48s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 67m 37s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-8134 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918396/YARN-8134.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  xml  findbugs  checkstyle  |
| uname | Linux 16f633d2cbd4 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cef8eb7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20291/testReport/ |
| Max. process+thread count | 456 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20291/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Support specifying node resource

[jira] [Commented] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens

2018-04-10 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432589#comment-16432589
 ] 

Shane Kumpf commented on YARN-2674:
---

[~chenchun] - Thanks for the patch here. We are seeing this when testing the 
Docker runtime and it results in extra Docker containers being launched on RM 
restart, which is problematic. I've validated that the logic in this patch 
resolves that issue. Any chance you'd be able to update the patch? If you don't 
have the time, I could put up a patch based on your previous patch.

> Distributed shell AM may re-launch containers if RM work preserving restart 
> happens
> ---
>
> Key: YARN-2674
> URL: https://issues.apache.org/jira/browse/YARN-2674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, resourcemanager
>Reporter: Chun Chen
>Assignee: Chun Chen
>Priority: Major
>  Labels: oct16-easy
> Attachments: YARN-2674.1.patch, YARN-2674.2.patch, YARN-2674.3.patch, 
> YARN-2674.4.patch, YARN-2674.5.patch
>
>
> Currently, if an RM work-preserving restart happens while distributed shell is 
> running, the distributed shell AM may re-launch all the containers, including 
> new/running/completed ones. We must make sure it won't re-launch the 
> running/completed containers.
> We need to remove allocated containers from 
> AMRMClientImpl#remoteRequestsTable once the AM receives them from the RM. 
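
A simplified sketch of the defensive behavior on the AM side (illustrative only, not the distributed shell code): remember which containers have already been launched and ignore allocations that reappear after an RM work-preserving restart:
{code:java}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: de-duplicate container allocations seen by an AM callback. */
public class AllocationDedupSketch {
  private final Set<String> launchedContainerIds =
      Collections.synchronizedSet(new HashSet<>());

  /** Called from the AM's onContainersAllocated-style callback with container IDs. */
  public void onContainersAllocated(Iterable<String> allocatedContainerIds) {
    for (String containerId : allocatedContainerIds) {
      // add() returns false if we already launched this container, e.g. when the RM
      // re-sends allocations after a work-preserving restart.
      if (launchedContainerIds.add(containerId)) {
        launchContainer(containerId);
      }
    }
  }

  private void launchContainer(String containerId) {
    System.out.println("launching " + containerId);
  }
}
{code}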



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7598) Document how to use classpath isolation for aux-services in YARN

2018-04-10 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-7598:

Attachment: YARN-7598.4.patch

> Document how to use classpath isolation for aux-services in YARN
> 
>
> Key: YARN-7598
> URL: https://issues.apache.org/jira/browse/YARN-7598
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-7598.2.patch, YARN-7598.3.patch, YARN-7598.4.patch, 
> YARN-7598.trunk.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7598) Document how to use classpath isolation for aux-services in YARN

2018-04-10 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432601#comment-16432601
 ] 

Xuan Gong commented on YARN-7598:
-

Thanks for the review. [~djp]

Uploaded a new patch to address all your comments.

> Document how to use classpath isolation for aux-services in YARN
> 
>
> Key: YARN-7598
> URL: https://issues.apache.org/jira/browse/YARN-7598
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-7598.2.patch, YARN-7598.3.patch, YARN-7598.4.patch, 
> YARN-7598.trunk.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8127) Resource leak when async scheduling is enabled

2018-04-10 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432629#comment-16432629
 ] 

genericqa commented on YARN-8127:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 21s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 24s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 23 unchanged - 0 fixed = 24 total (was 23) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 26s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 89m 26s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}138m 55s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption
 |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAutoCreatedQueuePreemption
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-8127 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918404/YARN-8127.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 281e86cf911b 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cef8eb7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/20290/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.

[jira] [Commented] (YARN-7931) [atsv2 read acls] Include domain table creation as part of schema creator

2018-04-10 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432688#comment-16432688
 ] 

Vrushali C commented on YARN-7931:
--

Hi [~haibochen]

That's a good question. Let me check what it does; I will see if I can add a 
unit test to determine the expected behavior. I will update the JIRA / patch 
shortly.

thanks
Vrushali


> [atsv2 read acls] Include domain table creation as part of schema creator
> -
>
> Key: YARN-7931
> URL: https://issues.apache.org/jira/browse/YARN-7931
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Vrushali C
>Priority: Major
> Attachments: YARN-7391.0001.patch, YARN-7391.0002.patch, 
> YARN-7391.0003.patch
>
>
>  
> Update the schema creator to create a domain table to store timeline entity 
> domain info. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8073) TimelineClientImpl doesn't honor yarn.timeline-service.versions configuration

2018-04-10 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432710#comment-16432710
 ] 

Vrushali C commented on YARN-8073:
--

Hi [~rohithsharma]

Yes, please go ahead with the cherry-pick; the branch-2 Jenkins issue is unrelated, 
and as long as our local machine compilation & unit tests work, I think we can 
commit it.

thanks
Vrushali

> TimelineClientImpl doesn't honor yarn.timeline-service.versions configuration
> -
>
> Key: YARN-8073
> URL: https://issues.apache.org/jira/browse/YARN-8073
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8073-branch-2.03.patch, YARN-8073.01.patch, 
> YARN-8073.02.patch, YARN-8073.03.patch
>
>
> Post YARN-6736, the RM supports writing into ATS v1 and v2 via the new configuration 
> setting _yarn.timeline-service.versions_. 
>  A couple of issues observed in deployment:
>  # TimelineClientImpl doesn't honor the newly added configuration; rather, it still 
> gets the version number from _yarn.timeline-service.version_. This results in not 
> writing into the v1.5 APIs even though _yarn.timeline-service.versions has a 1.5 
> value._ 
>  # Along the same lines as the 1st point, TimelineUtils#timelineServiceV1_5Enabled 
> doesn't honor timeline-service.versions.
>  # JobHistoryEventHandler#serviceInit(), line no 271, checks the version number 
> rather than calling YarnConfiguration#timelineServiceV2Enabled
> cc :/ [~agresch]
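
A rough sketch of the intended lookup order (illustrative, not the actual TimelineUtils code): prefer the list in yarn.timeline-service.versions when it is set, and fall back to the single-valued yarn.timeline-service.version only otherwise:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class TimelineVersionsSketch {
  /** Illustrative check for whether ATS v1.5 publishing should be enabled. */
  static boolean timelineV15Enabled(Configuration conf) {
    String versions = conf.get("yarn.timeline-service.versions");
    if (versions != null && !versions.trim().isEmpty()) {
      // The versions list, when present, is authoritative.
      for (String v : versions.split(",")) {
        if (v.trim().startsWith("1.5")) {
          return true;
        }
      }
      return false;
    }
    // Fall back to the older single-version setting only when the list is not configured.
    return conf.getFloat("yarn.timeline-service.version", 1.0f) == 1.5f;
  }
}
{code}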



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7825) Maintain constant horizontal application info bar for all pages

2018-04-10 Thread Yesha Vora (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-7825:
-
Attachment: Screen Shot 2018-04-10 at 11.06.40 AM.png
Screen Shot 2018-04-10 at 11.07.29 AM.png
Screen Shot 2018-04-10 at 11.07.07 AM.png
Screen Shot 2018-04-10 at 11.06.27 AM.png
Screen Shot 2018-04-10 at 11.15.27 AM.png

> Maintain constant horizontal application info bar for all pages
> ---
>
> Key: YARN-7825
> URL: https://issues.apache.org/jira/browse/YARN-7825
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-04-10 at 11.06.27 AM.png, Screen Shot 
> 2018-04-10 at 11.06.40 AM.png, Screen Shot 2018-04-10 at 11.07.07 AM.png, 
> Screen Shot 2018-04-10 at 11.07.29 AM.png, Screen Shot 2018-04-10 at 11.15.27 
> AM.png
>
>
> Steps:
> 1) Enable ATS v2
> 2) Start a YARN service application (Httpd)
> 3) Fix the horizontal info bar for the pages below:
>  * Component page
>  * Component Instance info page 
>  * Application Attempt info page 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7598) Document how to use classpath isolation for aux-services in YARN

2018-04-10 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432732#comment-16432732
 ] 

genericqa commented on YARN-7598:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
36m  0s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch 19 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 37s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 49m 22s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-7598 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918413/YARN-7598.4.patch |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 80cfdff66c5a 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cef8eb7 |
| maven | version: Apache Maven 3.3.9 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/20292/artifact/out/whitespace-tabs.txt
 |
| Max. process+thread count | 341 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20292/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Document how to use classpath isolation for aux-services in YARN
> 
>
> Key: YARN-7598
> URL: https://issues.apache.org/jira/browse/YARN-7598
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-7598.2.patch, YARN-7598.3.patch, YARN-7598.4.patch, 
> YARN-7598.trunk.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8130) Race condition when container events are published for KILLED applications

2018-04-10 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432723#comment-16432723
 ] 

Vrushali C commented on YARN-8130:
--

Yes, I agree, we need a configurable delay like the collectorLingerPeriod in 
PerNodeTimelineCollectorsAuxService#removeApplicationCollector. 

We need to check if there are other places where we are removing the app id from 
some map. 

Relevant JIRAs for collectorLingerPeriod: YARN-3995 and YARN-7835
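
A minimal sketch of the linger idea (names and structure are assumptions, not the NMTimelinePublisher code): instead of removing the per-application timeline client as soon as the application finishes, schedule the removal after a configurable linger period so late container events can still be published:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Illustrative only: linger before dropping per-application publishing state. */
public class DelayedClientRemovalSketch {
  private final Map<String, Object> timelineClients = new ConcurrentHashMap<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final long lingerMillis;   // would come from a config knob, like the collector linger period

  public DelayedClientRemovalSketch(long lingerMillis) {
    this.lingerMillis = lingerMillis;
  }

  public void onAppFinished(String appId) {
    // Events already queued in the dispatcher can still find the client during the linger window.
    scheduler.schedule(() -> timelineClients.remove(appId), lingerMillis, TimeUnit.MILLISECONDS);
  }
}
{code}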

> Race condition when container events are published for KILLED applications
> --
>
> Key: YARN-8130
> URL: https://issues.apache.org/jira/browse/YARN-8130
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Reporter: Charan Hebri
>Priority: Major
>
> There seems to be a race condition happening when an application is KILLED 
> and the corresponding container event information is being published. For 
> completed containers, a YARN_CONTAINER_FINISHED event is generated but for 
> some containers in a KILLED application this information is missing. Below is 
> a node manager log snippet,
> {code:java}
> 2018-04-09 08:44:54,474 INFO  shuffle.ExternalShuffleBlockResolver 
> (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application 
> application_1523259757659_0003 removed, cleanupLocalDirs = false
> 2018-04-09 08:44:54,478 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(632)) - Application 
> application_1523259757659_0003 transitioned from 
> APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher 
> (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been 
> removed before the entity could be published for 
> TimelineEntity[type='YARN_CONTAINER', 
> id='container_1523259757659_0003_01_02']
> 2018-04-09 08:44:54,478 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just 
> finished : application_1523259757659_0003
> 2018-04-09 08:44:54,488 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_01. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:54,492 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_02. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:55,470 INFO  collector.TimelineCollectorManager 
> (TimelineCollectorManager.java:remove(192)) - The collector service for 
> application_1523259757659_0003 was removed
> 2018-04-09 08:44:55,472 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:handle(1572)) - couldn't find application 
> application_1523259757659_0003 while processing FINISH_APPS event. The 
> ResourceManager allocated resources for this application to the NodeManager 
> but no active containers were found to process{code}
> The container id specified in the log, 
> *container_1523259757659_0003_01_02* is the one that has the finished 
> event missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7974) Allow updating application tracking url after registration

2018-04-10 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432900#comment-16432900
 ] 

Jonathan Hung commented on YARN-7974:
-

Hi [~wangda] - this is the tracking url change I mentioned during last week's 
meeting. Would appreciate if you could take a look if you have the chance :)

> Allow updating application tracking url after registration
> --
>
> Key: YARN-7974
> URL: https://issues.apache.org/jira/browse/YARN-7974
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-7974.001.patch, YARN-7974.002.patch
>
>
> Normally an application's tracking url is set on AM registration. We have a 
> use case for updating the tracking url after registration (e.g. the UI is 
> hosted on one of the containers).
> We have added an {{updateTrackingUrl}} API to ApplicationClientProtocol.
> We'll post the patch soon, assuming there are no issues with this.
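
Conceptually, the new call would behave like the sketch below; the class and map are illustrative stand-ins, not the actual ApplicationClientProtocol change:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative model of the new call, not the real protocol signature. */
public class TrackingUrlUpdateSketch {
  private final Map<String, String> trackingUrls = new ConcurrentHashMap<>();

  /** What an updateTrackingUrl-style RPC would do on the RM side, conceptually. */
  public void updateTrackingUrl(String applicationId, String newTrackingUrl) {
    // A real implementation would also check ACLs and persist the URL
    // in the RM state store so it survives an RM restart.
    trackingUrls.put(applicationId, newTrackingUrl);
  }

  public String getTrackingUrl(String applicationId) {
    return trackingUrls.get(applicationId);
  }
}
{code}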



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7974) Allow updating application tracking url after registration

2018-04-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432922#comment-16432922
 ] 

Wangda Tan commented on YARN-7974:
--

[~jhung],

Thanks for working on the feature, I can see its value. 

For implementation / API:

1) Have you considered only allowing the AM to update the tracking URL? That could 
solve some problems, like: a. the need to properly check ACLs to make the change; b. 
concurrent writes of the tracking URL causing issues.

2) I think the updated tracking URL needs to be persisted as well, otherwise an RM 
restart will clear the updated information.

> Allow updating application tracking url after registration
> --
>
> Key: YARN-7974
> URL: https://issues.apache.org/jira/browse/YARN-7974
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-7974.001.patch, YARN-7974.002.patch
>
>
> Normally an application's tracking url is set on AM registration. We have a 
> use case for updating the tracking url after registration (e.g. the UI is 
> hosted on one of the containers).
> We have added an {{updateTrackingUrl}} API to ApplicationClientProtocol.
> We'll post the patch soon, assuming there are no issues with this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services

2018-04-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432934#comment-16432934
 ] 

Wangda Tan commented on YARN-7530:
--

[~eyang], thanks for sharing your thoughts.

To me, the current scope of native services already goes beyond a single / 
self-contained app on YARN:

1) The YARN Service API is part of the RM. 

2) After YARN-8048, system services can be deployed before running any other 
applications.

I think we should move the API / client code to proper places to avoid loading native 
service client / API logic by using reflection.

This doesn't block anything for now, but I think it will be important to clean 
it up to get more contributions from the community.

> hadoop-yarn-services-api should be part of hadoop-yarn-services
> ---
>
> Key: YARN-7530
> URL: https://issues.apache.org/jira/browse/YARN-7530
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Chandni Singh
>Priority: Trivial
> Fix For: yarn-native-services
>
> Attachments: YARN-7530.001.patch
>
>
> Hadoop-yarn-services-api is currently a project parallel to the 
> hadoop-yarn-services project.  It would be better if hadoop-yarn-services-api 
> were part of hadoop-yarn-services for correctness.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7221) Add security check for privileged docker container

2018-04-10 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7221:

Attachment: YARN-7221.022.patch

> Add security check for privileged docker container
> --
>
> Key: YARN-7221
> URL: https://issues.apache.org/jira/browse/YARN-7221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-7221.001.patch, YARN-7221.002.patch, 
> YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, 
> YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, 
> YARN-7221.009.patch, YARN-7221.010.patch, YARN-7221.011.patch, 
> YARN-7221.012.patch, YARN-7221.013.patch, YARN-7221.014.patch, 
> YARN-7221.015.patch, YARN-7221.016.patch, YARN-7221.017.patch, 
> YARN-7221.018.patch, YARN-7221.019.patch, YARN-7221.020.patch, 
> YARN-7221.021.patch, YARN-7221.022.patch
>
>
> When a Docker container runs with privileges, the majority use case is to have 
> some program start as root and then drop privileges to another user, e.g. 
> httpd starting privileged to bind to port 80, then dropping privileges to the 
> www user.  
> # We should add a security check for submitting users, to verify that they have 
> "sudo" access to run privileged containers.  
> # We should remove --user=uid:gid for privileged containers.  
>  
> Docker can be launched with the --privileged=true and --user=uid:gid flags.  With 
> this parameter combination, the user will not have access to become the root user.  
> All docker exec commands will be dropped to the uid:gid user instead of 
> being granted privileges.  A user can gain root privileges if the container file 
> system contains files that give the user extra power, but this type of image is 
> considered dangerous.  A non-privileged user can launch a container with 
> special bits to acquire the same level of root power.  Hence, we lose control of 
> which images should be run with --privileged, and who has sudo rights to use 
> privileged container images.  As a result, we should check for sudo access and 
> then decide to parameterize --privileged=true OR --user=uid:gid.  This will 
> avoid leading developers down the wrong path.
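A minimal, hedged sketch of the decision described above; the class, method, and configuration names are illustrative only (the real logic lives in the NM container runtime and container-executor):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch: pick exactly one of --privileged=true or --user=uid:gid, and only
// allow --privileged for users on an administrator-defined allow list.
public final class PrivilegedLaunchCheck {
  private final Set<String> privilegedUsers; // assumed NM allow-list config

  public PrivilegedLaunchCheck(Set<String> privilegedUsers) {
    this.privilegedUsers = privilegedUsers;
  }

  public List<String> dockerRunFlags(String submitter, boolean wantsPrivileged,
      String uid, String gid) {
    List<String> flags = new ArrayList<>();
    if (wantsPrivileged) {
      if (!privilegedUsers.contains(submitter)) {
        throw new IllegalArgumentException(submitter
            + " does not have sudo access to run privileged containers");
      }
      flags.add("--privileged=true");          // program drops privileges itself
    } else {
      flags.add("--user=" + uid + ":" + gid);  // run as the submitting user
    }
    return flags;
  }
}
{code}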



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7221) Add security check for privileged docker container

2018-04-10 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432936#comment-16432936
 ] 

Eric Yang commented on YARN-7221:
-

Patch 22 rebased to current trunk.

> Add security check for privileged docker container
> --
>
> Key: YARN-7221
> URL: https://issues.apache.org/jira/browse/YARN-7221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-7221.001.patch, YARN-7221.002.patch, 
> YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, 
> YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, 
> YARN-7221.009.patch, YARN-7221.010.patch, YARN-7221.011.patch, 
> YARN-7221.012.patch, YARN-7221.013.patch, YARN-7221.014.patch, 
> YARN-7221.015.patch, YARN-7221.016.patch, YARN-7221.017.patch, 
> YARN-7221.018.patch, YARN-7221.019.patch, YARN-7221.020.patch, 
> YARN-7221.021.patch, YARN-7221.022.patch
>
>
> When a Docker container runs with privileges, the majority use case is to have 
> some program start as root and then drop privileges to another user, e.g. 
> httpd starting privileged to bind to port 80, then dropping privileges to the 
> www user.  
> # We should add a security check for submitting users, to verify that they have 
> "sudo" access to run privileged containers.  
> # We should remove --user=uid:gid for privileged containers.  
>  
> Docker can be launched with the --privileged=true and --user=uid:gid flags.  With 
> this parameter combination, the user will not have access to become the root user.  
> All docker exec commands will be dropped to the uid:gid user instead of 
> being granted privileges.  A user can gain root privileges if the container file 
> system contains files that give the user extra power, but this type of image is 
> considered dangerous.  A non-privileged user can launch a container with 
> special bits to acquire the same level of root power.  Hence, we lose control of 
> which images should be run with --privileged, and who has sudo rights to use 
> privileged container images.  As a result, we should check for sudo access and 
> then decide to parameterize --privileged=true OR --user=uid:gid.  This will 
> avoid leading developers down the wrong path.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""

2018-04-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432938#comment-16432938
 ] 

Wangda Tan commented on YARN-8116:
--

+1, thanks [~csingh], will commit shortly.

> Nodemanager fails with NumberFormatException: For input string: ""
> --
>
> Key: YARN-8116
> URL: https://issues.apache.org/jira/browse/YARN-8116
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8116.001.patch, YARN-8116.002.patch
>
>
> Steps followed.
> 1) Update nodemanager debug delay config
> {code}
> <property>
>   <name>yarn.nodemanager.delete.debug-delay-sec</name>
>   <value>350</value>
> </property>
> {code}
> 2) Launch distributed shell application multiple times
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn  jar 
> hadoop-yarn-applications-distributedshell-*.jar  -shell_command "sleep 120" 
> -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar 
> hadoop-yarn-applications-distributedshell-*.jar{code}
> 3) restart NM
> Nodemanager fails to start with the below error.
> {code:title=NM log}
> 2018-03-23 21:32:14,437 INFO  monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:serviceInit(181)) - ContainersMonitor enabled: 
> true
> 2018-03-23 21:32:14,439 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:serviceInit(130)) - rollingMonitorInterval is set 
> as 3600. The logs will be aggregated every 3600 seconds
> 2018-03-23 21:32:14,455 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
>  failed in state INITED
> java.lang.NumberFormatException: For input string: ""
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Long.parseLong(Long.java:601)
>   at java.lang.Long.parseLong(Long.java:631)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:899)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:960)
> 2018-03-23 21:32:14,458 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:serviceStop(148)) - 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
>  waiting for pending aggregation during exit
> 2018-03-23 21:32:14,460 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state 
> INITED
> java.lang.NumberFormatException: For input string: ""
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Long.parseLong(Long.java:601)
>   at java.lang.Long.parseLong(Long.java:631)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464)
>   

[jira] [Commented] (YARN-7494) Add multi node lookup support for better placement

2018-04-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432970#comment-16432970
 ] 

Wangda Tan commented on YARN-7494:
--

Thanks [~sunilg], 

In general the change looks good. Could you check the UT failures?

[~cheersyang], please commit the patch once you think it is ready.

> Add multi node lookup support for better placement
> -
>
> Key: YARN-7494
> URL: https://issues.apache.org/jira/browse/YARN-7494
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-7494.001.patch, YARN-7494.002.patch, 
> YARN-7494.003.patch, YARN-7494.004.patch, YARN-7494.005.patch, 
> YARN-7494.006.patch, YARN-7494.v0.patch, YARN-7494.v1.patch, 
> multi-node-designProposal.png
>
>
> Instead of a single node, for effectiveness we can consider a multi-node lookup 
> based on partition to start with.
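As a rough sketch of the idea (the types and the ordering policy below are illustrative; the actual patch defines its own pluggable policy classes), a per-partition lookup could sort the candidate nodes once and try them in order instead of considering only a single node:

{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hedged sketch: Node is a stand-in for the scheduler's node type.
final class MultiNodeLookup {
  static Optional<Node> pickNode(List<Node> nodesInPartition, long askMemMb) {
    return nodesInPartition.stream()
        // most available memory first; a real policy could be pluggable
        .sorted(Comparator.comparingLong(Node::getAvailableMemMb).reversed())
        .filter(n -> n.getAvailableMemMb() >= askMemMb)
        .findFirst();
  }

  interface Node {
    long getAvailableMemMb();
  }
}
{code}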



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8061) An application may preempt itself in case of minshare preemption

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-8061:
--

Assignee: (was: Yufei Gu)

> An application may preempt itself in case of minshare preemption
> 
>
> Key: YARN-8061
> URL: https://issues.apache.org/jira/browse/YARN-8061
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Yufei Gu
>Priority: Major
>
> Assume a leaf queue A's minshare is 10G of memory and its fairshare is 12G. It used 
> 4G, so its minshare-starved resources are 6G and will be distributed to all its 
> apps. Assume there are 4 apps a1, a2, a3, a4 inside, which demand 3G, 2G, 1G, 
> and 0.5G. a1 gets 3G of minshare-starved resources, a2 gets 2G, a3 gets 1G; they 
> are all considered starved apps except a4, which doesn't get any. 
> An app can preempt another under the same queue due to minshare starvation. 
> For example, a1 can preempt a4 if a4 uses more resources than its fair share, 
> which is 3G (12G/4). If a1 itself used more than 3G of memory, it will preempt 
> itself! I will create a unit test later. 
> The solution would be to check the application's fair share while distributing minshare 
> starvation; more details in method 
> {{FSLeafQueue#updateStarvedAppsMinshare()}}.
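A minimal sketch of the proposed guard, with resources simplified to plain longs (the real code operates on {{Resource}} objects inside {{FSLeafQueue}}): an app that is already at or above its fair share receives no minshare starvation, so it can never be marked starved and preempt itself.

{code:java}
// Hedged sketch: cap each app's assigned minshare starvation by the room it
// still has below its fair share.
public final class MinsharePreemptionSketch {
  static long starvationToAssign(long pendingDemand, long usage, long fairShare,
      long remainingQueueStarvation) {
    long headroomToFairShare = Math.max(0, fairShare - usage);
    long want = Math.min(pendingDemand, headroomToFairShare);
    return Math.min(want, remainingQueueStarvation);
  }
}
{code}

With the numbers above, an a1 that already uses more than its 3G fair share would get 0 and drop out of the starved set.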



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6527) Provide a better out-of-the-box experience for SLS

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-6527:
--

Assignee: (was: Yufei Gu)

> Provide a better out-of-the-box experience for SLS
> --
>
> Key: YARN-6527
> URL: https://issues.apache.org/jira/browse/YARN-6527
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Affects Versions: 3.0.0-alpha4
>Reporter: Robert Kanter
>Priority: Major
>
> The example provided with SLS appears to be broken - I didn't see any jobs 
> running.  On top of that, it seems like getting SLS to run properly requires 
> a lot of hadoop site configs, scheduler configs, etc.  I was only able to get 
> something running after [~yufeigu] provided a lot of config files.
> We should provide a better out-of-the-box experience for SLS.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7263) Check host name resolution performance when resource manager starts up

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-7263:
--

Assignee: (was: Yufei Gu)

> Check host name resolution performance when resource manager starts up
> --
>
> Key: YARN-7263
> URL: https://issues.apache.org/jira/browse/YARN-7263
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: Yufei Gu
>Priority: Major
>
> According to YARN-7207, host name resolution could be slow in some 
> environments, which affects RM performance in different ways. It would be nice 
> to check that when the RM starts up and place a warning message into the logs if 
> the performance is not ideal. 
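A minimal sketch of such a startup check (the threshold, class name, and logging choice are illustrative assumptions):

{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hedged sketch: time a local forward+reverse lookup at RM start-up and warn if slow.
public final class HostNameResolutionCheck {
  private static final Logger LOG =
      LoggerFactory.getLogger(HostNameResolutionCheck.class);
  private static final long WARN_THRESHOLD_MS = 1000; // illustrative threshold

  public static void warnIfSlow() {
    long start = System.nanoTime();
    try {
      InetAddress.getLocalHost().getCanonicalHostName();
    } catch (UnknownHostException e) {
      LOG.warn("Host name resolution failed", e);
      return;
    }
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    if (elapsedMs > WARN_THRESHOLD_MS) {
      LOG.warn("Host name resolution took {} ms; this can degrade RM performance",
          elapsedMs);
    }
  }
}
{code}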



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-7968) Reset the queue name in submission context while recovering an application

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu resolved YARN-7968.

Resolution: Won't Fix

> Reset the queue name in submission context while recovering an application
> --
>
> Key: YARN-7968
> URL: https://issues.apache.org/jira/browse/YARN-7968
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>Priority: Major
>
> After YARN-7139, a new application gets the correct queue name in its 
> submission context. We need to do the same thing for application recovery. 
> {code}
>   if (isAppRecovering) {
> if (LOG.isDebugEnabled()) {
>   LOG.debug(applicationId
>   + " is recovering. Skip notifying APP_ACCEPTED");
> }
>   } else {
> // During tests we do not always have an application object, handle
> // it here but we probably should fix the tests
> if (rmApp != null && rmApp.getApplicationSubmissionContext() != null) 
> {
>   // Before we send out the event that the app is accepted is
>   // to set the queue in the submissionContext (needed on restore etc)
>   rmApp.getApplicationSubmissionContext().setQueue(queue.getName());
> }
> rmContext.getDispatcher().getEventHandler().handle(
> new RMAppEvent(applicationId, RMAppEventType.APP_ACCEPTED));
>   }
> {code}
> We can do it by moving the 
> {{rmApp.getApplicationSubmissionContext().setQueue}} block out of the if-else 
> block. cc [~wilfreds].
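For reference, the restructuring described above would look roughly like this (a sketch derived from the snippet in the description, not the committed patch):

{code:java}
// Hedged sketch: set the queue on the submission context unconditionally,
// then only skip the APP_ACCEPTED notification for recovering apps.
if (rmApp != null && rmApp.getApplicationSubmissionContext() != null) {
  rmApp.getApplicationSubmissionContext().setQueue(queue.getName());
}
if (isAppRecovering) {
  if (LOG.isDebugEnabled()) {
    LOG.debug(applicationId + " is recovering. Skip notifying APP_ACCEPTED");
  }
} else {
  rmContext.getDispatcher().getEventHandler().handle(
      new RMAppEvent(applicationId, RMAppEventType.APP_ACCEPTED));
}
{code}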



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7948) Enable refreshing maximum allocation for multiple resource types

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-7948:
--

Assignee: Szilard Nemeth  (was: Yufei Gu)

> Enable refreshing maximum allocation for multiple resource types
> 
>
> Key: YARN-7948
> URL: https://issues.apache.org/jira/browse/YARN-7948
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.0
>Reporter: Yufei Gu
>Assignee: Szilard Nemeth
>Priority: Major
>
> YARN-7738 did the same thing for CS. We need a fix for FS. We could fix it by 
> moving the refresh code from the CapacityScheduler class to AbstractYarnScheduler. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7347) Fix the bug in Fair scheduler to handle a queue named "root.root"

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-7347:
--

Assignee: Gergo Repas  (was: Yufei Gu)

> Fix the bug in Fair scheduler to handle a queue named "root.root"
> --
>
> Key: YARN-7347
> URL: https://issues.apache.org/jira/browse/YARN-7347
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, reservation system
>Reporter: Yufei Gu
>Assignee: Gergo Repas
>Priority: Major
>
> A queue named "root.root" may cause issues in the Fair Scheduler. For example, if 
> we set the queue (root.root) to be reservable and then submit a job into the 
> queue, we get the following error.
> {code}
> java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed 
> to submit application_1508176133973_0002 to YARN : root.root is not a leaf 
> queue
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:339)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:253)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
>   at 
> org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:307)
>   at 
> org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:360)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at 
> org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:368)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>   at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit 
> application_1508176133973_0002 to YARN : root.root is not a leaf queue
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:293)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:298)
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:324)
>   ... 25 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6324) The log4j.properties in sample-conf doesn't work well for SLS

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-6324:
--

Assignee: (was: Yufei Gu)

> The log4j.properties in sample-conf doesn't work well for SLS
> -
>
> Key: YARN-6324
> URL: https://issues.apache.org/jira/browse/YARN-6324
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Reporter: Yufei Gu
>Priority: Major
>
> Many log messages are missing; for example, there is no way to find the RM logs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5824) Verify app starvation under custom preemption thresholds and timeouts

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-5824:
--

Assignee: (was: Yufei Gu)

> Verify app starvation under custom preemption thresholds and timeouts
> -
>
> Key: YARN-5824
> URL: https://issues.apache.org/jira/browse/YARN-5824
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Priority: Major
>
> YARN-5783 adds basic tests to verify that applications are identified as 
> starved. This JIRA is to add more advanced tests for different values of 
> preemption thresholds and timeouts. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-3890) FairScheduler should show the scheduler health metrics similar to ones added in CapacityScheduler

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-3890:
--

Assignee: Gergo Repas  (was: Yufei Gu)

> FairScheduler should show the scheduler health metrics similar to ones added 
> in CapacityScheduler
> -
>
> Key: YARN-3890
> URL: https://issues.apache.org/jira/browse/YARN-3890
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Gergo Repas
>Priority: Major
>
> We should add the information displayed in YARN-3293 to FairScheduler as well, 
> possibly sharing the implementation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-3797) NodeManager not blacklisting the disk (shuffle) with errors

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-3797:
--

Assignee: (was: Yufei Gu)

> NodeManager not blacklisting the disk (shuffle) with errors
> ---
>
> Key: YARN-3797
> URL: https://issues.apache.org/jira/browse/YARN-3797
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Rajesh Balamohan
>Priority: Major
>
> In a multi-node environment, one of the disks (where map outputs are written) 
> in a node went bad. The errors are given below.
> {noformat}
> Info fld=0x9ad090a
> sd 6:0:5:0: [sdf]  Add. Sense: Unrecovered read error
> sd 6:0:5:0: [sdf] CDB: Read(10): 28 00 09 ad 09 08 00 00 08 00
> end_request: critical medium error, dev sdf, sector 162334984
> mpt2sas0: log_info(0x3108): originator(PL), code(0x08), sub_code(0x)
> sd 6:0:5:0: [sdf]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 6:0:5:0: [sdf]  Sense Key : Medium Error [current]
> Info fld=0x9af8892
> sd 6:0:5:0: [sdf]  Add. Sense: Unrecovered read error
> sd 6:0:5:0: [sdf] CDB: Read(10): 28 00 09 af 88 90 00 00 08 00
> end_request: critical medium error, dev sdf, sector 162498704
> mpt2sas0: log_info(0x3108): originator(PL), code(0x08), sub_code(0x)
> mpt2sas0: log_info(0x3108): originator(PL), code(0x08), sub_code(0x)
> sd 6:0:5:0: [sdf]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 6:0:5:0: [sdf]  Sense Key : Medium Error [current]
> Info fld=0x9af8892
> sd 6:0:5:0: [sdf]  Add. Sense: Unrecovered read error
> sd 6:0:5:0: [sdf] CDB: Read(10): 28 00 09 af 88 90 00 00 08 00
> end_request: critical medium error, dev sdf, sector 162498704
> {noformat}
> The disk checker would pass, as the system allows directories to be created and 
> deleted without issues.  But the data being served out can be corrupt, and 
> fetchers fail during CRC verification with unwanted delays and retries. 
> Ideally the node manager should detect such errors and blacklist/remove those 
> disks from the NM.
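One possible shape of such a check, as a hedged sketch only (the NM's real checks live in {{DirectoryCollection}} / {{LocalDirsHandlerService}}); note that a freshly written file may be served from the page cache, so a write-and-read-back probe catches only some classes of media errors:

{code:java}
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hedged sketch: a read-back probe that would catch some read errors the
// create/delete-only check misses.
public final class ReadBackDiskCheck {
  static boolean isReadable(Path dir) {
    byte[] probe = "yarn-disk-probe".getBytes(StandardCharsets.UTF_8);
    try {
      Path f = Files.createTempFile(dir, "diskcheck", ".tmp");
      try {
        Files.write(f, probe);
        return Arrays.equals(probe, Files.readAllBytes(f));
      } finally {
        Files.deleteIfExists(f);
      }
    } catch (Exception e) {
      return false; // treat any I/O failure as an unhealthy dir
    }
  }
}
{code}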



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6941) Allow Queue placement policies to be ordered by attribute

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-6941:
--

Assignee: (was: Yufei Gu)

> Allow Queue placement policies to be ordered by attribute
> -
>
> Key: YARN-6941
> URL: https://issues.apache.org/jira/browse/YARN-6941
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Yufei Gu
>Priority: Minor
>
> It would be nice to add a feature that would allow users to provide an 
> "order" or "index" in which the placement policies should apply, rather than just 
> the order in which the policies appear in the XML.
> For instance, the following two examples would be the same:
> Natural order:
> 
> 
> 
> 
> 
> Indexed Order:
> 
> 
> 
> 
> 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6971) Clean up different ways to create resources

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-6971:
--

Assignee: (was: Yufei Gu)

> Clean up different ways to create resources
> ---
>
> Key: YARN-6971
> URL: https://issues.apache.org/jira/browse/YARN-6971
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Yufei Gu
>Priority: Minor
>  Labels: newbie
>
> There are several ways to create a {{Resource}} object, e.g., 
> BuilderUtils.newResource() and Resources.createResource(). These methods not 
> only cause confusion but also performance issues; for example, 
> BuilderUtils.newResource() is significantly slower than 
> Resources.createResource(). 
> We could merge them somehow, and replace most BuilderUtils.newResource() calls 
> with Resources.createResource().
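For illustration, the two creation paths mentioned above (a hedged sketch; the exact overloads can differ between branches):

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.utils.BuilderUtils;
import org.apache.hadoop.yarn.util.resource.Resources;

// Hedged sketch: both calls build a Resource of 1024 MB and 2 vcores;
// the cleanup would standardize on the lighter-weight Resources helper.
public final class ResourceCreationExample {
  public static void main(String[] args) {
    Resource viaBuilderUtils = BuilderUtils.newResource(1024, 2);
    Resource viaResources = Resources.createResource(1024, 2);
    System.out.println(viaBuilderUtils + " vs " + viaResources);
  }
}
{code}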



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6925) FSSchedulerNode could be simplified extracting preemption fields into a class

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-6925:
--

Assignee: (was: Yufei Gu)

> FSSchedulerNode could be simplified extracting preemption fields into a class
> -
>
> Key: YARN-6925
> URL: https://issues.apache.org/jira/browse/YARN-6925
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Miklos Szegedi
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7984) Delete registry entries from ZK on ServiceClient stop and clean up stop/destroy behavior

2018-04-10 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433038#comment-16433038
 ] 

Eric Yang commented on YARN-7984:
-

[~billie.rinaldi] +1 to commit. Patch 2 looks good to me.  The stop and destroy 
commands work better with the new error handling.

> Delete registry entries from ZK on ServiceClient stop and clean up 
> stop/destroy behavior
> 
>
> Key: YARN-7984
> URL: https://issues.apache.org/jira/browse/YARN-7984
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-7984.1.patch, YARN-7984.2.patch
>
>
> The service records written to the registry are removed by ServiceClient on a 
> destroy call, but not on a stop call. The service AM does have some code to 
> clean up the registry entries when component instances are stopped, but if 
> the AM is killed before it has a chance to perform the cleanup, these entries 
> will be left in ZooKeeper. It would be better to clean these up in the stop 
> call, so that RegistryDNS does not provide lookups for containers that don't 
> exist.
> Additional stop/destroy behavior improvements include fixing errors / 
> unexpected behavior related to:
> * destroying a saved (not launched or started) service
> * destroying a stopped service
> * destroying a destroyed service
> * returning proper exit codes for destroy failures
> * performing other client operations on saved services (fixing NPEs)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8140) Improve log message when launch cmd is run for stopped yarn service

2018-04-10 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8140:


 Summary: Improve log message when launch cmd is run for stopped 
yarn service
 Key: YARN-8140
 URL: https://issues.apache.org/jira/browse/YARN-8140
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn-native-services
Affects Versions: 3.1.0
Reporter: Yesha Vora


Steps:

 1) Launch sleeper app

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
sleeper2-duplicate-app-stopped 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/10 21:31:01 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:01 INFO client.ApiServiceClient: Loading service definition from 
local FS: 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

18/04/10 21:31:03 INFO util.log: Logging initialized @2818ms

18/04/10 21:31:10 INFO client.ApiServiceClient: Application ID: 
application_1523387473707_0007

Exit Code: 0\{code}

2) Stop the application

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -stop 
sleeper2-duplicate-app-stopped

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/10 21:31:14 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:16 INFO util.log: Logging initialized @3034ms

18/04/10 21:31:17 INFO client.ApiServiceClient: Successfully stopped service 
sleeper2-duplicate-app-stopped

Exit Code: 0\{code}

3) Launch the application with same name

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
sleeper2-duplicate-app-stopped 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/10 21:31:19 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:19 INFO client.ApiServiceClient: Loading service definition from 
local FS: 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

18/04/10 21:31:22 INFO util.log: Logging initialized @4456ms

18/04/10 21:31:22 ERROR client.ApiServiceClient: Service Instance dir already 
exists: 
hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json

Exit Code: 56

{code}

 

Here, launch cmd fails with "Service Instance dir already exists: 
hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json".

 

The log message should be more meaningful. It should indicate that 
"sleeper2-duplicate-app-stopped is in the stopped state".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8140) Improve log message when launch cmd is run for stopped yarn service

2018-04-10 Thread Yesha Vora (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-8140:
-
Description: 
Steps:

 1) Launch sleeper app

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
sleeper2-duplicate-app-stopped 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/10 21:31:01 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:01 INFO client.ApiServiceClient: Loading service definition from 
local FS: 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

18/04/10 21:31:03 INFO util.log: Logging initialized @2818ms

18/04/10 21:31:10 INFO client.ApiServiceClient: Application ID: 
application_1523387473707_0007

Exit Code: 0{code}

2) Stop the application

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -stop 
sleeper2-duplicate-app-stopped

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/10 21:31:14 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:16 INFO util.log: Logging initialized @3034ms

18/04/10 21:31:17 INFO client.ApiServiceClient: Successfully stopped service 
sleeper2-duplicate-app-stopped

Exit Code: 0{code}

3) Launch the application with same name

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
sleeper2-duplicate-app-stopped 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/10 21:31:19 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:19 INFO client.ApiServiceClient: Loading service definition from 
local FS: 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

18/04/10 21:31:22 INFO util.log: Logging initialized @4456ms

18/04/10 21:31:22 ERROR client.ApiServiceClient: Service Instance dir already 
exists: 
hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json

Exit Code: 56
{code}

 

Here, launch cmd fails with "Service Instance dir already exists: 
hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json".

 

The log message should be more meaningful. It should indicate that 
"sleeper2-duplicate-app-stopped is in the stopped state".

  was:
Steps:

 1) Launch sleeper app

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
sleeper2-duplicate-app-stopped 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/10 21:31:01 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04

[jira] [Commented] (YARN-8037) CGroupsResourceCalculator logs excessive warnings on container relaunch

2018-04-10 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433049#comment-16433049
 ] 

Shane Kumpf commented on YARN-8037:
---

Thanks [~miklos.szeg...@cloudera.com] - For most applications I only see a 
single exception for each of the subsystems, like the output above, so I'm not 
sure that will address the bulk of these. I have a few ideas to test out, and I'll 
report back soon with more detail.

> CGroupsResourceCalculator logs excessive warnings on container relaunch
> ---
>
> Key: YARN-8037
> URL: https://issues.apache.org/jira/browse/YARN-8037
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Priority: Major
>
> When a container is relaunched, the old process no longer exists. When using 
> the {{CGroupsResourceCalculator}}, this results in the warning and exception 
> below being logged every second until the relaunch occurs, which is excessive 
> and fills up the logs.
> {code:java}
> 2018-03-16 14:30:33,438 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator:
>  Failed to parse 12844
> org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the 
> interim 12844
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.readTotalProcessJiffies(CGroupsResourceCalculator.java:252)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:181)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
> Caused by: java.io.FileNotFoundException: 
> /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_e01_1521209613260_0002_01_02/cpuacct.stat
>  (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.<init>(FileInputStream.java:138)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
> ... 4 more
> 2018-03-16 14:30:33,438 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator:
>  Failed to parse cgroups 
> /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_02/memory.memsw.usage_in_bytes
> org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the 
> interim 12844
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.getMemorySize(CGroupsResourceCalculator.java:238)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:187)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
> Caused by: java.io.FileNotFoundException: 
> /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_02/memory.usage_in_bytes
>  (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.<init>(FileInputStream.java:138)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
> ... 4 more{code}
> At a minimum, we should consider moving the exception logging to debug level to 
> reduce the noise. Alternatively, it may make sense to stop the existing 
> {{MonitoringThread}} during relaunch.
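A minimal sketch of the first option (the names are illustrative; the real logger lives in {{CGroupsResourceCalculator}}):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hedged sketch: log the recurring "process vanished" case at DEBUG and
// keep WARN for unexpected parse failures.
public final class CalculatorLogging {
  private static final Logger LOG =
      LoggerFactory.getLogger(CalculatorLogging.class);

  static void logParseFailure(String path, Exception e, boolean processGone) {
    if (processGone) {
      LOG.debug("Failed to parse {} (process has likely exited)", path, e);
    } else {
      LOG.warn("Failed to parse {}", path, e);
    }
  }
}
{code}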



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-04-10 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-8141:


 Summary: YARN Native Service: Respect 
YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
 Key: YARN-8141
 URL: https://issues.apache.org/jira/browse/YARN-8141
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Reporter: Wangda Tan


The existing YARN native service overwrites 
YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether the user 
specified it in the service spec or not. It is important to allow the user to mount 
local folders like /etc/passwd, etc.

The following logic overwrites the 
YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
{code:java}
StringBuilder sb = new StringBuilder();
for (Entry<String, String> mount : mountPaths.entrySet()) {
  if (sb.length() > 0) {
sb.append(",");
  }
  sb.append(mount.getKey());
  sb.append(":");
  sb.append(mount.getValue());
}
env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", 
sb.toString());{code}
Inside AbstractLauncher.java
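A minimal fix sketch for the snippet above, keeping any value the user already put in the service spec (illustrative only; the committed fix may differ):

{code:java}
// Hedged sketch: merge the generated localization mounts with any value the
// user already set in the service spec, instead of overwriting it.
String generated = sb.toString();
String existing = env.get("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS");
String merged = (existing == null || existing.isEmpty())
    ? generated
    : existing + "," + generated;
env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", merged);
{code}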



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7984) Delete registry entries from ZK on ServiceClient stop and clean up stop/destroy behavior

2018-04-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433062#comment-16433062
 ] 

Hudson commented on YARN-7984:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13959 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13959/])
YARN-7984. Improved YARN service stop/destroy and clean up.(eyang: 
rev d553799030a5a64df328319aceb35734d0b2de20)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services-api/src/test/java/org/apache/hadoop/yarn/service/ServiceClientTest.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/TestYarnNativeServices.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/client/ServiceClient.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services-api/src/main/java/org/apache/hadoop/yarn/service/webapp/ApiServer.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/ServiceTestUtils.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services-api/src/test/java/org/apache/hadoop/yarn/service/TestApiServer.java


> Delete registry entries from ZK on ServiceClient stop and clean up 
> stop/destroy behavior
> 
>
> Key: YARN-7984
> URL: https://issues.apache.org/jira/browse/YARN-7984
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-7984.1.patch, YARN-7984.2.patch
>
>
> The service records written to the registry are removed by ServiceClient on a 
> destroy call, but not on a stop call. The service AM does have some code to 
> clean up the registry entries when component instances are stopped, but if 
> the AM is killed before it has a chance to perform the cleanup, these entries 
> will be left in ZooKeeper. It would be better to clean these up in the stop 
> call, so that RegistryDNS does not provide lookups for containers that don't 
> exist.
> Additional stop/destroy behavior improvements include fixing errors / 
> unexpected behavior related to:
> * destroying a saved (not launched or started) service
> * destroying a stopped service
> * destroying a destroyed service
> * returning proper exit codes for destroy failures
> * performing other client operations on saved services (fixing NPEs)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7974) Allow updating application tracking url after registration

2018-04-10 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433067#comment-16433067
 ] 

Jonathan Hung commented on YARN-7974:
-

Thanks for the comments,

For 1, I think this is possible. Initially I didn't want to add more APIs to 
ApplicationMasterProtocol, but I think we can add a field in 
AllocateRequestProto and just update the url on the next call to allocate(), to 
avoid overcomplicating the protocol.

For 2, I will make this change.
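Roughly what option 1 could look like on the AM side; {{AllocateRequest}} is the existing record, while the tracking-url setter is the hypothetical new field being discussed (shown commented out):

{code:java}
import java.util.Collections;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest;

// Hedged sketch: piggy-back the new tracking URL on the next allocate call.
public final class TrackingUrlUpdateSketch {
  static AllocateRequest buildRequest(int responseId, String newTrackingUrl) {
    AllocateRequest request = AllocateRequest.newInstance(responseId, 0.5f,
        Collections.emptyList(), Collections.emptyList(), null);
    // request.setTrackingUrl(newTrackingUrl);  // hypothetical new API
    return request;
  }
}
{code}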

> Allow updating application tracking url after registration
> --
>
> Key: YARN-7974
> URL: https://issues.apache.org/jira/browse/YARN-7974
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-7974.001.patch, YARN-7974.002.patch
>
>
> Normally an application's tracking url is set on AM registration. We have a 
> use case for updating the tracking url after registration (e.g. the UI is 
> hosted on one of the containers).
> Currently we have added an {{updateTrackingUrl}} API to ApplicationClientProtocol.
> We'll post the patch soon, assuming there are no issues with this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services

2018-04-10 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433071#comment-16433071
 ] 

Eric Yang commented on YARN-7530:
-

[~leftnoteasy] YARN service has its dependencies set up backward because its precursor, 
Slider, was designed to run in YARN.  Instead of the server depending on the client, 
ServiceClient depends on hadoop-yarn-services-core and 
hadoop-yarn-server-common.  Therefore, it might be problematic to move 
hadoop-yarn-services-core to yarn common.  You are welcome to try, but it would 
be good to keep some of the YARN Service Application Master as a piece that is 
built after the yarn client + yarn servers, to avoid circular dependencies.

> hadoop-yarn-services-api should be part of hadoop-yarn-services
> ---
>
> Key: YARN-7530
> URL: https://issues.apache.org/jira/browse/YARN-7530
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Chandni Singh
>Priority: Trivial
> Fix For: yarn-native-services
>
> Attachments: YARN-7530.001.patch
>
>
> Hadoop-yarn-services-api is currently a parallel project to 
> hadoop-yarn-services project.  It would be better if hadoop-yarn-services-api 
> is part of hadoop-yarn-services for correctness.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7221) Add security check for privileged docker container

2018-04-10 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433080#comment-16433080
 ] 

genericqa commented on YARN-7221:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
1s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 31m 12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 39s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m  3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 56s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 53s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 87m 33s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-7221 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918457/YARN-7221.022.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux 1152cddcffed 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8ab776d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/20293/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20293/testReport/ |
| Max. process+thread count | 301 (vs. ulimit of 1) 

[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-04-10 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433081#comment-16433081
 ] 

Shane Kumpf commented on YARN-8141:
---

Thanks for reporting this, [~leftnoteasy] - 
{{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} is intended to be used for the purpose 
you call out. When that variable was added, 
{{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} was retained due to its 
existing use in native services. Is there a case where 
{{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} won't work for your need? Maybe it is 
time we do look to consolidate these two.

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Priority: Critical
>
> Existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user 
> specified this in service spec or not. It is important to allow user to mount 
> local folders like /etc/passwd, etc.
> Following logic overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
> sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", 
> sb.toString());{code}
> Inside AbstractLauncher.java
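
For illustration only, a minimal sketch of merging user-specified mounts with the 
framework-generated list instead of overwriting the variable; the helper name and 
the surrounding class are hypothetical, not the actual AbstractLauncher code:
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: append framework mounts to whatever the user already
// put in the env, preserving the "src:dst[,src:dst...]" format built above.
public class MountEnvMergeSketch {
  static final String MOUNTS_ENV =
      "YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS";

  static void mergeMounts(Map<String, String> env,
      Map<String, String> frameworkMounts) {
    StringBuilder sb = new StringBuilder();
    String userValue = env.get(MOUNTS_ENV);
    if (userValue != null && !userValue.isEmpty()) {
      sb.append(userValue);                       // keep user-specified mounts
    }
    for (Map.Entry<String, String> mount : frameworkMounts.entrySet()) {
      if (sb.length() > 0) {
        sb.append(",");
      }
      sb.append(mount.getKey()).append(":").append(mount.getValue());
    }
    env.put(MOUNTS_ENV, sb.toString());
  }

  public static void main(String[] args) {
    Map<String, String> env = new LinkedHashMap<>();
    env.put(MOUNTS_ENV, "/etc/passwd:/etc/passwd");  // user mount from the spec
    Map<String, String> frameworkMounts = new LinkedHashMap<>();
    frameworkMounts.put("/var/lib/yarn/filecache", "/service/lib"); // made-up paths
    mergeMounts(env, frameworkMounts);
    System.out.println(env.get(MOUNTS_ENV));
    // prints: /etc/passwd:/etc/passwd,/var/lib/yarn/filecache:/service/lib
  }
}
{code}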



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-04-10 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433081#comment-16433081
 ] 

Shane Kumpf edited comment on YARN-8141 at 4/10/18 10:19 PM:
-

Thanks for reporting this, [~leftnoteasy] - 
{{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} is intended to be used for the purpose 
you call out. When that variable was added, 
{{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} was retained due to its 
existing use in native services. Is there a case where 
{{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} won't work for your need? Maybe it is 
time we look to consolidate these two.


was (Author: shaneku...@gmail.com):
Thanks for reporting this, [~leftnoteasy] - 
{{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} is intended to be used for the purpose 
you call out. When that variable was added, 
{{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} was retained due to its 
existing use in native services. Is there a case where 
{{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} won't work for your need? Maybe it is 
time we do look to consolidate these two.

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Priority: Critical
>
> Existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user 
> specified this in service spec or not. It is important to allow user to mount 
> local folders like /etc/passwd, etc.
> Following logic overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
> sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", 
> sb.toString());{code}
> Inside AbstractLauncher.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8140) Improve log message when launch cmd is ran for stopped yarn service

2018-04-10 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433083#comment-16433083
 ] 

Eric Yang commented on YARN-8140:
-

[~yeshavora] This is because the application has not been destroyed. The 
subsequent launch fails because the duplicate service name is still in use. I 
think the message is not wrong: it tells the user that Hadoop can't deploy 
another service with the same name. If the error message instead reported that 
sleeper2-duplicate-app-stopped is in the stopped state, the user might 
mistakenly think that a second service with the same name had been persisted in 
Hadoop. That is not the case, so the existing message delivers the point more 
precisely. We can change the message to "Service name 
sleeper2-duplicate-app-stopped is already taken: 
hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json". 
Will this work?
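
For what it's worth, a tiny self-contained sketch of the proposed message format; 
the class and method below are purely illustrative, not the actual client code:
{code:java}
public class ServiceNameTakenMessageSketch {
  // Builds the message text proposed above from the service name and the
  // service definition path that already exists on HDFS.
  static String nameTakenMessage(String serviceName, String definitionPath) {
    return "Service name " + serviceName + " is already taken: " + definitionPath;
  }

  public static void main(String[] args) {
    System.out.println(nameTakenMessage(
        "sleeper2-duplicate-app-stopped",
        "hdfs://mycluster/user/hrt_qa/.yarn/services/"
            + "sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json"));
  }
}
{code}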

> Improve log message when launch cmd is ran for stopped yarn service
> ---
>
> Key: YARN-8140
> URL: https://issues.apache.org/jira/browse/YARN-8140
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Priority: Major
>
> Steps:
>  1) Launch sleeper app
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> sleeper2-duplicate-app-stopped 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/10 21:31:01 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:01 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> 18/04/10 21:31:03 INFO util.log: Logging initialized @2818ms
> 18/04/10 21:31:10 INFO client.ApiServiceClient: Application ID: 
> application_1523387473707_0007
> Exit Code: 0{code}
> 2) Stop the application
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -stop 
> sleeper2-duplicate-app-stopped
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/10 21:31:14 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:16 INFO util.log: Logging initialized @3034ms
> 18/04/10 21:31:17 INFO client.ApiServiceClient: Successfully stopped service 
> sleeper2-duplicate-app-stopped
> Exit Code: 0{code}
> 3) Launch the application with same name
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> sleeper2-duplicate-app-stopped 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/10 21:31:19 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:19 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> 18/04/10 21:31:22 INFO util.log: Logging initialized @4456ms
> 18/04/10 21:31

[jira] [Commented] (YARN-4781) Support intra-queue preemption for fairness ordering policy.

2018-04-10 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433090#comment-16433090
 ] 

Eric Payne commented on YARN-4781:
--

bq. Hence in case we use sizeBasedWeight, we are considering pending as well. 
So i had this doubt..
I see what you mean. Good catch. I was not considering the {{sizeBasedWeight}} 
case. My first thought was to just use the 
{{FairOrderingPolicy#FairComparator}}, but that is for {{SchedulableEntity}}s 
like {{FiCaSchedulerApp}}, and the {{PriorityQueue}}s in 
{{FifoIntraQueuePreemptionPlugin}} are sorting {{TempAppPerPartition}}s, so I 
wouldn't be able to combine this feature with the 
{{FifoIntraQueuePreemptionPlugin}}.

It may be worthwhile to go back to your previous suggestion about splitting out 
common functionality into an abstract {{AbstractIntraQueuePreemptionPlugin}} 
class and sub-classing the FIFO and Fair plugins.
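
For what it's worth, a rough sketch of the shape such a split could take; every 
name and field below is a hypothetical simplification (TempAppPerPartition is 
stubbed), not the actual CapacityScheduler code:
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Stub with just the fields this sketch needs; the real TempAppPerPartition
// carries much more state.
class TempAppPerPartition {
  final String appId;
  final int priority;       // a FIFO-style policy prefers higher priority
  final long usedMemoryMB;  // a fair-style policy prefers smaller usage

  TempAppPerPartition(String appId, int priority, long usedMemoryMB) {
    this.appId = appId;
    this.priority = priority;
    this.usedMemoryMB = usedMemoryMB;
  }
}

// Shared candidate-selection plumbing would live in the abstract class;
// subclasses only supply the ordering used to pick preemption candidates.
abstract class AbstractIntraQueuePreemptionPluginSketch {
  abstract Comparator<TempAppPerPartition> preemptionOrder();

  List<TempAppPerPartition> orderApps(List<TempAppPerPartition> apps) {
    List<TempAppPerPartition> sorted = new ArrayList<>(apps);
    sorted.sort(preemptionOrder());
    return sorted;
  }
}

class FifoPluginSketch extends AbstractIntraQueuePreemptionPluginSketch {
  @Override
  Comparator<TempAppPerPartition> preemptionOrder() {
    // higher-priority apps first
    return Comparator.comparingInt((TempAppPerPartition a) -> -a.priority);
  }
}

class FairPluginSketch extends AbstractIntraQueuePreemptionPluginSketch {
  @Override
  Comparator<TempAppPerPartition> preemptionOrder() {
    // apps using fewer resources first
    return Comparator.comparingLong(a -> a.usedMemoryMB);
  }
}
{code}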

> Support intra-queue preemption for fairness ordering policy.
> 
>
> Key: YARN-4781
> URL: https://issues.apache.org/jira/browse/YARN-4781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-4781.001.patch, YARN-4781.002.patch, 
> YARN-4781.003.patch
>
>
> We introduced fairness queue policy since YARN-3319, which will let large 
> applications make progresses and not starve small applications. However, if a 
> large application takes the queue’s resources, and containers of the large 
> app has long lifespan, small applications could still wait for resources for 
> long time and SLAs cannot be guaranteed.
> Instead of wait for application release resources on their own, we need to 
> preempt resources of queue with fairness policy enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8140) Improve log message when launch cmd is ran for stopped yarn service

2018-04-10 Thread Yesha Vora (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433093#comment-16433093
 ] 

Yesha Vora commented on YARN-8140:
--

yes [~eyang] this message sounds good.

> Improve log message when launch cmd is ran for stopped yarn service
> ---
>
> Key: YARN-8140
> URL: https://issues.apache.org/jira/browse/YARN-8140
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Priority: Major
>
> Steps:
>  1) Launch sleeper app
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> sleeper2-duplicate-app-stopped 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/10 21:31:01 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:01 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> 18/04/10 21:31:03 INFO util.log: Logging initialized @2818ms
> 18/04/10 21:31:10 INFO client.ApiServiceClient: Application ID: 
> application_1523387473707_0007
> Exit Code: 0{code}
> 2) Stop the application
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -stop 
> sleeper2-duplicate-app-stopped
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/10 21:31:14 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:16 INFO util.log: Logging initialized @3034ms
> 18/04/10 21:31:17 INFO client.ApiServiceClient: Successfully stopped service 
> sleeper2-duplicate-app-stopped
> Exit Code: 0{code}
> 3) Launch the application with same name
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> sleeper2-duplicate-app-stopped 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/10 21:31:19 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:19 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> 18/04/10 21:31:22 INFO util.log: Logging initialized @4456ms
> 18/04/10 21:31:22 ERROR client.ApiServiceClient: Service Instance dir already 
> exists: 
> hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json
> Exit Code: 56
> {code}
>  
> Here, launch cmd fails with "Service Instance dir already exists: 
> hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json".
>  
> The log message should be more meaningful. It should return that 
> "sleeper2-duplicate-app-stopped is in stopped state".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...

[jira] [Updated] (YARN-8142) yarn service application stops when AM is killed

2018-04-10 Thread Yesha Vora (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-8142:
-
Description: 
Steps:

1) Launch sleeper job ( non-docker yarn service)

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
fault-test-am-sleeper 
/usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
server at xxx:10200

18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
server at xxx:10200

18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition from 
local FS: 
/usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json

18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms

18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: 
application_1522887500374_0010

Exit Code: 0{code}

2) Wait for sleeper component to be up

3) Kill AM process PID

 

Expected behavior:

New attempt of AM will be started. The pre-existing container will keep running

 

Actual behavior:

Application finishes with State : FINISHED and Final-State : ENDED

New attempt was never launched

Note: 

when the AM gets a SIGTERM and gracefully shuts itself down. It is shutting the 
entire app down instead of letting it continue to run for another attempt

 

  was:
Steps:

1) Launch sleeper job ( non-docker yarn service)

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
fault-test-am-sleeper 
/usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
server at xxx:10200

18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
server at xxx:10200

18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition from 
local FS: 
/usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json

18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms

18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: 
application_1522887500374_0010

Exit Code: 0\{code}

2) Wait for sleeper component to be up

3) Kill AM process PID

 

Expected behavior:

New attempt of AM will be started. The pre-existing container will keep running

 

Actual behavior:

Application finishes with State : FINISHED and Final-State : ENDED

New attempt was never launched

Note: 

when the AM gets a SIGTERM and gracefully shuts itself down. It is shutting the 
entire app down instead of letting it continue to run for another attempt

 


> yarn service application stops when AM is killed
> 
>
> Key: YARN-8142
> URL: https://issues.apache.org/jira/browse/YARN-8142
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Priority: Major
>
> Steps:
> 1) Launch sleeper job ( non-docker yarn service)
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> fault-test-am-sleeper 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> s

[jira] [Created] (YARN-8142) yarn service application stops when AM is killed

2018-04-10 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8142:


 Summary: yarn service application stops when AM is killed
 Key: YARN-8142
 URL: https://issues.apache.org/jira/browse/YARN-8142
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Reporter: Yesha Vora


Steps:

1) Launch sleeper job ( non-docker yarn service)

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
fault-test-am-sleeper 
/usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
server at xxx:10200

18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
server at xxx:10200

18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition from 
local FS: 
/usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json

18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms

18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: 
application_1522887500374_0010

Exit Code: 0\{code}

2) Wait for sleeper component to be up

3) Kill AM process PID

 

Expected behavior:

New attempt of AM will be started. The pre-existing container will keep running

 

Actual behavior:

Application finishes with State : FINISHED and Final-State : ENDED

New attempt was never launched

Note: 

when the AM gets a SIGTERM and gracefully shuts itself down. It is shutting the 
entire app down instead of letting it continue to run for another attempt

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8142) yarn service application stops when AM is killed with SIGTERM

2018-04-10 Thread Yesha Vora (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-8142:
-
Summary: yarn service application stops when AM is killed with SIGTERM  
(was: yarn service application stops when AM is killed)

> yarn service application stops when AM is killed with SIGTERM
> -
>
> Key: YARN-8142
> URL: https://issues.apache.org/jira/browse/YARN-8142
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Priority: Major
>
> Steps:
> 1) Launch sleeper job ( non-docker yarn service)
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> fault-test-am-sleeper 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> 18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms
> 18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: 
> application_1522887500374_0010
> Exit Code: 0{code}
> 2) Wait for sleeper component to be up
> 3) Kill AM process PID
>  
> Expected behavior:
> New attempt of AM will be started. The pre-existing container will keep 
> running
>  
> Actual behavior:
> Application finishes with State : FINISHED and Final-State : ENDED
> New attempt was never launched
> Note: 
> when the AM gets a SIGTERM and gracefully shuts itself down. It is shutting 
> the entire app down instead of letting it continue to run for another attempt
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7189) Container-executor doesn't remove Docker containers that error out early

2018-04-10 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7189:
--
Attachment: YARN-7189-b3.0.001.patch

> Container-executor doesn't remove Docker containers that error out early
> 
>
> Key: YARN-7189
> URL: https://issues.apache.org/jira/browse/YARN-7189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 2.9.0, 2.8.3, 3.0.1
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-7189-b3.0.001.patch
>
>
> Once the docker run command is executed, the docker container is created 
> unless the return code is 125 meaning that the run command itself failed 
> (https://docs.docker.com/engine/reference/run/#exit-status). Any error that 
> happens after the docker run needs to remove the container during cleanup.
> {noformat:title=container-executor.c:launch_docker_container_as_user}
>   snprintf(docker_command_with_binary, command_size, "%s %s", docker_binary, 
> docker_command);
>   fprintf(LOGFILE, "Launching docker container...\n");
>   FILE* start_docker = popen(docker_command_with_binary, "r");
> {noformat}
> This is fixed by YARN-5366, which changes how we remove containers. However, 
> that was committed into 3.1.0. 2.8, 2.9, and 3.0 are all affected



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7189) Container-executor doesn't remove Docker containers that error out early

2018-04-10 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433137#comment-16433137
 ] 

Eric Badger commented on YARN-7189:
---

Attaching first patch to fix this issue. There is a race in the removal of the 
docker container where the pid may not be valid anymore (no such process), but 
the docker container is still in the running state. Because of that, I have 
added exponential backoff to the container removal in this patch. It will retry 
for 5 iterations with increasing sleep times and eventually give up after the 
last one.
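
The patch itself changes container-executor's C code; purely as an illustration 
of the retry pattern, a small Java-style sketch (the removeDockerContainer call 
and the delay values are stand-ins, not the real implementation):
{code:java}
public class DockerRemoveBackoffSketch {
  // Stand-in for shelling out to "docker rm <name>"; returns true on success.
  static boolean removeDockerContainer(String name) {
    return false; // placeholder
  }

  static boolean removeWithBackoff(String name) throws InterruptedException {
    final int maxAttempts = 5;   // the "5 iterations" mentioned above
    long sleepMs = 1000;         // initial delay; illustrative value only
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (removeDockerContainer(name)) {
        return true;
      }
      if (attempt < maxAttempts) {
        Thread.sleep(sleepMs);
        sleepMs *= 2;            // exponential backoff between retries
      }
    }
    return false;                // give up after the last attempt
  }
}
{code}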

> Container-executor doesn't remove Docker containers that error out early
> 
>
> Key: YARN-7189
> URL: https://issues.apache.org/jira/browse/YARN-7189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 2.9.0, 2.8.3, 3.0.1
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-7189-b3.0.001.patch
>
>
> Once the docker run command is executed, the docker container is created 
> unless the return code is 125 meaning that the run command itself failed 
> (https://docs.docker.com/engine/reference/run/#exit-status). Any error that 
> happens after the docker run needs to remove the container during cleanup.
> {noformat:title=container-executor.c:launch_docker_container_as_user}
>   snprintf(docker_command_with_binary, command_size, "%s %s", docker_binary, 
> docker_command);
>   fprintf(LOGFILE, "Launching docker container...\n");
>   FILE* start_docker = popen(docker_command_with_binary, "r");
> {noformat}
> This is fixed by YARN-5366, which changes how we remove containers. However, 
> that was committed into 3.1.0. 2.8, 2.9, and 3.0 are all affected



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7221) Add security check for privileged docker container

2018-04-10 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433162#comment-16433162
 ] 

Eric Yang commented on YARN-7221:
-

TestContainerSchedulerQueuing unit test failure is not related to changes in 
this patch.

> Add security check for privileged docker container
> --
>
> Key: YARN-7221
> URL: https://issues.apache.org/jira/browse/YARN-7221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-7221.001.patch, YARN-7221.002.patch, 
> YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, 
> YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, 
> YARN-7221.009.patch, YARN-7221.010.patch, YARN-7221.011.patch, 
> YARN-7221.012.patch, YARN-7221.013.patch, YARN-7221.014.patch, 
> YARN-7221.015.patch, YARN-7221.016.patch, YARN-7221.017.patch, 
> YARN-7221.018.patch, YARN-7221.019.patch, YARN-7221.020.patch, 
> YARN-7221.021.patch, YARN-7221.022.patch
>
>
> When a docker is running with privileges, majority of the use case is to have 
> some program running with root then drop privileges to another user.  i.e. 
> httpd to start with privileged and bind to port 80, then drop privileges to 
> www user.  
> # We should add security check for submitting users, to verify they have 
> "sudo" access to run privileged container.  
> # We should remove --user=uid:gid for privileged containers.  
>  
> Docker can be launched with --privileged=true, and --user=uid:gid flag.  With 
> this parameter combinations, user will not have access to become root user.  
> All docker exec command will be drop to uid:gid user to run instead of 
> granting privileges.  User can gain root privileges if container file system 
> contains files that give user extra power, but this type of image is 
> considered as dangerous.  Non-privileged user can launch container with 
> special bits to acquire same level of root power.  Hence, we lose control of 
> which image should be run with --privileges, and who have sudo rights to use 
> privileged container images.  As the result, we should check for sudo access 
> then decide to parameterize --privileged=true OR --user=uid:gid.  This will 
> avoid leading developer down the wrong path.
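
As a rough illustration of the check described above; the method and parameter 
names are hypothetical, not the actual container-executor or runtime code:
{code:java}
public class PrivilegedDockerCheckSketch {
  // Decide which flag to pass to docker run: only users with configured
  // sudo/privileged-container rights may request --privileged; everyone else
  // is dropped to their own uid:gid.
  static String runtimeFlag(boolean requestedPrivileged, boolean hasSudoAccess,
      int uid, int gid) {
    if (requestedPrivileged) {
      if (!hasSudoAccess) {
        throw new SecurityException(
            "user is not allowed to run privileged containers");
      }
      return "--privileged=true";
    }
    return "--user=" + uid + ":" + gid;
  }

  public static void main(String[] args) {
    System.out.println(runtimeFlag(false, false, 1000, 1000)); // --user=1000:1000
    System.out.println(runtimeFlag(true, true, 1000, 1000));   // --privileged=true
  }
}
{code}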



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-04-10 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7973:

Fix Version/s: 3.2.0

> Support ContainerRelaunch for Docker containers
> ---
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7973.001.patch, YARN-7973.002.patch, 
> YARN-7973.003.patch, YARN-7973.004.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.
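
For illustration, a hedged sketch of the relaunch idea: reuse the existing 
container with a start instead of a fresh run. The DockerClientSketch interface 
below is a hypothetical stand-in, not the actual runtime API:
{code:java}
// Hypothetical stand-in for issuing docker CLI commands.
interface DockerClientSketch {
  boolean containerExists(String containerId);
  void start(String containerId); // "docker start <id>"
  void run(String containerId);   // "docker run --name <id> ..."
}

class ContainerRelaunchSketch {
  static void relaunch(DockerClientSketch docker, String containerId) {
    if (docker.containerExists(containerId)) {
      docker.start(containerId);  // reuse the previous container and its workdir
    } else {
      docker.run(containerId);    // first launch, or the container was removed
    }
  }
}
{code}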



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7984) Delete registry entries from ZK on ServiceClient stop and clean up stop/destroy behavior

2018-04-10 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7984:

Fix Version/s: 3.2.0

> Delete registry entries from ZK on ServiceClient stop and clean up 
> stop/destroy behavior
> 
>
> Key: YARN-7984
> URL: https://issues.apache.org/jira/browse/YARN-7984
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Fix For: 3.2.0
>
> Attachments: YARN-7984.1.patch, YARN-7984.2.patch
>
>
> The service records written to the registry are removed by ServiceClient on a 
> destroy call, but not on a stop call. The service AM does have some code to 
> clean up the registry entries when component instances are stopped, but if 
> the AM is killed before it has a chance to perform the cleanup, these entries 
> will be left in ZooKeeper. It would be better to clean these up in the stop 
> call, so that RegistryDNS does not provide lookups for containers that don't 
> exist.
> Additional stop/destroy behavior improvements include fixing errors / 
> unexpected behavior related to:
> * destroying a saved (not launched or started) service
> * destroying a stopped service
> * destroying a destroyed service
> * returning proper exit codes for destroy failures
> * performing other client operations on saved services (fixing NPEs)
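
Purely as a sketch of the stop-time cleanup described above; the registry client 
interface and the path layout here are hypothetical stand-ins, not the actual 
RegistryOperations API or path scheme used by ServiceClient:
{code:java}
// Hypothetical stand-in for a ZooKeeper-backed registry client.
interface RegistryClientSketch {
  void deleteRecursive(String znodePath);
}

class StopTimeRegistryCleanupSketch {
  // Example layout only, e.g. /users/<user>/services/yarn-service/<serviceName>
  static String servicePath(String user, String serviceName) {
    return "/users/" + user + "/services/yarn-service/" + serviceName;
  }

  static void onStop(RegistryClientSketch registry, String user, String serviceName) {
    // Remove the service's registry entries so RegistryDNS no longer resolves
    // containers that have stopped.
    registry.deleteRecursive(servicePath(user, serviceName));
  }
}
{code}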



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8142) yarn service application stops when AM is killed with SIGTERM

2018-04-10 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi reassigned YARN-8142:


Assignee: Billie Rinaldi

> yarn service application stops when AM is killed with SIGTERM
> -
>
> Key: YARN-8142
> URL: https://issues.apache.org/jira/browse/YARN-8142
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Billie Rinaldi
>Priority: Major
>
> Steps:
> 1) Launch sleeper job ( non-docker yarn service)
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> fault-test-am-sleeper 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> 18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms
> 18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: 
> application_1522887500374_0010
> Exit Code: 0{code}
> 2) Wait for sleeper component to be up
> 3) Kill AM process PID
>  
> Expected behavior:
> New attempt of AM will be started. The pre-existing container will keep 
> running
>  
> Actual behavior:
> Application finishes with State : FINISHED and Final-State : ENDED
> New attempt was never launched
> Note: 
> when the AM gets a SIGTERM and gracefully shuts itself down. It is shutting 
> the entire app down instead of letting it continue to run for another attempt
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7781) Update YARN-Services-Examples.md to be in sync with the latest code

2018-04-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433185#comment-16433185
 ] 

Jian He commented on YARN-7781:
---

sure, go ahead. thanks

> Update YARN-Services-Examples.md to be in sync with the latest code
> ---
>
> Key: YARN-7781
> URL: https://issues.apache.org/jira/browse/YARN-7781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>Assignee: Jian He
>Priority: Major
> Attachments: YARN-7781.01.patch, YARN-7781.02.patch, 
> YARN-7781.03.patch
>
>
> Update YARN-Services-Examples.md to make the following additions/changes:
> 1. Add an additional URL and PUT Request JSON to support flex:
> Update to flex up/down the no of containers (instances) of a component of a 
> service
> PUT URL – http://localhost:8088/app/v1/services/hello-world
> PUT Request JSON
> {code}
> {
>   "components" : [ {
> "name" : "hello",
> "number_of_containers" : 3
>   } ]
> }
> {code}
> 2. Modify all occurrences of /ws/ to /app/
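
For illustration, one way to send the flex request shown above using plain JDK 
HTTP; the host, port, and body come from the example, the rest is a generic 
sketch rather than the YARN services client:
{code:java}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class FlexServiceSketch {
  public static void main(String[] args) throws Exception {
    String body =
        "{ \"components\" : [ { \"name\" : \"hello\", \"number_of_containers\" : 3 } ] }";
    URL url = new URL("http://localhost:8088/app/v1/services/hello-world");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body.getBytes(StandardCharsets.UTF_8));
    }
    System.out.println("HTTP " + conn.getResponseCode());
  }
}
{code}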



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-04-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433188#comment-16433188
 ] 

Hudson commented on YARN-7973:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13962 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13962/])
YARN-7973. Added ContainerRelaunch feature for Docker containers.
(eyang: rev c467f311d0c7155c09052d93fac12045af925583)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerCommandExecutor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerRelaunch.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitorResourceChange.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerStartCommand.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DefaultLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/TestDockerStartCommand.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DelegatingLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/utils/test_docker_util.cc
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.h
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/runtime/ContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerRelaunch.java


> Support ContainerRelaunch for Docker containers
> ---
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>   

[jira] [Commented] (YARN-8104) Add API to fetch node to attribute mapping

2018-04-10 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433193#comment-16433193
 ] 

Naganarasimha G R commented on YARN-8104:
-

Thanks for the patch [~bibinchundatt]

At high level
 * Can you explain why NodeToAttributesProto is moved from 
yarn_server_resourcemanager_service_protos.proto to yarn_protos.proto ?
 * Also, would it make sense to provide an overloaded method 
(getNodesToAttributes) here supporting NodeID ?

and a few other comments:

yarn_protos.protos
 * ln no 391: node => hostname ... similar to earlier naming convention

yarn_service_protos.protos
 * ln no 282: nodeToAttributes => nodesToAttributes ... based on convention 
followed in other places

GetNodesToAttributesRequest.java
 * line no 55 & 64 : setNodes & getNodes -> setHostNames & getHostNames

GetNodesToAttributesRequestPBImpl 
 * line no 120: initNodeAttributes => initNodesToAttributesRequest or just init
 * line no 126: nodeLabelsList => hostNamesList


TestPBImplRecords
 * We need to invoke generateByNewInstance for all the new PBs in setup. Can 
you please check?

NodeAttributesManagerImpl
 * ln no 454-457: When we do not have a mapping, we set the hostname with an 
empty set. Is that better, or is it better to return only the ones which have 
attributes?
IMO returning only the ones that have a mapping is better, so that we are not 
bloating the response, and it is a map anyway. Also, if nonexistent or erroneous 
hostnames are given, it still shows an empty set.


TestClientRMService
 * ln no 2053: Can we have a separate method to test the node-to-attributes 
API? Or document which APIs will be tested and rename it to 
testNodeAttributesQueryAPI... I would still prefer the former option, 
though.

 

Can you also check the new findbugs issue, the checkstyle issues, and the 
javadoc issues reported, as they seem related to the patch and fixable?

> Add API to fetch node to attribute mapping
> --
>
> Key: YARN-8104
> URL: https://issues.apache.org/jira/browse/YARN-8104
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-8104-YARN-3409.001.patch, 
> YARN-8104-YARN-3409.002.patch, YARN-8104-YARN-3409.003.patch
>
>
> Add node/host to attribute mapping in yarn client API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7984) Delete registry entries from ZK on ServiceClient stop and clean up stop/destroy behavior

2018-04-10 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433207#comment-16433207
 ] 

Billie Rinaldi commented on YARN-7984:
--

Thanks, [~eyang]! I plan to cherry-pick this to branch-3.1 as well.

> Delete registry entries from ZK on ServiceClient stop and clean up 
> stop/destroy behavior
> 
>
> Key: YARN-7984
> URL: https://issues.apache.org/jira/browse/YARN-7984
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Fix For: 3.2.0
>
> Attachments: YARN-7984.1.patch, YARN-7984.2.patch
>
>
> The service records written to the registry are removed by ServiceClient on a 
> destroy call, but not on a stop call. The service AM does have some code to 
> clean up the registry entries when component instances are stopped, but if 
> the AM is killed before it has a chance to perform the cleanup, these entries 
> will be left in ZooKeeper. It would be better to clean these up in the stop 
> call, so that RegistryDNS does not provide lookups for containers that don't 
> exist.
> Additional stop/destroy behavior improvements include fixing errors / 
> unexpected behavior related to:
> * destroying a saved (not launched or started) service
> * destroying a stopped service
> * destroying a destroyed service
> * returning proper exit codes for destroy failures
> * performing other client operations on saved services (fixing NPEs)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8126) [Follow up] Support auto-spawning of admin configured services during bootstrap of rm

2018-04-10 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433208#comment-16433208
 ] 

Gour Saha commented on YARN-8126:
-

I think this deserves to be a new sub-topic "System Services" on the left panel 
under "Service Discovery" (in the "YARN Service" section). It might seem that 
there is not much information for it to go to a new page, but there are 
primarily 2 reasons I am inclined towards it -
 # An ordinary end-user cannot create or add services to be started as 
system-services. So it should not be in the existing pages, which focus on 
what ordinary end-users can do. Hence, in this new page, we should specifically 
call out that this is a cluster admin feature.
 # This is a pretty handy feature, and going forward this page might grow as we 
add more system-service related features or add helpful system-services to the 
framework itself, which would also need documentation to go with them.

What do you think?

> [Follow up] Support auto-spawning of admin configured services during 
> bootstrap of rm
> -
>
> Key: YARN-8126
> URL: https://issues.apache.org/jira/browse/YARN-8126
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8126.001.patch
>
>
> YARN-8048 adds support auto-spawning of admin configured services during 
> bootstrap of rm. 
> This JIRA is to follow up some of the comments discussed in YARN-8048. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8142) yarn service application stops when AM is killed with SIGTERM

2018-04-10 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433210#comment-16433210
 ] 

Eric Yang commented on YARN-8142:
-

In Unix terms, SIGTERM is used for terminating an application. My impression is 
that this is correct behavior rather than starting another instance. If another 
signal is used, then spawning another instance might be the right thing to do.
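
As background, a minimal illustration (not the actual service AM code) of why a 
SIGTERM reaches the graceful-stop path: a JVM shutdown hook runs when the 
process receives SIGTERM:
{code:java}
public class ShutdownHookSketch {
  public static void main(String[] args) throws InterruptedException {
    Runtime.getRuntime().addShutdownHook(new Thread(() ->
        System.out.println("SIGTERM received: running graceful-stop logic")));
    System.out.println("running; send SIGTERM (kill <pid>) to trigger the hook");
    Thread.sleep(60_000);
  }
}
{code}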

> yarn service application stops when AM is killed with SIGTERM
> -
>
> Key: YARN-8142
> URL: https://issues.apache.org/jira/browse/YARN-8142
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Billie Rinaldi
>Priority: Major
>
> Steps:
> 1) Launch sleeper job ( non-docker yarn service)
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> fault-test-am-sleeper 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> 18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms
> 18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: 
> application_1522887500374_0010
> Exit Code: 0{code}
> 2) Wait for sleeper component to be up
> 3) Kill AM process PID
>  
> Expected behavior:
> New attempt of AM will be started. The pre-existing container will keep 
> running
>  
> Actual behavior:
> Application finishes with State : FINISHED and Final-State : ENDED
> New attempt was never launched
> Note: 
> when the AM gets a SIGTERM and gracefully shuts itself down. It is shutting 
> the entire app down instead of letting it continue to run for another attempt
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8142) yarn service application stops when AM is killed with SIGTERM

2018-04-10 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433210#comment-16433210
 ] 

Eric Yang edited comment on YARN-8142 at 4/11/18 12:07 AM:
---

In Unix terms, SIGTERM is used for terminating an application. My impression is 
that this is correct behavior rather than starting another instance. If another 
signal is used (besides SIGKILL and SIGTERM), then spawning another instance 
might be the right thing to do.


was (Author: eyang):
In Unix terms, SIGTERM is used for terminating an application. My impression is 
that this is correct behavior rather than starting another instance. If another 
signal is used, then spawning another instance might be the right thing to do.

> yarn service application stops when AM is killed with SIGTERM
> -
>
> Key: YARN-8142
> URL: https://issues.apache.org/jira/browse/YARN-8142
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Billie Rinaldi
>Priority: Major
>
> Steps:
> 1) Launch sleeper job ( non-docker yarn service)
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> fault-test-am-sleeper 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> 18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms
> 18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: 
> application_1522887500374_0010
> Exit Code: 0{code}
> 2) Wait for sleeper component to be up
> 3) Kill AM process PID
>  
> Expected behavior:
> A new attempt of the AM will be started. The pre-existing container will keep 
> running.
>  
> Actual behavior:
> Application finishes with State : FINISHED and Final-State : ENDED
> A new attempt was never launched.
> Note: 
> When the AM gets a SIGTERM, it gracefully shuts itself down. It shuts 
> the entire app down instead of letting it continue to run for another attempt.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8140) Improve log message when launch cmd is run for stopped yarn service

2018-04-10 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang reassigned YARN-8140:
---

Assignee: Eric Yang

> Improve log message when launch cmd is run for stopped yarn service
> ---
>
> Key: YARN-8140
> URL: https://issues.apache.org/jira/browse/YARN-8140
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
>
> Steps:
>  1) Launch sleeper app
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> sleeper2-duplicate-app-stopped 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/10 21:31:01 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:01 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> 18/04/10 21:31:03 INFO util.log: Logging initialized @2818ms
> 18/04/10 21:31:10 INFO client.ApiServiceClient: Application ID: 
> application_1523387473707_0007
> Exit Code: 0{code}
> 2) Stop the application
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -stop 
> sleeper2-duplicate-app-stopped
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/10 21:31:14 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:16 INFO util.log: Logging initialized @3034ms
> 18/04/10 21:31:17 INFO client.ApiServiceClient: Successfully stopped service 
> sleeper2-duplicate-app-stopped
> Exit Code: 0{code}
> 3) Launch the application with same name
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> sleeper2-duplicate-app-stopped 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/10 21:31:19 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:19 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> 18/04/10 21:31:22 INFO util.log: Logging initialized @4456ms
> 18/04/10 21:31:22 ERROR client.ApiServiceClient: Service Instance dir already 
> exists: 
> hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json
> Exit Code: 56
> {code}
>  
> Here, the launch cmd fails with "Service Instance dir already exists: 
> hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json".
>  
> The log message should be more meaningful. It should state that 
> "sleeper2-duplicate-app-stopped is in stopped state".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-8133) Doc link broken for yarn-service from overview page.

2018-04-10 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433225#comment-16433225
 ] 

Gour Saha commented on YARN-8133:
-

Thanks [~rohithsharma]. 002 patch looks good. +1 for commit.

> Doc link broken for yarn-service from overview page.
> 
>
> Key: YARN-8133
> URL: https://issues.apache.org/jira/browse/YARN-8133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-8133.01.patch, YARN-8133.02.patch
>
>
> I see that the documentation links are broken on the overview page. 
> Clicking any link from the 
> http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html
>  page causes an error. 
> It looks like the Overview page is redirecting to a .md page which doesn't exist. 
> It should redirect to the *.html page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8133) Doc link broken for yarn-service from overview page.

2018-04-10 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433225#comment-16433225
 ] 

Gour Saha edited comment on YARN-8133 at 4/11/18 12:28 AM:
---

Thanks [~rohithsharma]. 02 patch looks good. +1 for commit.


was (Author: gsaha):
Thanks [~rohithsharma]. 002 patch looks good. +1 for commit.

> Doc link broken for yarn-service from overview page.
> 
>
> Key: YARN-8133
> URL: https://issues.apache.org/jira/browse/YARN-8133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-8133.01.patch, YARN-8133.02.patch
>
>
> I see that the documentation links are broken on the overview page. 
> Clicking any link from the 
> http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html
>  page causes an error. 
> It looks like the Overview page is redirecting to a .md page which doesn't exist. 
> It should redirect to the *.html page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8133) Doc link broken for yarn-service from overview page.

2018-04-10 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8133:
-
Fix Version/s: 3.2.0

> Doc link broken for yarn-service from overview page.
> 
>
> Key: YARN-8133
> URL: https://issues.apache.org/jira/browse/YARN-8133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8133.01.patch, YARN-8133.02.patch
>
>
> I see that the documentation links are broken on the overview page. 
> Clicking any link from the 
> http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html
>  page causes an error. 
> It looks like the Overview page is redirecting to a .md page which doesn't exist. 
> It should redirect to the *.html page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8143) Improve log message when Capacity Scheduler request allocation on node

2018-04-10 Thread Zian Chen (JIRA)
Zian Chen created YARN-8143:
---

 Summary: Improve log message when Capacity Scheduler request 
allocation on node
 Key: YARN-8143
 URL: https://issues.apache.org/jira/browse/YARN-8143
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Reporter: Zian Chen
Assignee: Zian Chen


When a scheduler request allocates a container on a node that has reserved containers 
on it, the log message below prints very frequently; it needs to be improved 
with more condition checks.
{code:java}
2018-02-02 11:41:13,105 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
2018-02-02 11:41:13,115 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
fulfill reservation for application application_1517571510094_0003 on node: 
ctr-e137-1514896590304-52728-01-07.hwx.site:25454
2018-02-02 11:41:13,115 INFO  allocator.AbstractContainerAllocator 
(AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
Reserved container  application=application_1517571510094_0003 
resource= 
queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
 cluster=
{code}
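As an illustration of what "more condition checks" could look like (a sketch under 
assumptions, not the actual CapacityScheduler change): repeats of the reservation message 
for the same application/node pair within a short window could be demoted to debug level.
{code:java}
// Illustrative sketch only -- not the actual CapacityScheduler patch.
// Guard a hot-path INFO log so repeats for the same (application, node) pair
// within a short interval are logged at DEBUG instead.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ReservationLogGuard {
  private static final long MIN_INTERVAL_MS = 60_000L;
  private final Map<String, Long> lastLogged = new ConcurrentHashMap<>();

  /** Returns true if the caller should log at INFO; false means log at DEBUG. */
  boolean shouldLogInfo(String appId, String node) {
    String key = appId + "/" + node;
    long now = System.currentTimeMillis();
    Long prev = lastLogged.put(key, now);
    return prev == null || now - prev >= MIN_INTERVAL_MS;
  }
}
{code}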
 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""

2018-04-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433249#comment-16433249
 ] 

Hudson commented on YARN-8116:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13963 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13963/])
YARN-8116. Nodemanager fails with NumberFormatException: For input (wangda: rev 
2bf9cc2c73944c9f7cde56714b8cf6995cfa539b)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
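For context, a hedged sketch of the kind of defensive parsing such a fix typically needs 
when a value read back from the NM leveldb state store is empty; the actual change is in 
the commit referenced above and may differ, and SafeParse/parseLongOrDefault below are 
names invented for the example.
{code:java}
// Illustrative only -- the real fix may differ. Guard empty/blank values
// recovered from the state store before handing them to Long.parseLong,
// which throws NumberFormatException on "".
final class SafeParse {
  static long parseLongOrDefault(String value, long dflt) {
    if (value == null || value.trim().isEmpty()) {
      return dflt;
    }
    try {
      return Long.parseLong(value.trim());
    } catch (NumberFormatException e) {
      return dflt;
    }
  }
}
{code}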


> Nodemanager fails with NumberFormatException: For input string: ""
> --
>
> Key: YARN-8116
> URL: https://issues.apache.org/jira/browse/YARN-8116
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Chandni Singh
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8116.001.patch, YARN-8116.002.patch
>
>
> Steps followed.
> 1) Update nodemanager debug delay config
> {code}
> <property>
>   <name>yarn.nodemanager.delete.debug-delay-sec</name>
>   <value>350</value>
> </property>
> {code}
> 2) Launch distributed shell application multiple times
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn  jar 
> hadoop-yarn-applications-distributedshell-*.jar  -shell_command "sleep 120" 
> -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar 
> hadoop-yarn-applications-distributedshell-*.jar{code}
> 3) restart NM
> Nodemanager fails to start with the error below.
> {code:title=NM log}
> 2018-03-23 21:32:14,437 INFO  monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:serviceInit(181)) - ContainersMonitor enabled: 
> true
> 2018-03-23 21:32:14,439 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:serviceInit(130)) - rollingMonitorInterval is set 
> as 3600. The logs will be aggregated every 3600 seconds
> 2018-03-23 21:32:14,455 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
>  failed in state INITED
> java.lang.NumberFormatException: For input string: ""
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Long.parseLong(Long.java:601)
>   at java.lang.Long.parseLong(Long.java:631)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:899)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:960)
> 2018-03-23 21:32:14,458 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:serviceStop(148)) - 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
>  waiting for pending aggregation during exit
> 2018-03-23 21:32:14,460 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state 
> INITED
> java.lang.NumberFormatException: For input string: ""
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Long.parseLong(Long.java:601)
>   at java.lang.Long.parseLong(Long.java:631)
>   at 
> org.apache.hadoop.yarn.se
