[jira] [Commented] (YARN-8130) Race condition when container events are published for KILLED applications
[ https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431808#comment-16431808 ] Rohith Sharma K S commented on YARN-8130: - This appears to be caused by a delay in the dispatcher thread. Below is the scenario where the race condition can happen: First -> a CONTAINER_FINISHED event arrives and the entity to publish is put on the dispatcher. Second -> an APP_FINISHED event arrives and the timelineclient instance is removed from the map. Third -> the event dispatcher processes the event to publish the entity and finds that the corresponding application id no longer exists. I think instead of removing the timelineclient immediately, we should schedule the removal after a configured delay, similar to PerNodeTimelineCollectorsAuxService#removeApplicationCollector. cc: [~haibochen] [~vrushalic] > Race condition when container events are published for KILLED applications > -- > > Key: YARN-8130 > URL: https://issues.apache.org/jira/browse/YARN-8130 > Project: Hadoop YARN > Issue Type: Bug > Components: ATSv2 >Reporter: Charan Hebri >Priority: Major > > There seems to be a race condition when an application is KILLED > and the corresponding container event information is being published. For > completed containers, a YARN_CONTAINER_FINISHED event is generated, but for > some containers in a KILLED application this information is missing. 
Below is > a node manager log snippet, > {code:java} > 2018-04-09 08:44:54,474 INFO shuffle.ExternalShuffleBlockResolver > (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application > application_1523259757659_0003 removed, cleanupLocalDirs = false > 2018-04-09 08:44:54,478 INFO application.ApplicationImpl > (ApplicationImpl.java:handle(632)) - Application > application_1523259757659_0003 transitioned from > APPLICATION_RESOURCES_CLEANINGUP to FINISHED > 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher > (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been > removed before the entity could be published for > TimelineEntity[type='YARN_CONTAINER', > id='container_1523259757659_0003_01_02'] > 2018-04-09 08:44:54,478 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just > finished : application_1523259757659_0003 > 2018-04-09 08:44:54,488 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs > for container container_1523259757659_0003_01_01. Current good log dirs > are /grid/0/hadoop/yarn/log > 2018-04-09 08:44:54,492 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs > for container container_1523259757659_0003_01_02. Current good log dirs > are /grid/0/hadoop/yarn/log > 2018-04-09 08:44:55,470 INFO collector.TimelineCollectorManager > (TimelineCollectorManager.java:remove(192)) - The collector service for > application_1523259757659_0003 was removed > 2018-04-09 08:44:55,472 INFO containermanager.ContainerManagerImpl > (ContainerManagerImpl.java:handle(1572)) - couldn't find application > application_1523259757659_0003 while processing FINISH_APPS event. 
The > ResourceManager allocated resources for this application to the NodeManager > but no active containers were found to process{code} > The container id specified in the log, > *container_1523259757659_0003_01_02* is the one that has the finished > event missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
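The delayed-removal approach suggested in the comment could be sketched as below. This is an illustrative simplification, not the NMTimelinePublisher code: the class name, the map's value type, and the delay handling are all assumptions; the point is only that removal on APP_FINISHED is deferred so in-flight CONTAINER_FINISHED entities can still find their client.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: instead of removing the per-application timeline
// client immediately on APP_FINISHED, schedule the removal after a
// configurable delay so entities still queued on the dispatcher can be
// published before the client disappears.
public class DelayedClientRemoval {
    private final Map<String, Object> appToClient = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final long removalDelayMs;

    public DelayedClientRemoval(long removalDelayMs) {
        this.removalDelayMs = removalDelayMs;
    }

    public void register(String appId, Object client) {
        appToClient.put(appId, client);
    }

    // Called on APP_FINISHED: defer the actual removal.
    public void scheduleRemoval(String appId) {
        scheduler.schedule(() -> appToClient.remove(appId),
            removalDelayMs, TimeUnit.MILLISECONDS);
    }

    // Called by the dispatcher thread when publishing an entity.
    public boolean hasClient(String appId) {
        return appToClient.containsKey(appId);
    }
}
```

With a sufficiently long delay, the dispatcher's late publish in step "Third" above would still find the client instead of logging "Seems like client has been removed".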
[jira] [Commented] (YARN-7941) Transitive dependencies for component are not resolved
[ https://issues.apache.org/jira/browse/YARN-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431811#comment-16431811 ] Rohith Sharma K S commented on YARN-7941: - +lgtm, verified the patch in a cluster. {code} 2018-04-10 12:01:12,982 [pool-7-thread-1] INFO monitor.ServiceMonitor - [COMPONENT regionserver]: Dependencies satisfied, ramping up. . 2018-04-10 12:01:42,880 [pool-7-thread-1] INFO monitor.ServiceMonitor - [COMPONENT hbaseclient]: Dependencies satisfied, ramping up. {code} > Transitive dependencies for component are not resolved > --- > > Key: YARN-7941 > URL: https://issues.apache.org/jira/browse/YARN-7941 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Assignee: Billie Rinaldi >Priority: Major > Attachments: YARN-7941.1.patch > > > It is observed that transitive dependencies are not resolved; as a result, one > of the components is started earlier. > Ex : In an HBase app, > master is an independent component, > regionserver depends on master, > hbaseclient depends on regionserver, > but I always see that hbaseclient is launched before regionserver.
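The intended ordering (master before regionserver before hbaseclient) can be illustrated with a small topological sort over the declared dependencies. This is a sketch of the concept only, not the Yarn Service AM's actual implementation; the class and method names are invented:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch: derive a component launch order from declared
// dependencies via depth-first topological sort, so every component is
// added only after all of its transitive dependencies.
public class LaunchOrder {
    public static List<String> order(Map<String, List<String>> deps) {
        List<String> result = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String c : deps.keySet()) {
            visit(c, deps, visited, result);
        }
        return result;
    }

    private static void visit(String c, Map<String, List<String>> deps,
                              Set<String> visited, List<String> result) {
        if (!visited.add(c)) {
            return; // already placed (or in progress)
        }
        for (String d : deps.getOrDefault(c, List.of())) {
            visit(d, deps, visited, result); // dependencies first
        }
        result.add(c); // added after all transitive dependencies
    }
}
```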
[jira] [Updated] (YARN-8104) Add API to fetch node to attribute mapping
[ https://issues.apache.org/jira/browse/YARN-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8104: --- Attachment: YARN-8104-YARN-3409.002.patch > Add API to fetch node to attribute mapping > -- > > Key: YARN-8104 > URL: https://issues.apache.org/jira/browse/YARN-8104 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-8104-YARN-3409.001.patch, > YARN-8104-YARN-3409.002.patch > > > Add node/host-to-attribute mapping to the YARN client API.
[jira] [Updated] (YARN-8104) Add API to fetch node to attribute mapping
[ https://issues.apache.org/jira/browse/YARN-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8104: --- Attachment: YARN-8104-YARN-3409.003.patch > Add API to fetch node to attribute mapping > -- > > Key: YARN-8104 > URL: https://issues.apache.org/jira/browse/YARN-8104 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-8104-YARN-3409.001.patch, > YARN-8104-YARN-3409.002.patch, YARN-8104-YARN-3409.003.patch > > > Add node/host-to-attribute mapping to the YARN client API.
[jira] [Commented] (YARN-8104) Add API to fetch node to attribute mapping
[ https://issues.apache.org/jira/browse/YARN-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431863#comment-16431863 ] Bibin A Chundatt commented on YARN-8104: Thank you [~Naganarasimha] for the review. Attached a rebased patch that also addresses your comments. > Add API to fetch node to attribute mapping > -- > > Key: YARN-8104 > URL: https://issues.apache.org/jira/browse/YARN-8104 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-8104-YARN-3409.001.patch, > YARN-8104-YARN-3409.002.patch, YARN-8104-YARN-3409.003.patch > > > Add node/host-to-attribute mapping to the YARN client API.
[jira] [Commented] (YARN-7088) Fix application start time and add submit time to UIs
[ https://issues.apache.org/jira/browse/YARN-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431932#comment-16431932 ] genericqa commented on YARN-7088: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 37s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 15 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 20s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 39m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 6m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 22m 30s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 8m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 50s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 36m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 36m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 36m 42s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 3m 53s{color} | {color:orange} root: The patch generated 1 new + 1115 unchanged - 2 fixed = 1116 total (was 1117) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 5m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 26s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 9m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 10s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 1s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 3m 18s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 19s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 6s{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 66m 57s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 26m 58s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}129m 25s{color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 4s{color} | {color:green} The patch does not generate ASF Licens
[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling
[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Qingcha updated YARN-7481: --- Attachment: hadoop-2.7.2.port-gpu.patch > Gpu locality support for Better AI scheduling > - > > Key: YARN-7481 > URL: https://issues.apache.org/jira/browse/YARN-7481 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, RM, yarn >Affects Versions: 2.7.2 >Reporter: Chen Qingcha >Priority: Major > Fix For: 2.7.2 > > Attachments: GPU locality support for Job scheduling.pdf, > hadoop-2.7.2-gpu-port.patch, hadoop-2.7.2-gpu.patch, > hadoop-2.7.2.port-gpu.patch > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > We enhance Hadoop with GPU support for better AI job scheduling. > Currently, YARN-3926 also supports GPU scheduling, which treats GPUs as a > countable resource. > However, GPU placement is also very important to deep learning jobs for better > efficiency. > For example, a 2-GPU job running on GPUs {0, 1} could be faster than one running on GPUs > {0, 7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not. > We add support to Hadoop 2.7.2 to enable GPU locality scheduling, which > supports fine-grained GPU placement. > A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage > and locality information on a node (up to 64 GPUs per node): '1' means > available and '0' otherwise in the corresponding bit position.
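The 64-bit bitmap encoding described in the issue can be illustrated with plain bit operations. The class below is a hypothetical sketch, not code from the patch; in particular, the 4-GPUs-per-PCI-E-switch grouping is an assumed topology used only to show how locality checks could work:

```java
// Illustrative sketch of a 64-bit GPU bitmap: bit i set means GPU i is
// available ('1'), cleared means in use ('0'), matching the description
// above. Switch width is an assumption, not part of the patch.
public class GpuBitmap {
    // Number of GPUs currently available on the node.
    public static int availableCount(long bitmap) {
        return Long.bitCount(bitmap);
    }

    // Whether GPU `gpu` (0..63) is available.
    public static boolean isAvailable(long bitmap, int gpu) {
        return (bitmap >>> gpu & 1L) == 1L;
    }

    // Locality check: true if both GPUs sit under the same PCI-E switch,
    // assuming `switchWidth` GPUs per switch.
    public static boolean sameSwitch(int a, int b, int switchWidth) {
        return a / switchWidth == b / switchWidth;
    }

    // Mark a GPU as allocated by clearing its bit.
    public static long allocate(long bitmap, int gpu) {
        return bitmap & ~(1L << gpu);
    }
}
```

Under this encoding, a scheduler preferring GPUs {0, 1} over {0, 7} is just a matter of ranking candidate pairs by `sameSwitch` before clearing their bits.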
[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling
[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Qingcha updated YARN-7481: --- Attachment: (was: hadoop-2.7.2.port-gpu.patch) > Gpu locality support for Better AI scheduling > - > > Key: YARN-7481 > URL: https://issues.apache.org/jira/browse/YARN-7481 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, RM, yarn >Affects Versions: 2.7.2 >Reporter: Chen Qingcha >Priority: Major > Fix For: 2.7.2 > > Attachments: GPU locality support for Job scheduling.pdf, > hadoop-2.7.2-gpu-port.patch, hadoop-2.7.2-gpu.patch, > hadoop-2.7.2.port-gpu.patch > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > We enhance Hadoop with GPU support for better AI job scheduling. > Currently, YARN-3926 also supports GPU scheduling, which treats GPUs as a > countable resource. > However, GPU placement is also very important to deep learning jobs for better > efficiency. > For example, a 2-GPU job running on GPUs {0, 1} could be faster than one running on GPUs > {0, 7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not. > We add support to Hadoop 2.7.2 to enable GPU locality scheduling, which > supports fine-grained GPU placement. > A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage > and locality information on a node (up to 64 GPUs per node): '1' means > available and '0' otherwise in the corresponding bit position.
[jira] [Updated] (YARN-7930) Add configuration to initialize RM with configured labels.
[ https://issues.apache.org/jira/browse/YARN-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-7930: Attachment: YARN-7930.005.patch > Add configuration to initialize RM with configured labels. > -- > > Key: YARN-7930 > URL: https://issues.apache.org/jira/browse/YARN-7930 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-7930.001.patch, YARN-7930.002.patch, > YARN-7930.003.patch, YARN-7930.004.patch, YARN-7930.005.patch > > > At present, the only way to create labels is using the admin API. Sometimes > there is a requirement to start the cluster with pre-configured node labels. > This Jira introduces YARN configurations to start the RM with predefined node > labels.
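Initializing the RM from a configured label list could look roughly like the sketch below. The property format (a comma-separated list) and the parsing rules are assumptions for illustration, not the configuration keys defined by the patch:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: parse a comma-separated node-label list from a
// startup configuration value, e.g. "gpu,ssd". The RM would register
// these labels once at startup instead of requiring the admin API.
public class StartupLabels {
    public static Set<String> parse(String configValue) {
        Set<String> labels = new LinkedHashSet<>();
        if (configValue == null) {
            return labels; // property unset: no predefined labels
        }
        for (String l : configValue.split(",")) {
            String trimmed = l.trim();
            if (!trimmed.isEmpty()) {
                labels.add(trimmed); // ignore empty segments
            }
        }
        return labels;
    }
}
```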
[jira] [Updated] (YARN-8134) Support specifying node resources in SLS
[ https://issues.apache.org/jira/browse/YARN-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-8134: Attachment: YARN-8134.002.patch > Support specifying node resources in SLS > > > Key: YARN-8134 > URL: https://issues.apache.org/jira/browse/YARN-8134 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-8134.002.patch, YARN-8134.patch > > > At present, all nodes have the same resources in SLS. We need to add the > capability to assign different resources to different nodes in SLS.
[jira] [Commented] (YARN-6629) NPE occurred when container allocation proposal is applied but its resource requests are removed before
[ https://issues.apache.org/jira/browse/YARN-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432034#comment-16432034 ] can_he commented on YARN-6629: -- [~Tao Yang] Thank you so much for your rapid reply! One more question: does the new patch need to pass the tests before being applied to the branch? > NPE occurred when container allocation proposal is applied but its resource > requests are removed before > --- > > Key: YARN-6629 > URL: https://issues.apache.org/jira/browse/YARN-6629 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0, 3.0.0-alpha2 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Critical > Fix For: 3.1.0 > > Attachments: YARN-6629.001.patch, YARN-6629.002.patch, > YARN-6629.003.patch, YARN-6629.004.patch, YARN-6629.005.patch, > YARN-6629.006.patch, YARN-6629.branch-2.001.patch > > > I wrote a test case to reproduce another problem for branch-2 and found a new > NPE error, log: > {code} > FATAL event.EventDispatcher (EventDispatcher.java:run(75)) - Error in > handling event type NODE_UPDATE to the Event Dispatcher > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:446) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:516) > at > org.apache.hadoop.yarn.client.TestNegativePendingResource$1.answer(TestNegativePendingResource.java:225) > at > org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:31) > at org.mockito.internal.MockHandler.handle(MockHandler.java:97) > at > org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp$$EnhancerByMockitoWithCGLIB$$29eb8afc.apply() > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2396) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.submitResourceCommitRequest(CapacityScheduler.java:2281) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1247) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1236) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1325) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1112) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:987) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1367) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:143) > at > org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66) > at java.lang.Thread.run(Thread.java:745) > {code} > Reproduce this error in chronological order: > 1. AM started and requested 1 container with schedulerRequestKey#1 : > ApplicationMasterService#allocate --> CapacityScheduler#allocate --> > SchedulerApplicationAttempt#updateResourceRequests --> > AppSchedulingInfo#updateResourceRequests > Added schedulerRequestKey#1 into schedulerKeyToPlacementSets > 2. Scheduler allocated 1 container for this request and accepted the proposal > 3. 
AM removed this request > ApplicationMasterService#allocate --> CapacityScheduler#allocate --> > SchedulerApplicationAttempt#updateResourceRequests --> > AppSchedulingInfo#updateResourceRequests --> > AppSchedulingInfo#addToPlacementSets --> > AppSchedulingInfo#updatePendingResources > Removed schedulerRequestKey#1 from schedulerKeyToPlacementSets > 4. Scheduler applied this proposal > CapacityScheduler#tryCommit --> FiCaSchedulerApp#apply --> > AppSchedulingInfo#allocate > Throws NPE when calling > schedulerKeyToPlacementSets.get(schedulerRequestKey).allocate(schedulerKey, > type, node);
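The race in steps 1-4 boils down to an unchecked map lookup: the placement set can be removed between proposal acceptance and apply. A minimal sketch of the defensive shape (not the actual AppSchedulingInfo code; class, fields, and return convention are assumptions):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the race described above: the placement set may
// be removed (step 3) after the proposal was accepted (step 2) but before
// it is applied (step 4), so the lookup must be null-checked rather than
// dereferenced directly.
public class SafeAllocate {
    private final Map<String, Object> keyToPlacementSet =
        new ConcurrentHashMap<>();

    public void addRequest(String schedulerKey, Object placementSet) {
        keyToPlacementSet.put(schedulerKey, placementSet);
    }

    public void removeRequest(String schedulerKey) {
        keyToPlacementSet.remove(schedulerKey);
    }

    // Returns false (reject/skip the proposal) instead of throwing an NPE
    // when the request was already removed.
    public boolean allocate(String schedulerKey) {
        Object ps = keyToPlacementSet.get(schedulerKey);
        if (ps == null) {
            return false; // request removed concurrently: skip, don't NPE
        }
        // ... perform the allocation against ps ...
        return true;
    }
}
```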
[jira] [Commented] (YARN-7930) Add configuration to initialize RM with configured labels.
[ https://issues.apache.org/jira/browse/YARN-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432045#comment-16432045 ] genericqa commented on YARN-7930: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 39s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 26s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 14s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 45s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 13s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 88m 59s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | YARN-7930 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918343/YARN-7930.005.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux 9e8338317668 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e87be8a | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | Test
[jira] [Created] (YARN-8139) Skip node hostname resolution when running SLS.
Abhishek Modi created YARN-8139: --- Summary: Skip node hostname resolution when running SLS. Key: YARN-8139 URL: https://issues.apache.org/jira/browse/YARN-8139 Project: Hadoop YARN Issue Type: Bug Reporter: Abhishek Modi Assignee: Abhishek Modi Currently, depending on the time taken to resolve hostnames, SLS metrics get skewed. To avoid this, this fix introduces a flag that can be used to disable hostname resolution.
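The flag described in this issue could be wired in roughly as below. This is a hedged sketch, not the YARN-8139 patch: the class name, constructor flag, and fallback behavior are all assumptions; the point is only that simulation mode short-circuits the DNS round trip that skews SLS timing metrics.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch: when skipResolution is set, return the raw node
// hostname without touching DNS, so lookup latency cannot distort
// simulator metrics.
public class NodeResolver {
    private final boolean skipResolution;

    public NodeResolver(boolean skipResolution) {
        this.skipResolution = skipResolution;
    }

    public String resolve(String hostname) {
        if (skipResolution) {
            return hostname; // no DNS round trip in simulation mode
        }
        try {
            return InetAddress.getByName(hostname).getHostName();
        } catch (UnknownHostException e) {
            return hostname; // fall back to the raw name
        }
    }
}
```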
[jira] [Commented] (YARN-7804) Refresh action on Grid view page should not be redirected to graph view
[ https://issues.apache.org/jira/browse/YARN-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432057#comment-16432057 ] Hudson commented on YARN-7804: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13952 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13952/]) YARN-7804. [UI2] Refresh action on Grid view page should not be (sunilg: rev 7c1e77dda4cb3ba8952328d142aafcf0366b5903) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/components/timeline-view.js * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/controllers/yarn-app-attempt.js * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/timeline-view.hbs * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/controllers/yarn-app/attempts.js * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-app-attempt.hbs * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-app/attempts.hbs > Refresh action on Grid view page should not be redirected to graph view > --- > > Key: YARN-7804 > URL: https://issues.apache.org/jira/browse/YARN-7804 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.0.0 >Reporter: Yesha Vora >Assignee: Gergely Novák >Priority: Major > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-7804.001.patch > > > Steps: > 1) Go to application attempt page > http://host:8088/ui2/#/yarn-app/application_1516734339938_0020/attempts?service=abc > 2) click on grid view > 3) click the refresh button on the page > Actual behavior: > on refreshing the page, it comes back to graph view. 
> Expected behavior: > on refreshing the page, it should stay in grid view
[jira] [Commented] (YARN-7825) Maintain constant horizontal application info bar for all pages
[ https://issues.apache.org/jira/browse/YARN-7825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432068#comment-16432068 ] Gergely Novák commented on YARN-7825: - [~yeshavora] Could you please attach a screenshot? > Maintain constant horizontal application info bar for all pages > --- > > Key: YARN-7825 > URL: https://issues.apache.org/jira/browse/YARN-7825 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Yesha Vora >Priority: Major > > Steps: > 1) enable ATSv2 > 2) Start a Yarn service application ( Httpd ) > 3) Fix the horizontal info bar for the pages below: > * component page > * Component Instance info page > * Application attempt Info
[jira] [Updated] (YARN-7830) If attempt has selected grid view, attempt info page should be opened with grid view
[ https://issues.apache.org/jira/browse/YARN-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gergely Novák updated YARN-7830: Attachment: YARN-7830.001.patch > If attempt has selected grid view, attempt info page should be opened with > grid view > - > > Key: YARN-7830 > URL: https://issues.apache.org/jira/browse/YARN-7830 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.0.0 >Reporter: Yesha Vora >Assignee: Gergely Novák >Priority: Major > Attachments: YARN-7830.001.patch > > > Steps: > 1) Start an application and visit the attempt page > 2) click on Grid view > 3) Click on attempt 1 > > Current behavior: > Clicking attempt 1 redirects to the attempt info page, which opens in graph > view. > > Expected behavior: > In this scenario, it should open in grid view.
[jira] [Commented] (YARN-8134) Support specifying node resources in SLS
[ https://issues.apache.org/jira/browse/YARN-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432086#comment-16432086 ] genericqa commented on YARN-8134: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 51s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 37s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 55s{color} | {color:red} hadoop-sls in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 18s{color} | {color:red} The patch generated 1 ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 56m 50s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.sls.TestSLSStreamAMSynth | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | YARN-8134 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918353/YARN-8134.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux e05796de391a 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 7623cc5 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/20287/artifact/out/patch-unit-hadoop-tools_hadoop-sls.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20287/testReport/ | | asflicense | https://builds.apache.org/job/PreCommit-YARN-Build/20287/artifact/out/patch-asflicense-problems.txt | | Max. process+thread count | 467 (vs. ulimit of 1) | | modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20287/console | | P
[jira] [Commented] (YARN-7830) If attempt has selected grid view, attempt info page should be opened with grid view
[ https://issues.apache.org/jira/browse/YARN-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432143#comment-16432143 ] genericqa commented on YARN-7830: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 37s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 36m 16s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 43s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 49m 24s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | YARN-7830 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918362/YARN-7830.001.patch | | Optional Tests | asflicense shadedclient | | uname | Linux c9fd8e30ec1b 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 7c1e77d | | maven | version: Apache Maven 3.3.9 | | Max. process+thread count | 303 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20288/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > If attempt has selected grid view, attempt info page should be opened with > grid view > - > > Key: YARN-7830 > URL: https://issues.apache.org/jira/browse/YARN-7830 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.0.0 >Reporter: Yesha Vora >Assignee: Gergely Novák >Priority: Major > Attachments: YARN-7830.001.patch > > > Steps: > 1) Start Application and visit attempt page > 2) click on Grid view > 3) Click on attempt 1 > > Current behavior: > This page is redirected to attempt info page. This page redirects to graph > view . > > Expected behavior: > In this scenario, It should redirect to grid view.
[jira] [Commented] (YARN-8127) Resource leak when async scheduling is enabled
[ https://issues.apache.org/jira/browse/YARN-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432214#comment-16432214 ] Tao Yang commented on YARN-8127: Thanks [~cheersyang]. Some additional details: this problem happens most often in async-scheduling mode, where multiple async scheduling threads may allocate for the same node that has a reserved container, generating duplicate allocate-from-reserved proposals for the same reserved container. These duplicate proposals can all be successfully applied in the commit phase, so the resources of the node/queue are decreased several times for the same container. We can solve this by adding a check of the reserved state for allocate-from-reserved proposals in the commit phase. Attached a patch with a UT for review. > Resource leak when async scheduling is enabled > -- > > Key: YARN-8127 > URL: https://issues.apache.org/jira/browse/YARN-8127 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Weiwei Yang >Assignee: Tao Yang >Priority: Critical > > Brief steps to reproduce > # Enable async scheduling, 5 threads > # Submit a lot of jobs trying to exhaust cluster resource > # After a while, observed NM allocated resource is more than resource > requested by allocated containers > Looks like the commit phase is not sync handling reserved containers, causing > some proposal incorrectly accepted, subsequently resource was deducted > multiple times for a container.
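The check described above can be sketched as a self-contained toy model (class and method names here are illustrative, not the CapacityScheduler's): the commit phase re-checks, under a lock, that the container is still reserved before applying an allocate-from-reserved proposal, so a duplicate proposal from another scheduling thread is rejected instead of deducting resources a second time.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the proposed fix: several scheduling threads may submit an
// allocate-from-reserved proposal for the same container; the commit phase
// accepts only the first by re-checking the reserved state before applying.
public class ReservedCommitCheck {
    // containerId -> reserved resource (MB); stands in for the node's
    // reserved-container bookkeeping
    private final Map<String, Integer> reserved = new ConcurrentHashMap<>();
    private int availableMb;

    public ReservedCommitCheck(int availableMb) {
        this.availableMb = availableMb;
    }

    public void reserve(String containerId, int mb) {
        reserved.put(containerId, mb);
    }

    // Commit phase: reject the proposal unless the container is still
    // reserved, so a duplicate proposal cannot deduct resources twice.
    public synchronized boolean tryCommitAllocateFromReserved(String containerId) {
        Integer mb = reserved.remove(containerId);
        if (mb == null) {
            return false;      // duplicate or stale proposal: rejected
        }
        availableMb -= mb;     // resources are deducted exactly once
        return true;
    }

    public synchronized int getAvailableMb() {
        return availableMb;
    }
}
```

Only the first commit for a given reserved container succeeds; later duplicates see the reservation already cleared and are dropped, which is the invariant the patch's unit test would assert.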
[jira] [Commented] (YARN-8104) Add API to fetch node to attribute mapping
[ https://issues.apache.org/jira/browse/YARN-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432328#comment-16432328 ] genericqa commented on YARN-8104: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 37s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} YARN-3409 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 25s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 23s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m 3s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 15s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 5m 0s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 15s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 20s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in YARN-3409 has 1 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 50s{color} | {color:green} YARN-3409 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 29m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 29m 34s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 3m 25s{color} | {color:orange} root: The patch generated 6 new + 261 unchanged - 0 fixed = 267 total (was 261) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 28s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 34s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 32s{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 56s{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common generated 1 new + 4189 unchanged - 0 fixed = 4190 total (was 4189) {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 50s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 14s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 14s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 66m 57s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 27m 3s{color} | {color:green} hado
[jira] [Updated] (YARN-7984) Delete registry entries from ZK on ServiceClient stop and clean up stop/destroy behavior
[ https://issues.apache.org/jira/browse/YARN-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-7984: - Description: The service records written to the registry are removed by ServiceClient on a destroy call, but not on a stop call. The service AM does have some code to clean up the registry entries when component instances are stopped, but if the AM is killed before it has a chance to perform the cleanup, these entries will be left in ZooKeeper. It would be better to clean these up in the stop call, so that RegistryDNS does not provide lookups for containers that don't exist. Additional stop/destroy behavior improvements include: * destroying a saved (not launched or started) service * destroying a stopped service * destroying a destroyed service * was:The service records written to the registry are removed by ServiceClient on a destroy call, but not on a stop call. The service AM does have some code to clean up the registry entries when component instances are stopped, but if the AM is killed before it has a chance to perform the cleanup, these entries will be left in ZooKeeper. It would be better to clean these up in the stop call, so that RegistryDNS does not provide lookups for containers that don't exist. > Delete registry entries from ZK on ServiceClient stop and clean up > stop/destroy behavior > > > Key: YARN-7984 > URL: https://issues.apache.org/jira/browse/YARN-7984 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Critical > Attachments: YARN-7984.1.patch, YARN-7984.2.patch > > > The service records written to the registry are removed by ServiceClient on a > destroy call, but not on a stop call. The service AM does have some code to > clean up the registry entries when component instances are stopped, but if > the AM is killed before it has a chance to perform the cleanup, these entries > will be left in ZooKeeper. 
It would be better to clean these up in the stop > call, so that RegistryDNS does not provide lookups for containers that don't > exist. > Additional stop/destroy behavior improvements include: > * destroying a saved (not launched or started) service > * destroying a stopped service > * destroying a destroyed service
[jira] [Updated] (YARN-7984) Delete registry entries from ZK on ServiceClient stop and clean up stop/destroy behavior
[ https://issues.apache.org/jira/browse/YARN-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-7984: - Description: The service records written to the registry are removed by ServiceClient on a destroy call, but not on a stop call. The service AM does have some code to clean up the registry entries when component instances are stopped, but if the AM is killed before it has a chance to perform the cleanup, these entries will be left in ZooKeeper. It would be better to clean these up in the stop call, so that RegistryDNS does not provide lookups for containers that don't exist. Additional stop/destroy behavior improvements include fixing errors / unexpected behavior related to: * destroying a saved (not launched or started) service * destroying a stopped service * destroying a destroyed service * returning proper exit codes for destroy failures * performing other client operations on saved services (fixing NPEs) was: The service records written to the registry are removed by ServiceClient on a destroy call, but not on a stop call. The service AM does have some code to clean up the registry entries when component instances are stopped, but if the AM is killed before it has a chance to perform the cleanup, these entries will be left in ZooKeeper. It would be better to clean these up in the stop call, so that RegistryDNS does not provide lookups for containers that don't exist. 
Additional stop/destroy behavior improvements include: * destroying a saved (not launched or started) service * destroying a stopped service * destroying a destroyed service > Delete registry entries from ZK on ServiceClient stop and clean up > stop/destroy behavior > > > Key: YARN-7984 > URL: https://issues.apache.org/jira/browse/YARN-7984 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Critical > Attachments: YARN-7984.1.patch, YARN-7984.2.patch > > > The service records written to the registry are removed by ServiceClient on a > destroy call, but not on a stop call. The service AM does have some code to > clean up the registry entries when component instances are stopped, but if > the AM is killed before it has a chance to perform the cleanup, these entries > will be left in ZooKeeper. It would be better to clean these up in the stop > call, so that RegistryDNS does not provide lookups for containers that don't > exist. > Additional stop/destroy behavior improvements include fixing errors / > unexpected behavior related to: > * destroying a saved (not launched or started) service > * destroying a stopped service > * destroying a destroyed service > * returning proper exit codes for destroy failures > * performing other client operations on saved services (fixing NPEs)
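The behavior this issue asks for can be sketched as a minimal model. All names and the exit code below are hypothetical (this is not the ServiceClient API), and an in-memory map stands in for the ZooKeeper-backed registry; the point is only that stop() and destroy() share one idempotent registry cleanup, and destroying an already-destroyed service returns a distinct exit code instead of failing unexpectedly.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: both stop() and destroy() run the same
// idempotent registry cleanup, so DNS-visible records are removed even if
// the service AM was killed before it could clean up, and a repeated
// destroy returns a proper exit code instead of an NPE.
public class ServiceRegistrySketch {
    public static final int EXIT_SUCCESS = 0;
    public static final int EXIT_NOT_FOUND = 56;   // hypothetical exit code

    private final Map<String, String> registry = new HashMap<>(); // stands in for ZK
    private boolean destroyed = false;

    public void register(String serviceName, String record) {
        registry.put(serviceName, record);
    }

    public int stop(String serviceName) {
        cleanupRegistry(serviceName);   // remove registry records on stop too
        return EXIT_SUCCESS;
    }

    public int destroy(String serviceName) {
        if (destroyed) {
            return EXIT_NOT_FOUND;      // distinct code for a repeated destroy
        }
        cleanupRegistry(serviceName);
        destroyed = true;
        return EXIT_SUCCESS;
    }

    // Idempotent: safe to call from stop, destroy, or both in sequence.
    private void cleanupRegistry(String serviceName) {
        registry.remove(serviceName);
    }

    public boolean hasRecord(String serviceName) {
        return registry.containsKey(serviceName);
    }
}
```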
[jira] [Updated] (YARN-8125) YARNUIV2 does not failover from standby to active endpoint like previous YARN UI
[ https://issues.apache.org/jira/browse/YARN-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phil Zampino updated YARN-8125: --- Description: If the YARN UI is accessed via the standby resource manager endpoint, it automatically redirects the requests to the active resource manager endpoint. YARNUIV2 should behave the same way. Apache Knox 1.0.0 introduced the ability to dynamically determine proxied RESOURCEMANAGER and YARNUI service endpoints based on YARN configuration from Ambari. This functionality works for RM and YARNUI because even though the YARN config may reference the standby RM endpoint, requests are automatically redirected to the active endpoint. RM and YARNUI requests to the standby endpoint result in a HTTP 307 response. YARNUIV2 should respond similarly. If YARNUIV2 behaves differently, then Knox will not be able to support its own dynamic configuration behavior when proxying YARNUIV2. KNOX-1212 adds the integration with Knox, but KNOX-1236 is blocked by this issue. was: If the YARN UI is accessed via the standby resource manager endpoint, it automatically redirects the requests to the active resource manager endpoint. YARNUIV2 should behave the same way. Apache Knox 1.0.0 introduced the ability to dynamically determine proxied RESOURCEMANAGER and YARNUI service endpoints based on YARN configuration from Ambari. This functionality works for RM and YARNUI because even though the YARN config may reference the standby RM endpoint, requests are automatically redirected to the active endpoint. If YARNUIV2 behaves differently, then Knox will not be able to support its own dynamic configuration behavior when proxying YARNUIV2. KNOX-1212 adds the integration with Knox, but KNOX-1236 is blocked by this issue. 
> YARNUIV2 does not failover from standby to active endpoint like previous YARN > UI > > > Key: YARN-8125 > URL: https://issues.apache.org/jira/browse/YARN-8125 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Phil Zampino >Priority: Major > > If the YARN UI is accessed via the standby resource manager endpoint, it > automatically redirects the requests to the active resource manager endpoint. > YARNUIV2 should behave the same way. > Apache Knox 1.0.0 introduced the ability to dynamically determine proxied > RESOURCEMANAGER and YARNUI service endpoints based on YARN configuration from > Ambari. This functionality works for RM and YARNUI because even though the > YARN config may reference the standby RM endpoint, requests are automatically > redirected to the active endpoint. > RM and YARNUI requests to the standby endpoint result in a HTTP 307 response. > YARNUIV2 should respond similarly. > If YARNUIV2 behaves differently, then Knox will not be able to support its > own dynamic configuration behavior when proxying YARNUIV2. > KNOX-1212 adds the integration with Knox, but KNOX-1236 is blocked by this > issue.
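The requested behavior, the standby RM answering UI2 requests with an HTTP 307 whose Location points at the active RM, reduces to a small decision function. The class and method names below are assumptions for illustration, not YARN code:

```java
// Minimal sketch of the standby-side behavior: when a request hits the
// standby RM's UI endpoint, answer with HTTP 307 Temporary Redirect and a
// Location header pointing at the active RM, as the previous YARN UI does.
public class StandbyRedirect {
    public static final int TEMPORARY_REDIRECT = 307;

    private final boolean isActive;
    private final String activeRmWebApp;   // e.g. "http://rm2:8088" (illustrative)

    public StandbyRedirect(boolean isActive, String activeRmWebApp) {
        this.isActive = isActive;
        this.activeRmWebApp = activeRmWebApp;
    }

    // Returns null when this RM is active (serve the page normally);
    // otherwise the Location value to send with a 307 response.
    public String redirectLocation(String requestPath) {
        if (isActive) {
            return null;
        }
        return activeRmWebApp + requestPath;
    }
}
```

A 307 (rather than 302) preserves the request method across the redirect, which is why a proxy like Knox can follow it transparently.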
[jira] [Updated] (YARN-8134) Support specifying node resources in SLS
[ https://issues.apache.org/jira/browse/YARN-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-8134: Attachment: YARN-8134.003.patch > Support specifying node resources in SLS > > > Key: YARN-8134 > URL: https://issues.apache.org/jira/browse/YARN-8134 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-8134.002.patch, YARN-8134.003.patch, YARN-8134.patch > > > At present, all nodes have the same resources in SLS. We need to add the capability > to assign different resources to different nodes in SLS.
[jira] [Assigned] (YARN-8125) YARNUIV2 does not failover from standby to active endpoint like previous YARN UI
[ https://issues.apache.org/jira/browse/YARN-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-8125: - Assignee: Sunil G > YARNUIV2 does not failover from standby to active endpoint like previous YARN > UI > > > Key: YARN-8125 > URL: https://issues.apache.org/jira/browse/YARN-8125 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Phil Zampino >Assignee: Sunil G >Priority: Major > > If the YARN UI is accessed via the standby resource manager endpoint, it > automatically redirects the requests to the active resource manager endpoint. > YARNUIV2 should behave the same way. > Apache Knox 1.0.0 introduced the ability to dynamically determine proxied > RESOURCEMANAGER and YARNUI service endpoints based on YARN configuration from > Ambari. This functionality works for RM and YARNUI because even though the > YARN config may reference the standby RM endpoint, requests are automatically > redirected to the active endpoint. > RM and YARNUI requests to the standby endpoint result in a HTTP 307 response. > YARNUIV2 should respond similarly. > If YARNUIV2 behaves differently, then Knox will not be able to support its > own dynamic configuration behavior when proxying YARNUIV2. > KNOX-1212 adds the integration with Knox, but KNOX-1236 is blocked by this > issue.
[jira] [Updated] (YARN-7221) Add security check for privileged docker container
[ https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-7221: Attachment: YARN-7221.021.patch > Add security check for privileged docker container > -- > > Key: YARN-7221 > URL: https://issues.apache.org/jira/browse/YARN-7221 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security >Affects Versions: 3.0.0, 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-7221.001.patch, YARN-7221.002.patch, > YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, > YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, > YARN-7221.009.patch, YARN-7221.010.patch, YARN-7221.011.patch, > YARN-7221.012.patch, YARN-7221.013.patch, YARN-7221.014.patch, > YARN-7221.015.patch, YARN-7221.016.patch, YARN-7221.017.patch, > YARN-7221.018.patch, YARN-7221.019.patch, YARN-7221.020.patch, > YARN-7221.021.patch > > > When a docker container runs with privileges, the majority use case is to have > some program start as root and then drop privileges to another user, e.g. > httpd starts privileged to bind to port 80, then drops privileges to the > www user. > # We should add a security check for submitting users, to verify they have > "sudo" access to run privileged containers. > # We should remove --user=uid:gid for privileged containers. > > Docker can be launched with the --privileged=true and --user=uid:gid flags. With > this parameter combination, the user will not be able to become the root user. > All docker exec commands will run as the uid:gid user instead of > being granted privileges. A user can gain root privileges if the container file system > contains files that grant the user extra power, but this type of image is > considered dangerous. A non-privileged user can launch a container with > special bits to acquire the same level of root power.
Hence, we lose control of > which images should be run with --privileged, and who has sudo rights to use > privileged container images. As a result, we should check for sudo access and > then decide whether to parameterize --privileged=true OR --user=uid:gid. This will > avoid leading developers down the wrong path.
[jira] [Commented] (YARN-7221) Add security check for privileged docker container
[ https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432393#comment-16432393 ] Eric Yang commented on YARN-7221: - Patch 21 removes a checkstyle error, and also removes group-add for privileged containers. > Add security check for privileged docker container > -- > > Key: YARN-7221 > URL: https://issues.apache.org/jira/browse/YARN-7221 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security >Affects Versions: 3.0.0, 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-7221.001.patch, YARN-7221.002.patch, > YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, > YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, > YARN-7221.009.patch, YARN-7221.010.patch, YARN-7221.011.patch, > YARN-7221.012.patch, YARN-7221.013.patch, YARN-7221.014.patch, > YARN-7221.015.patch, YARN-7221.016.patch, YARN-7221.017.patch, > YARN-7221.018.patch, YARN-7221.019.patch, YARN-7221.020.patch, > YARN-7221.021.patch > > > When a docker container runs with privileges, the majority use case is to have > some program start as root and then drop privileges to another user, e.g. > httpd starts privileged to bind to port 80, then drops privileges to the > www user. > # We should add a security check for submitting users, to verify they have > "sudo" access to run privileged containers. > # We should remove --user=uid:gid for privileged containers. > > Docker can be launched with the --privileged=true and --user=uid:gid flags. With > this parameter combination, the user will not be able to become the root user. > All docker exec commands will run as the uid:gid user instead of > being granted privileges. A user can gain root privileges if the container file system > contains files that grant the user extra power, but this type of image is > considered dangerous. A non-privileged user can launch a container with > special bits to acquire the same level of root power.
Hence, we lose control of > which images should be run with --privileged, and who has sudo rights to use > privileged container images. As a result, we should check for sudo access and > then decide whether to parameterize --privileged=true OR --user=uid:gid. This will > avoid leading developers down the wrong path.
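The decision described above — verify sudo access first, then pass either --privileged=true or --user=uid:gid, never both — can be sketched as follows. The class and the allowed-users set are purely illustrative, not container-executor code:

```java
import java.util.Set;

// Illustrative only: choose the docker run flag based on whether the
// submitting user is allowed privileged containers (e.g. appears in a
// configured allowed-users / sudoers list).
public class PrivilegedFlagChooser {
    private final Set<String> sudoUsers;

    public PrivilegedFlagChooser(Set<String> sudoUsers) {
        this.sudoUsers = sudoUsers;
    }

    // Either --privileged=true (trusted user, no uid:gid downgrade) or
    // --user=uid:gid (unprivileged), never both together.
    public String dockerUserFlag(String user, int uid, int gid,
                                 boolean requestedPrivileged) {
        if (requestedPrivileged && sudoUsers.contains(user)) {
            return "--privileged=true";
        }
        return "--user=" + uid + ":" + gid;
    }
}
```

Making the two flags mutually exclusive is the heart of the comment: combining them silently strips the privileges the image was designed to use, which is the "wrong path" the description warns about.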
[jira] [Updated] (YARN-8127) Resource leak when async scheduling is enabled
[ https://issues.apache.org/jira/browse/YARN-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-8127: --- Attachment: YARN-8127.001.patch > Resource leak when async scheduling is enabled > -- > > Key: YARN-8127 > URL: https://issues.apache.org/jira/browse/YARN-8127 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Weiwei Yang >Assignee: Tao Yang >Priority: Critical > Attachments: YARN-8127.001.patch > > > Brief steps to reproduce > # Enable async scheduling, 5 threads > # Submit a lot of jobs trying to exhaust cluster resource > # After a while, observed NM allocated resource is more than resource > requested by allocated containers > Looks like the commit phase is not sync handling reserved containers, causing > some proposal incorrectly accepted, subsequently resource was deducted > multiple times for a container.
[jira] [Commented] (YARN-8057) Inadequate information for handling catch clauses
[ https://issues.apache.org/jira/browse/YARN-8057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432420#comment-16432420 ] ASF GitHub Bot commented on YARN-8057: -- GitHub user lzh3636 opened a pull request: https://github.com/apache/hadoop/pull/362 YARN-8057 Inadequate information for handling catch clauses The description of the problem: https://issues.apache.org/jira/browse/YARN-8057 I added stack trace information to those two logging statements, so that the full exception information is written to the logs. You can merge this pull request into a Git repository by running: $ git pull https://github.com/lzh3636/hadoop YARN-8057 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hadoop/pull/362.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #362 commit 002935625784226ba915e8631a0c8298d423677f Author: lzh3636 Date: 2018-04-10T15:03:54Z update stack traces to show exception info > Inadequate information for handling catch clauses > - > > Key: YARN-8057 > URL: https://issues.apache.org/jira/browse/YARN-8057 > Project: Hadoop YARN > Issue Type: Improvement > Components: api, yarn >Affects Versions: 3.0.0 >Reporter: Zhenhao Li >Priority: Major > Labels: easyfix > > There are situations where different exception types are caught, but the > handling of those exceptions cannot show the differences between those types.
> Here are the code snippets we found which have this problem: > *org/apache/hadoop/yarn/client/api/impl/NMClientImpl.java* > [https://github.com/apache/hadoop/blob/c02d2ba50db8a355ea03081c3984b2ea0c375a3f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/NMClientImpl.java] > At Line *125* and Line *129,* we can see that two exception types are caught, > but the logging statements here cannot show the exception type at all. This > may confuse the person reading the log, who cannot > tell which exception happened here. > > Adding stack trace information to these two logging statements may be a > simple way to improve this. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
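The fix the report asks for is simply to pass the caught exception to the logging call so its type and stack trace reach the log (with SLF4J that is the throwable-last form, e.g. `LOG.error("msg", e)`). The stdlib sketch below contrasts the two styles; the method names and the "Unable to start container" message are illustrative, not NMClientImpl's exact code.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Illustrates why a message-only log line hides the exception type, while a
// log line that includes the throwable carries the type and stack trace.
public class ExceptionLogging {
    /** Old style: the exception type and stack trace are lost. */
    public static String messageOnly(Exception e) {
        return "Unable to start container";
    }

    /** Fixed style: render the exception's type and stack trace into the entry. */
    public static String withStackTrace(Exception e) {
        StringWriter sw = new StringWriter();
        e.printStackTrace(new PrintWriter(sw));
        return "Unable to start container: " + sw;
    }
}
```

A reader of the first style cannot distinguish which of the two catch clauses fired; the second style makes the distinction obvious from the log alone.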
[jira] [Commented] (YARN-7221) Add security check for privileged docker container
[ https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432433#comment-16432433 ] genericqa commented on YARN-7221: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red} YARN-7221 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-7221 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918397/YARN-7221.021.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20289/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Add security check for privileged docker container > -- > > Key: YARN-7221 > URL: https://issues.apache.org/jira/browse/YARN-7221 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security >Affects Versions: 3.0.0, 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-7221.001.patch, YARN-7221.002.patch, > YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, > YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, > YARN-7221.009.patch, YARN-7221.010.patch, YARN-7221.011.patch, > YARN-7221.012.patch, YARN-7221.013.patch, YARN-7221.014.patch, > YARN-7221.015.patch, YARN-7221.016.patch, YARN-7221.017.patch, > YARN-7221.018.patch, YARN-7221.019.patch, YARN-7221.020.patch, > YARN-7221.021.patch > > > When a docker is running with privileges, majority of the use case is to have > some program running with root then drop privileges to another user. i.e. 
> httpd starts privileged to bind to port 80, then drops privileges to the > www user. > # We should add a security check for submitting users, to verify that they have > "sudo" access to run a privileged container. > # We should remove --user=uid:gid for privileged containers. > > Docker can be launched with the --privileged=true and --user=uid:gid flags. With > this parameter combination, the user will not be able to become the root user. > All docker exec commands will be dropped to the uid:gid user instead of > being granted privileges. A user can gain root privileges if the container file system > contains files that give the user extra power, but this type of image is > considered dangerous. A non-privileged user can launch a container with > special bits to acquire the same level of root power. Hence, we lose control of > which images should be run with --privileged, and who has sudo rights to use > privileged container images. As a result, we should check for sudo access > and then decide whether to parameterize --privileged=true OR --user=uid:gid. This will > avoid leading developers down the wrong path. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps
[ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432474#comment-16432474 ] Manikandan R commented on YARN-4606: [~leftnoteasy] [~sunilg] Can you please share your views? > CapacityScheduler: applications could get starved because computation of > #activeUsers considers pending apps > - > > Key: YARN-4606 > URL: https://issues.apache.org/jira/browse/YARN-4606 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Affects Versions: 2.8.0, 2.7.1 >Reporter: Karam Singh >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.patch > > > Currently, if all applications belonging to the same user in a LeafQueue are pending > (caused by max-am-percent, etc.), ActiveUsersManager still considers that user > an active user. This could lead to starvation of active applications, for > example: > - App1 (belongs to user1) / app2 (belongs to user2) are active; app3 (belongs to > user3) / app4 (belongs to user4) are pending > - ActiveUsersManager returns #active-users=4 > - However, only two users (user1/user2) are able to allocate new > resources, so the computed user-limit-resource could be lower than expected. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
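The counting problem in the example above can be shown with a small sketch. This is not the real ActiveUsersManager API — class and method names are illustrative — but it demonstrates why counting every user with any application (active or pending) shrinks the per-user share, since user-limit is derived roughly as queue-resource / #active-users.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch: a user whose applications are all pending should not be
// counted as active, otherwise the computed user-limit comes out too low.
public class ActiveUsersSketch {
    private final Map<String, Integer> activeApps = new HashMap<>();
    private final Map<String, Integer> pendingApps = new HashMap<>();

    public void addApp(String user, boolean active) {
        Map<String, Integer> m = active ? activeApps : pendingApps;
        m.merge(user, 1, Integer::sum);
    }

    /** Current (buggy) behavior: counts any user with at least one app. */
    public int usersWithAnyApp() {
        Set<String> users = new HashSet<>(activeApps.keySet());
        users.addAll(pendingApps.keySet());
        return users.size();
    }

    /** Desired behavior: count only users that can actually allocate. */
    public int usersWithActiveApps() {
        return activeApps.size();
    }
}
```

With the four-user scenario from the description, the buggy count is 4 and the desired count is 2, halving the user-limit-resource each active user is granted.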
[jira] [Commented] (YARN-7159) Normalize unit of resource objects in RM and avoid to do unit conversion in critical path
[ https://issues.apache.org/jira/browse/YARN-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432477#comment-16432477 ] Manikandan R commented on YARN-7159: [~sunilg] Can you please take this forward? Thanks. > Normalize unit of resource objects in RM and avoid to do unit conversion in > critical path > - > > Key: YARN-7159 > URL: https://issues.apache.org/jira/browse/YARN-7159 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Wangda Tan >Assignee: Manikandan R >Priority: Critical > Attachments: YARN-7159.001.patch, YARN-7159.002.patch, > YARN-7159.003.patch, YARN-7159.004.patch, YARN-7159.005.patch, > YARN-7159.006.patch, YARN-7159.007.patch, YARN-7159.008.patch, > YARN-7159.009.patch, YARN-7159.010.patch, YARN-7159.011.patch, > YARN-7159.012.patch, YARN-7159.013.patch, YARN-7159.015.patch, > YARN-7159.016.patch, YARN-7159.017.patch, YARN-7159.018.patch, > YARN-7159.019.patch, YARN-7159.020.patch, YARN-7159.021.patch, > YARN-7159.022.patch, YARN-7159.023.patch > > > Currently, resource conversion can happen in the critical code path when a > different unit is specified by the client. This can significantly impact the performance and > throughput of the RM. We should normalize units when a resource is passed > to the RM and avoid expensive unit conversion every time. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
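The idea above — convert once at the RM boundary to a canonical unit so the scheduler's hot path compares plain numbers — can be sketched minimally. The unit names and the canonical choice of Mi here are assumptions for illustration, not YARN's actual ResourceInformation unit handling.

```java
// Sketch of normalizing a memory value to a canonical unit (Mi) once at the
// RM boundary, so hot-path comparisons work on plain longs with no per-call
// unit conversion. Unit set and class name are illustrative.
public class ResourceUnitNormalizer {
    /** Normalize a memory value expressed in Ki, Mi or Gi to Mi. */
    public static long toMi(long value, String unit) {
        switch (unit) {
            case "Ki": return value / 1024;
            case "Mi": return value;
            case "Gi": return value * 1024;
            default: throw new IllegalArgumentException("Unknown unit: " + unit);
        }
    }
}
```

After this one-time normalization, every scheduler comparison is a long comparison rather than a parse-and-convert on each of the millions of allocation decisions the RM makes.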
[jira] [Commented] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
[ https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432492#comment-16432492 ] Wangda Tan commented on YARN-8135: -- [~oliverhuh...@gmail.com], There are no technical issues preventing a TF application from accessing HDFS. But using HDFS is a real overhead if the user has no prior Hadoop experience [https://www.tensorflow.org/deploy/hadoop]. Just want to make this step easier. [~asuresh], Thanks for your interest in this project. I'm not sure whether it should be Hadoop-Submarine or YARN-Submarine; let's decide once I finish the design. > Hadoop {Submarine} Project: Simple and scalable deployment of deep learning > training / serving jobs on Hadoop > - > > Key: YARN-8135 > URL: https://issues.apache.org/jira/browse/YARN-8135 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Attachments: image-2018-04-09-14-35-16-778.png, > image-2018-04-09-14-44-41-101.png > > > Description: > *Goals:* > - Allow infra engineers / data scientists to run *unmodified* Tensorflow jobs > on YARN. > - Allow jobs easy access to data/models in HDFS and other storage systems. > - Can launch services to serve Tensorflow/MXNet models. > - Support running distributed Tensorflow jobs with simple configs. > - Support running user-specified Docker images. > - Support specifying GPU and other resources. > - Support launching tensorboard if the user requests it. > - Support customized DNS names for roles (like tensorboard.$user.$domain:6006) > *Why this name?* > - Because Submarine is the only vehicle that can let humans explore deep > places. B-) > Comparison to other projects: > !image-2018-04-09-14-44-41-101.png! > *Notes:* > *GPU Isolation in the XLearning project is achieved by a patched YARN, which is > different from the community's GPU isolation solution. > **XLearning needs a few modifications to read ClusterSpec from the env.
> *References:* > - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark] > - TensorFlowOnYARN (Intel): > [https://github.com/Intel-bigdata/TensorFlowOnYARN] > - Spark Deep Learning (Databricks): > [https://github.com/databricks/spark-deep-learning] > - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning] > - Kubeflow (Google): [https://github.com/kubeflow/kubeflow] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
[ https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432502#comment-16432502 ] Wei Yan commented on YARN-8135: --- {quote}Think this should be renamed to YARN-Submarine though. I'm not sure Hadoop-Submarine or YARN-Submarine, let's decide once I finish the design. {quote} Hadoop-Submarine may be better here, as the project may not just only involve with YARN. Also, Hadoop-Submarine may be more attractive than YARN-Submarine. > Hadoop {Submarine} Project: Simple and scalable deployment of deep learning > training / serving jobs on Hadoop > - > > Key: YARN-8135 > URL: https://issues.apache.org/jira/browse/YARN-8135 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Attachments: image-2018-04-09-14-35-16-778.png, > image-2018-04-09-14-44-41-101.png > > > Description: > *Goals:* > - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs > on YARN. > - Allow jobs easy access data/models in HDFS and other storages. > - Can launch services to serve Tensorflow/MXNet models. > - Support run distributed Tensorflow jobs with simple configs. > - Support run user-specified Docker images. > - Support specify GPU and other resources. > - Support launch tensorboard if user specified. > - Support customized DNS name for roles (like tensorboard.$user.$domain:6006) > *Why this name?* > - Because Submarine is the only vehicle can let human to explore deep > places. B-) > Compare to other projects: > !image-2018-04-09-14-44-41-101.png! > *Notes:* > *GPU Isolation of XLearning project is achieved by patched YARN, which is > different from community’s GPU isolation solution. > **XLearning needs few modification to read ClusterSpec from env. 
> *References:* > - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark] > - TensorFlowOnYARN (Intel): > [https://github.com/Intel-bigdata/TensorFlowOnYARN] > - Spark Deep Learning (Databricks): > [https://github.com/databricks/spark-deep-learning] > - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning] > - Kubeflow (Google): [https://github.com/kubeflow/kubeflow] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8134) Support specifying node resources in SLS
[ https://issues.apache.org/jira/browse/YARN-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432542#comment-16432542 ] genericqa commented on YARN-8134: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 37s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 54s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 39s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 32s{color} | {color:green} hadoop-sls in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 48s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 67m 37s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | YARN-8134 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918396/YARN-8134.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle | | uname | Linux 16f633d2cbd4 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / cef8eb7 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20291/testReport/ | | Max. process+thread count | 456 (vs. ulimit of 1) | | modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20291/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Support specifying node resource
[jira] [Commented] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens
[ https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432589#comment-16432589 ] Shane Kumpf commented on YARN-2674: --- [~chenchun] - Thanks for the patch here. We are seeing this when testing the Docker runtime, and it results in extra Docker containers being launched on RM restart, which is problematic. I've validated that the logic in this patch resolves that issue. Any chance you'd be able to update the patch? If you don't have the time, I could put up a patch based on your previous patch. > Distributed shell AM may re-launch containers if RM work preserving restart > happens > --- > > Key: YARN-2674 > URL: https://issues.apache.org/jira/browse/YARN-2674 > Project: Hadoop YARN > Issue Type: Sub-task > Components: applications, resourcemanager >Reporter: Chun Chen >Assignee: Chun Chen >Priority: Major > Labels: oct16-easy > Attachments: YARN-2674.1.patch, YARN-2674.2.patch, YARN-2674.3.patch, > YARN-2674.4.patch, YARN-2674.5.patch > > > Currently, if an RM work-preserving restart happens while distributed shell is > running, the distributed shell AM may re-launch all of the containers, including > new/running/completed ones. We must make sure it won't re-launch the > running/completed containers. > We need to remove allocated containers from > AMRMClientImpl#remoteRequestsTable once the AM receives them from the RM. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
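The fix direction in the last sentence above can be sketched with a simplified pending-request table. The real AMRMClientImpl#remoteRequestsTable is keyed by priority/resource-name/capability; the single string key and method names below are simplifications for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: once the AM receives an allocated container, decrement the matching
// pending request, so that a resync after an RM work-preserving restart does
// not re-satisfy an already-fulfilled request (i.e. no duplicate containers).
public class PendingRequestTable {
    private final Map<String, Integer> pending = new HashMap<>();

    public void addRequest(String key, int count) {
        pending.merge(key, count, Integer::sum);
    }

    /** Called when the AM receives an allocated container for this key. */
    public void onContainerAllocated(String key) {
        // Decrement; drop the entry entirely when it reaches zero.
        pending.computeIfPresent(key, (k, v) -> v > 1 ? v - 1 : null);
    }

    /** What would be re-sent to the RM after a restart/resync. */
    public int outstanding(String key) {
        return pending.getOrDefault(key, 0);
    }
}
```

Without the decrement, the table still holds the original ask after restart, and the RM would hand the AM a second set of containers for requests it already satisfied — the extra Docker containers observed in testing.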
[jira] [Updated] (YARN-7598) Document how to use classpath isolation for aux-services in YARN
[ https://issues.apache.org/jira/browse/YARN-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-7598: Attachment: YARN-7598.4.patch > Document how to use classpath isolation for aux-services in YARN > > > Key: YARN-7598 > URL: https://issues.apache.org/jira/browse/YARN-7598 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Major > Attachments: YARN-7598.2.patch, YARN-7598.3.patch, YARN-7598.4.patch, > YARN-7598.trunk.1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7598) Document how to use classpath isolation for aux-services in YARN
[ https://issues.apache.org/jira/browse/YARN-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432601#comment-16432601 ] Xuan Gong commented on YARN-7598: - Thanks for the review. [~djp] Uploaded a new patch to address all your comments. > Document how to use classpath isolation for aux-services in YARN > > > Key: YARN-7598 > URL: https://issues.apache.org/jira/browse/YARN-7598 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Major > Attachments: YARN-7598.2.patch, YARN-7598.3.patch, YARN-7598.4.patch, > YARN-7598.trunk.1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8127) Resource leak when async scheduling is enabled
[ https://issues.apache.org/jira/browse/YARN-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432629#comment-16432629 ] genericqa commented on YARN-8127: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 34s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 21s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 24s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 23 unchanged - 0 fixed = 24 total (was 23) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 26s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 89m 26s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}138m 55s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAutoCreatedQueuePreemption | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | YARN-8127 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918404/YARN-8127.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 281e86cf911b 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / cef8eb7 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/20290/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.
[jira] [Commented] (YARN-7931) [atsv2 read acls] Include domain table creation as part of schema creator
[ https://issues.apache.org/jira/browse/YARN-7931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432688#comment-16432688 ] Vrushali C commented on YARN-7931: -- Hi [~haibochen] That's a good question. Let me check what it does, will see if I can add some unit test to determine the behavior expectation. I will update the jira / patch shortly. thanks Vrushali > [atsv2 read acls] Include domain table creation as part of schema creator > - > > Key: YARN-7931 > URL: https://issues.apache.org/jira/browse/YARN-7931 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vrushali C >Assignee: Vrushali C >Priority: Major > Attachments: YARN-7391.0001.patch, YARN-7391.0002.patch, > YARN-7391.0003.patch > > > > Update the schema creator to create a domain table to store timeline entity > domain info. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8073) TimelineClientImpl doesn't honor yarn.timeline-service.versions configuration
[ https://issues.apache.org/jira/browse/YARN-8073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432710#comment-16432710 ] Vrushali C commented on YARN-8073: -- Hi [~rohithsharma] Yes, please go ahead with the cherry-pick; the branch-2 Jenkins issue is unrelated, and as long as our local machine compilation & unit tests work, I think we can commit it. thanks Vrushali > TimelineClientImpl doesn't honor yarn.timeline-service.versions configuration > - > > Key: YARN-8073 > URL: https://issues.apache.org/jira/browse/YARN-8073 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Major > Attachments: YARN-8073-branch-2.03.patch, YARN-8073.01.patch, > YARN-8073.02.patch, YARN-8073.03.patch > > > Post YARN-6736, the RM supports writing into ATS v1 and v2 via the new configuration > setting _yarn.timeline-service.versions_. > A couple of issues observed in deployment: > # TimelineClientImpl doesn't honor the newly added configuration; rather, it still > gets the version number from _yarn.timeline-service.version_. This prevents > writing into the v1.5 APIs even though _yarn.timeline-service.versions has the 1.5 > value._ > # In the same vein as the 1st point, TimelineUtils#timelineServiceV1_5Enabled > doesn't honor timeline-service.versions. > # JobHistoryEventHandler#serviceInit(), line no 271 checks the version number > rather than calling YarnConfiguration#timelineServiceV2Enabled. > cc: [~agresch] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
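The check the bug report asks for — honoring the list-valued yarn.timeline-service.versions instead of the single-valued yarn.timeline-service.version — amounts to parsing a comma-separated version list and testing membership. The sketch below shows that logic in isolation; the parsing details and method names are assumptions, not YarnConfiguration's actual implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: parse a comma-separated versions value such as "1.5,2.0" and answer
// "is v1.5 enabled?" / "is v2 enabled?" from the full list, rather than from a
// single version property.
public class TimelineVersions {
    public static List<Float> parse(String versionsProperty) {
        return Arrays.stream(versionsProperty.split(","))
            .map(String::trim)
            .map(Float::parseFloat)
            .collect(Collectors.toList());
    }

    public static boolean v15Enabled(String versionsProperty) {
        return parse(versionsProperty).contains(1.5f);
    }

    public static boolean v2Enabled(String versionsProperty) {
        // Any 2.x entry enables the v2 writer.
        return parse(versionsProperty).stream().anyMatch(v -> v >= 2.0f);
    }
}
```

The bug described in points 1-2 is exactly the absence of this list-aware check: reading only the single version property makes "1.5" invisible whenever another version is also configured.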
[jira] [Updated] (YARN-7825) Maintain constant horizontal application info bar for all pages
[ https://issues.apache.org/jira/browse/YARN-7825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora updated YARN-7825: - Attachment: Screen Shot 2018-04-10 at 11.06.40 AM.png Screen Shot 2018-04-10 at 11.07.29 AM.png Screen Shot 2018-04-10 at 11.07.07 AM.png Screen Shot 2018-04-10 at 11.06.27 AM.png Screen Shot 2018-04-10 at 11.15.27 AM.png > Maintain constant horizontal application info bar for all pages > --- > > Key: YARN-7825 > URL: https://issues.apache.org/jira/browse/YARN-7825 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Yesha Vora >Priority: Major > Attachments: Screen Shot 2018-04-10 at 11.06.27 AM.png, Screen Shot > 2018-04-10 at 11.06.40 AM.png, Screen Shot 2018-04-10 at 11.07.07 AM.png, > Screen Shot 2018-04-10 at 11.07.29 AM.png, Screen Shot 2018-04-10 at 11.15.27 > AM.png > > > Steps: > 1) enable Ats v2 > 2) Start Yarn service application ( Httpd ) > 3) Fix horizontal info bar for below pages. > * component page > * Component Instance info page > * Application attempt Info -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7598) Document how to use classpath isolation for aux-services in YARN
[ https://issues.apache.org/jira/browse/YARN-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432732#comment-16432732 ] genericqa commented on YARN-7598: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 39s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 36m 0s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 19 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 37s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 49m 22s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | YARN-7598 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918413/YARN-7598.4.patch | | Optional Tests | asflicense mvnsite | | uname | Linux 80cfdff66c5a 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / cef8eb7 | | maven | version: Apache Maven 3.3.9 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/20292/artifact/out/whitespace-tabs.txt | | Max. process+thread count | 341 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20292/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Document how to use classpath isolation for aux-services in YARN > > > Key: YARN-7598 > URL: https://issues.apache.org/jira/browse/YARN-7598 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Major > Attachments: YARN-7598.2.patch, YARN-7598.3.patch, YARN-7598.4.patch, > YARN-7598.trunk.1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8130) Race condition when container events are published for KILLED applications
[ https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432723#comment-16432723 ] Vrushali C commented on YARN-8130: -- Yes, I agree, we need a configurable delay like the collectorLingerPeriod in PerNodeTimelineCollectorsAuxService#removeApplicationCollector. We also need to check whether there are other places where we remove the app id from a map. Relevant JIRAs for collectorLingerPeriod: YARN-3995 and YARN-7835 > Race condition when container events are published for KILLED applications > -- > > Key: YARN-8130 > URL: https://issues.apache.org/jira/browse/YARN-8130 > Project: Hadoop YARN > Issue Type: Bug > Components: ATSv2 >Reporter: Charan Hebri >Priority: Major > > There seems to be a race condition happening when an application is KILLED > and the corresponding container event information is being published. For > completed containers, a YARN_CONTAINER_FINISHED event is generated but for > some containers in a KILLED application this information is missing.
Below is > a node manager log snippet, > {code:java} > 2018-04-09 08:44:54,474 INFO shuffle.ExternalShuffleBlockResolver > (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application > application_1523259757659_0003 removed, cleanupLocalDirs = false > 2018-04-09 08:44:54,478 INFO application.ApplicationImpl > (ApplicationImpl.java:handle(632)) - Application > application_1523259757659_0003 transitioned from > APPLICATION_RESOURCES_CLEANINGUP to FINISHED > 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher > (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been > removed before the entity could be published for > TimelineEntity[type='YARN_CONTAINER', > id='container_1523259757659_0003_01_02'] > 2018-04-09 08:44:54,478 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just > finished : application_1523259757659_0003 > 2018-04-09 08:44:54,488 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs > for container container_1523259757659_0003_01_01. Current good log dirs > are /grid/0/hadoop/yarn/log > 2018-04-09 08:44:54,492 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs > for container container_1523259757659_0003_01_02. Current good log dirs > are /grid/0/hadoop/yarn/log > 2018-04-09 08:44:55,470 INFO collector.TimelineCollectorManager > (TimelineCollectorManager.java:remove(192)) - The collector service for > application_1523259757659_0003 was removed > 2018-04-09 08:44:55,472 INFO containermanager.ContainerManagerImpl > (ContainerManagerImpl.java:handle(1572)) - couldn't find application > application_1523259757659_0003 while processing FINISH_APPS event. 
The > ResourceManager allocated resources for this application to the NodeManager > but no active containers were found to process{code} > The container id specified in the log, > *container_1523259757659_0003_01_02*, is the one whose finished > event is missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
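The delayed-removal idea discussed in this thread — on APP_FINISHED, schedule the timeline client's removal after a configurable linger period (like collectorLingerPeriod) instead of removing it immediately, so a CONTAINER_FINISHED entity still queued in the dispatcher can find it — can be sketched as below. This is an illustrative sketch, not the actual NMTimelinePublisher code; the class and method names are invented for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: per-application client map whose entries linger for a configured
// delay after the application finishes, so late container events can still
// look up the client and be published.
public class LingeringClientMap {
    private final Map<String, Object> clients = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "client-linger");
            t.setDaemon(true); // cleanup work must not keep the JVM alive
            return t;
        });
    private final long lingerMillis;

    public LingeringClientMap(long lingerMillis) {
        this.lingerMillis = lingerMillis;
    }

    public void put(String appId, Object client) {
        clients.put(appId, client);
    }

    public Object get(String appId) {
        return clients.get(appId);
    }

    // On APP_FINISHED: schedule removal after the linger period instead of
    // removing immediately, so CONTAINER_FINISHED entities already queued in
    // the dispatcher can still find the client.
    public void scheduleRemoval(String appId) {
        scheduler.schedule(() -> clients.remove(appId),
            lingerMillis, TimeUnit.MILLISECONDS);
    }
}
```

With a linger period longer than the dispatcher's typical backlog, the "client has been removed before the entity could be published" error in the log above would no longer occur for events queued before APP_FINISHED.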
[jira] [Commented] (YARN-7974) Allow updating application tracking url after registration
[ https://issues.apache.org/jira/browse/YARN-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432900#comment-16432900 ] Jonathan Hung commented on YARN-7974: - Hi [~wangda] - this is the tracking url change I mentioned during last week's meeting. Would appreciate it if you could take a look when you have the chance :) > Allow updating application tracking url after registration > -- > > Key: YARN-7974 > URL: https://issues.apache.org/jira/browse/YARN-7974 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-7974.001.patch, YARN-7974.002.patch > > > Normally an application's tracking url is set on AM registration. We have a > use case for updating the tracking url after registration (e.g. the UI is > hosted on one of the containers). > Currently we added a {{updateTrackingUrl}} API to ApplicationClientProtocol. > We'll post the patch soon, assuming there are no issues with this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7974) Allow updating application tracking url after registration
[ https://issues.apache.org/jira/browse/YARN-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432922#comment-16432922 ] Wangda Tan commented on YARN-7974: -- [~jhung], Thanks for working on the feature, I can see its value. For the implementation / API: 1) Have you considered only allowing the AM to update the tracking URL? That would solve some problems, e.g.: a. the need to properly check ACLs before making the change; b. issues caused by concurrent writes to the tracking URL. 2) I think the updated tracking URL needs to be persisted as well; otherwise an RM restart would clear the updated information. > Allow updating application tracking url after registration > -- > > Key: YARN-7974 > URL: https://issues.apache.org/jira/browse/YARN-7974 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-7974.001.patch, YARN-7974.002.patch > > > Normally an application's tracking url is set on AM registration. We have a > use case for updating the tracking url after registration (e.g. the UI is > hosted on one of the containers). > Currently we added a {{updateTrackingUrl}} API to ApplicationClientProtocol. > We'll post the patch soon, assuming there are no issues with this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
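The two review points above — restricting updates to the AM that registered, and persisting the new URL so it survives an RM restart — might look roughly like the following sketch. All names here are invented for illustration; this is not the actual RM or ApplicationClientProtocol code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: only the registered AM (identified here by a token) may update the
// tracking URL, and updates are written through to a persistent store so a
// restarted RM still sees the latest URL.
public class TrackingUrlRegistry {
    private final Map<String, String> store; // stands in for the RM state store
    private final Map<String, String> appToAmToken = new ConcurrentHashMap<>();

    public TrackingUrlRegistry(Map<String, String> persistentStore) {
        this.store = persistentStore;
    }

    public void registerAm(String appId, String amToken, String initialUrl) {
        appToAmToken.put(appId, amToken);
        store.put(appId, initialUrl);
    }

    // Rejecting non-AM callers avoids both the ACL problem and concurrent
    // writes from arbitrary clients.
    public void updateTrackingUrl(String appId, String callerToken, String url) {
        if (!callerToken.equals(appToAmToken.get(appId))) {
            throw new SecurityException(
                "only the registered AM may update the tracking URL");
        }
        store.put(appId, url); // persisted: survives an RM restart
    }

    public String getTrackingUrl(String appId) {
        return store.get(appId);
    }
}
```

A registry constructed over the same backing store after a simulated restart would still return the updated URL, which is the persistence property the comment asks for.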
[jira] [Commented] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services
[ https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432934#comment-16432934 ] Wangda Tan commented on YARN-7530: -- [~eyang], thanks for sharing your thoughts. To me, the current scope of native services already goes beyond a single, self-contained app on YARN: 1) The YARN Service API is part of the RM. 2) After YARN-8048, system services can be deployed before running any other applications. I think we should move the API / client code to proper places to avoid loading the native service client / API logic via reflection. This doesn't block anything for now, but I think it will be important to clean it up to attract more contributions from the community. > hadoop-yarn-services-api should be part of hadoop-yarn-services > --- > > Key: YARN-7530 > URL: https://issues.apache.org/jira/browse/YARN-7530 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Chandni Singh >Priority: Trivial > Fix For: yarn-native-services > > Attachments: YARN-7530.001.patch > > > Hadoop-yarn-services-api is currently a parallel project to the > hadoop-yarn-services project. It would be better if hadoop-yarn-services-api > were part of hadoop-yarn-services for correctness. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7221) Add security check for privileged docker container
[ https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-7221: Attachment: YARN-7221.022.patch > Add security check for privileged docker container > -- > > Key: YARN-7221 > URL: https://issues.apache.org/jira/browse/YARN-7221 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security >Affects Versions: 3.0.0, 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-7221.001.patch, YARN-7221.002.patch, > YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, > YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, > YARN-7221.009.patch, YARN-7221.010.patch, YARN-7221.011.patch, > YARN-7221.012.patch, YARN-7221.013.patch, YARN-7221.014.patch, > YARN-7221.015.patch, YARN-7221.016.patch, YARN-7221.017.patch, > YARN-7221.018.patch, YARN-7221.019.patch, YARN-7221.020.patch, > YARN-7221.021.patch, YARN-7221.022.patch > > > When a Docker container runs with privileges, the majority use case is to have > some program start as root and then drop privileges to another user, e.g. > httpd starts privileged to bind to port 80, then drops privileges to the www user. > # We should add a security check for submitting users, to verify they have > "sudo" access to run privileged containers. > # We should remove --user=uid:gid for privileged containers. > > Docker can be launched with the --privileged=true and --user=uid:gid flags. With > this parameter combination, the user will not have access to become root. > All docker exec commands will be dropped to the uid:gid user instead of being > granted privileges. A user can gain root privileges if the container file system > contains files that give the user extra power, but this type of image is > considered dangerous. A non-privileged user can launch a container with > special bits to acquire the same level of root power. Hence, we lose control of > which images should be run with --privileged, and who has sudo rights to use > privileged container images. As a result, we should check for sudo access > and then decide whether to parameterize --privileged=true OR --user=uid:gid. This will > avoid leading developers down the wrong path. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
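The decision described in the issue — verify sudo access, then pass either --privileged=true or --user=uid:gid, never both — can be sketched as follows. This is an illustrative sketch of the rule, not the actual container-executor code; the class and method names are invented for the example.

```java
// Sketch: choose the docker run flag for a container request. Users with
// verified sudo rights may run privileged containers; everyone else is
// pinned to their own uid:gid. Passing --user together with --privileged
// is avoided because it would strip the requested privileges.
public class DockerRunFlags {
    public static String chooseFlag(boolean requestedPrivileged,
                                    boolean userHasSudo,
                                    int uid, int gid) {
        if (requestedPrivileged) {
            if (!userHasSudo) {
                // security check: reject privileged requests from users
                // without sudo rights
                throw new SecurityException(
                    "user lacks sudo rights for privileged containers");
            }
            return "--privileged=true";
        }
        return "--user=" + uid + ":" + gid;
    }
}
```

For example, a non-privileged request for uid 1000 / gid 1000 yields `--user=1000:1000`, while a privileged request from a non-sudo user is rejected outright.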
[jira] [Commented] (YARN-7221) Add security check for privileged docker container
[ https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432936#comment-16432936 ] Eric Yang commented on YARN-7221: - Patch 22 rebased to current trunk. > Add security check for privileged docker container > -- > > Key: YARN-7221 > URL: https://issues.apache.org/jira/browse/YARN-7221 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security >Affects Versions: 3.0.0, 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-7221.001.patch, YARN-7221.002.patch, > YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, > YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, > YARN-7221.009.patch, YARN-7221.010.patch, YARN-7221.011.patch, > YARN-7221.012.patch, YARN-7221.013.patch, YARN-7221.014.patch, > YARN-7221.015.patch, YARN-7221.016.patch, YARN-7221.017.patch, > YARN-7221.018.patch, YARN-7221.019.patch, YARN-7221.020.patch, > YARN-7221.021.patch, YARN-7221.022.patch > > > When a Docker container runs with privileges, the majority use case is to have > some program start as root and then drop privileges to another user, e.g. > httpd starts privileged to bind to port 80, then drops privileges to the www user. > # We should add a security check for submitting users, to verify they have > "sudo" access to run privileged containers. > # We should remove --user=uid:gid for privileged containers. > > Docker can be launched with the --privileged=true and --user=uid:gid flags. With > this parameter combination, the user will not have access to become root. > All docker exec commands will be dropped to the uid:gid user instead of being > granted privileges. A user can gain root privileges if the container file system > contains files that give the user extra power, but this type of image is > considered dangerous. A non-privileged user can launch a container with > special bits to acquire the same level of root power. Hence, we lose control of > which images should be run with --privileged, and who has sudo rights to use > privileged container images. As a result, we should check for sudo access > and then decide whether to parameterize --privileged=true OR --user=uid:gid. This will > avoid leading developers down the wrong path. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""
[ https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432938#comment-16432938 ] Wangda Tan commented on YARN-8116: -- +1, thanks [~csingh], will commit shortly. > Nodemanager fails with NumberFormatException: For input string: "" > -- > > Key: YARN-8116 > URL: https://issues.apache.org/jira/browse/YARN-8116 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8116.001.patch, YARN-8116.002.patch > > > Steps followed. > 1) Update the nodemanager debug delay config > {code} > <property> > <name>yarn.nodemanager.delete.debug-delay-sec</name> > <value>350</value> > </property> > {code} > 2) Launch a distributed shell application multiple times > {code} > /usr/hdp/current/hadoop-yarn-client/bin/yarn jar > hadoop-yarn-applications-distributedshell-*.jar -shell_command "sleep 120" > -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar > hadoop-yarn-applications-distributedshell-*.jar{code} > 3) Restart the NM > The NodeManager fails to start with the error below. > {code:title=NM log} > 2018-03-23 21:32:14,437 INFO monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:serviceInit(181)) - ContainersMonitor enabled: > true > 2018-03-23 21:32:14,439 INFO logaggregation.LogAggregationService > (LogAggregationService.java:serviceInit(130)) - rollingMonitorInterval is set > as 3600.
The logs will be aggregated every 3600 seconds > 2018-03-23 21:32:14,455 INFO service.AbstractService > (AbstractService.java:noteFailure(267)) - Service > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl > failed in state INITED > java.lang.NumberFormatException: For input string: "" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Long.parseLong(Long.java:601) > at java.lang.Long.parseLong(Long.java:631) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:899) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:960) > 2018-03-23 21:32:14,458 INFO logaggregation.LogAggregationService > (LogAggregationService.java:serviceStop(148)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService > waiting for pending aggregation during exit > 2018-03-23 21:32:14,460 INFO service.AbstractService > (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state > INITED > 
java.lang.NumberFormatException: For input string: "" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Long.parseLong(Long.java:601) > at java.lang.Long.parseLong(Long.java:631) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464) >
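The failure above is `Long.parseLong` being invoked on an empty string read back from the NM LevelDB state store during container-state recovery. One defensive option — an illustrative sketch, not the actual YARN-8116 patch — is to guard the parse and fall back to a default instead of aborting NM startup:

```java
// Sketch: parse a long persisted in the state store, tolerating an empty or
// malformed value by returning a caller-supplied fallback. Names are
// illustrative, not the actual NMLeveldbStateStoreService code.
public class SafeLongParse {
    public static long parseOrDefault(String raw, long fallback) {
        if (raw == null || raw.isEmpty()) {
            return fallback; // empty value in the store: don't throw
        }
        try {
            return Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            return fallback; // corrupted value: recover with the default
        }
    }
}
```

Whether to fall back or to skip the affected container record is a design choice; the point is that recovery of the whole NodeManager should not abort on one empty stored value.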
[jira] [Commented] (YARN-7494) Add muti node lookup support for better placement
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432970#comment-16432970 ] Wangda Tan commented on YARN-7494: -- Thanks [~sunilg], in general the change looks good. Could you check the UT failures? [~cheersyang] please commit the patch once you think it is ready. > Add muti node lookup support for better placement > - > > Key: YARN-7494 > URL: https://issues.apache.org/jira/browse/YARN-7494 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G >Priority: Major > Attachments: YARN-7494.001.patch, YARN-7494.002.patch, > YARN-7494.003.patch, YARN-7494.004.patch, YARN-7494.005.patch, > YARN-7494.006.patch, YARN-7494.v0.patch, YARN-7494.v1.patch, > multi-node-designProposal.png > > > Instead of a single node, for effectiveness we can consider a multi-node lookup > based on partition to start with. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8061) An application may preempt itself in case of minshare preemption
[ https://issues.apache.org/jira/browse/YARN-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu reassigned YARN-8061: -- Assignee: (was: Yufei Gu) > An application may preempt itself in case of minshare preemption > > > Key: YARN-8061 > URL: https://issues.apache.org/jira/browse/YARN-8061 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.9.0, 2.8.3, 3.0.0 >Reporter: Yufei Gu >Priority: Major > > Assume a leaf queue A's minshare is 10G memory and its fairshare is 12G. It used > 4G, so its minshare-starved resources are 6G and will be distributed to all its > apps. Assume there are 4 apps a1, a2, a3, a4 inside, which demand 3G, 2G, 1G, > and 0.5G. a1 gets 3G of minshare-starved resources, a2 gets 2G, a3 gets 1G; they > are all considered starved apps except a4, which doesn't get any. > An app can preempt another under the same queue due to minshare starvation. > For example, a1 can preempt a4 if a4 uses more resources than its fair share, > which is 3G (12G/4). If a1 itself used more than 3G memory, it will preempt > itself! I will create a unit test later. > The solution would be to check the application's fair share while distributing minshare > starvation; more details in method > {{FSLeafQueue#updateStarvedAppsMinshare()}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
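The proposed check can be modeled as follows: while distributing a queue's minshare starvation to its apps, an app already at or above its fair share is skipped, so it can never be marked starved and then preempt itself. This is a simplified illustrative model, not the actual {{FSLeafQueue#updateStarvedAppsMinshare()}} code; names and the GB-as-double representation are inventions for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: distribute a queue's minshare starvation (in GB) over its apps,
// skipping apps whose usage already meets or exceeds their fair share.
public class MinsharePreemptionSketch {
    // appsUsageDemand maps appId -> {usageGb, demandGb}, in submission order
    public static Map<String, Double> distributeStarvation(
            double starvedGb, double appFairShareGb,
            LinkedHashMap<String, double[]> appsUsageDemand) {
        Map<String, Double> starvation = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : appsUsageDemand.entrySet()) {
            double usage = e.getValue()[0];
            double demand = e.getValue()[1];
            if (usage >= appFairShareGb) {
                // over (or at) fair share: must not be treated as starved,
                // otherwise it could later preempt itself
                continue;
            }
            double share = Math.min(demand, starvedGb);
            if (share > 0) {
                starvation.put(e.getKey(), share);
                starvedGb -= share;
            }
        }
        return starvation;
    }
}
```

Using the numbers from the description (6G starved, 3G fair share per app), an app a1 already using more than 3G would simply be excluded from the starved set.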
[jira] [Assigned] (YARN-6527) Provide a better out-of-the-box experience for SLS
[ https://issues.apache.org/jira/browse/YARN-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu reassigned YARN-6527: -- Assignee: (was: Yufei Gu) > Provide a better out-of-the-box experience for SLS > -- > > Key: YARN-6527 > URL: https://issues.apache.org/jira/browse/YARN-6527 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler-load-simulator >Affects Versions: 3.0.0-alpha4 >Reporter: Robert Kanter >Priority: Major > > The example provided with SLS appears to be broken - I didn't see any jobs > running. On top of that, it seems like getting SLS to run properly requires > a lot of hadoop site configs, scheduler configs, etc. I was only able to get > something running after [~yufeigu] provided a lot of config files. > We should provide a better out-of-the-box experience for SLS. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7263) Check host name resolution performance when resource manager starts up
[ https://issues.apache.org/jira/browse/YARN-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu reassigned YARN-7263: -- Assignee: (was: Yufei Gu) > Check host name resolution performance when resource manager starts up > -- > > Key: YARN-7263 > URL: https://issues.apache.org/jira/browse/YARN-7263 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.1.0 >Reporter: Yufei Gu >Priority: Major > > According to YARN-7207, host name resolution could be slow in some > environment, which affects RM performance in different ways. It would be nice > to check that when RM starts up and place a warning message into the logs if > the performance is not ideal. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-7968) Reset the queue name in submission context while recovering an application
[ https://issues.apache.org/jira/browse/YARN-7968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu resolved YARN-7968. Resolution: Won't Fix > Reset the queue name in submission context while recovering an application > -- > > Key: YARN-7968 > URL: https://issues.apache.org/jira/browse/YARN-7968 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Yufei Gu >Assignee: Yufei Gu >Priority: Major > > After YARN-7139, a new application can get the correct queue name in its > submission context. We need to do the same thing for application recovery. > {code} > if (isAppRecovering) { > if (LOG.isDebugEnabled()) { > LOG.debug(applicationId > + " is recovering. Skip notifying APP_ACCEPTED"); > } > } else { > // During tests we do not always have an application object, handle > // it here but we probably should fix the tests > if (rmApp != null && rmApp.getApplicationSubmissionContext() != null) > { > // Before we send out the event that the app is accepted is > // to set the queue in the submissionContext (needed on restore etc) > rmApp.getApplicationSubmissionContext().setQueue(queue.getName()); > } > rmContext.getDispatcher().getEventHandler().handle( > new RMAppEvent(applicationId, RMAppEventType.APP_ACCEPTED)); > } > {code} > We can do it by moving the > {{rmApp.getApplicationSubmissionContext().setQueue}} block out of the if-else > block. cc [~wilfreds]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7948) Enable refreshing maximum allocation for multiple resource types
[ https://issues.apache.org/jira/browse/YARN-7948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu reassigned YARN-7948: -- Assignee: Szilard Nemeth (was: Yufei Gu) > Enable refreshing maximum allocation for multiple resource types > > > Key: YARN-7948 > URL: https://issues.apache.org/jira/browse/YARN-7948 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.0.0 >Reporter: Yufei Gu >Assignee: Szilard Nemeth >Priority: Major > > YARN-7738 did the same thing for CS. We need a fix for FS. We could fix it by > moving the refresh code from class CS to class AbstractYARNScheduler. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7347) Fixe the bug in Fair scheduler to handle a queue named "root.root"
[ https://issues.apache.org/jira/browse/YARN-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu reassigned YARN-7347: -- Assignee: Gergo Repas (was: Yufei Gu) > Fixe the bug in Fair scheduler to handle a queue named "root.root" > -- > > Key: YARN-7347 > URL: https://issues.apache.org/jira/browse/YARN-7347 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, reservation system >Reporter: Yufei Gu >Assignee: Gergo Repas >Priority: Major > > A queue named "root.root" may cause issue in Fair scheduler. For example, if > we set the queue(root.root) to be reservable, then submit a job into the > queue. We got following error. > {code} > java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed > to submit application_1508176133973_0002 to YARN : root.root is not a leaf > queue > at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:339) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:253) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567) > at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588) > at > org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:307) > at > org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:360) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at > org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:368) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71) > at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144) > at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:239) > at org.apache.hadoop.util.RunJar.main(RunJar.java:153) > Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit > application_1508176133973_0002 to YARN : root.root is not a leaf queue > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:293) > at > org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:298) > at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:324) > ... 25 more > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6324) The log4j.properties in sample-conf doesn't work well for SLS
[ https://issues.apache.org/jira/browse/YARN-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu reassigned YARN-6324: -- Assignee: (was: Yufei Gu) > The log4j.properties in sample-conf doesn't work well for SLS > - > > Key: YARN-6324 > URL: https://issues.apache.org/jira/browse/YARN-6324 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler-load-simulator >Reporter: Yufei Gu >Priority: Major > > Many log messages are missing; for example, there is no way to find the RM logs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-5824) Verify app starvation under custom preemption thresholds and timeouts
[ https://issues.apache.org/jira/browse/YARN-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu reassigned YARN-5824: -- Assignee: (was: Yufei Gu) > Verify app starvation under custom preemption thresholds and timeouts > - > > Key: YARN-5824 > URL: https://issues.apache.org/jira/browse/YARN-5824 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Karthik Kambatla >Priority: Major > > YARN-5783 adds basic tests to verify applications are identified to be > starved. This JIRA is to add more advanced tests for different values of > preemption thresholds and timeouts. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-3890) FairScheduler should show the scheduler health metrics similar to ones added in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu reassigned YARN-3890: -- Assignee: Gergo Repas (was: Yufei Gu) > FairScheduler should show the scheduler health metrics similar to ones added > in CapacityScheduler > - > > Key: YARN-3890 > URL: https://issues.apache.org/jira/browse/YARN-3890 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Gergo Repas >Priority: Major > > We should add information displayed in YARN-3293 in FairScheduler as well > possibly sharing the implementation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-3797) NodeManager not blacklisting the disk (shuffle) with errors
[ https://issues.apache.org/jira/browse/YARN-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu reassigned YARN-3797: -- Assignee: (was: Yufei Gu) > NodeManager not blacklisting the disk (shuffle) with errors > --- > > Key: YARN-3797 > URL: https://issues.apache.org/jira/browse/YARN-3797 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Rajesh Balamohan >Priority: Major > > In a multi-node environment, one of the disk (where map outputs are written) > in a node went bad. Errors are given below. > {noformat} > Info fld=0x9ad090a > sd 6:0:5:0: [sdf] Add. Sense: Unrecovered read error > sd 6:0:5:0: [sdf] CDB: Read(10): 28 00 09 ad 09 08 00 00 08 00 > end_request: critical medium error, dev sdf, sector 162334984 > mpt2sas0: log_info(0x3108): originator(PL), code(0x08), sub_code(0x) > sd 6:0:5:0: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE > sd 6:0:5:0: [sdf] Sense Key : Medium Error [current] > Info fld=0x9af8892 > sd 6:0:5:0: [sdf] Add. Sense: Unrecovered read error > sd 6:0:5:0: [sdf] CDB: Read(10): 28 00 09 af 88 90 00 00 08 00 > end_request: critical medium error, dev sdf, sector 162498704 > mpt2sas0: log_info(0x3108): originator(PL), code(0x08), sub_code(0x) > mpt2sas0: log_info(0x3108): originator(PL), code(0x08), sub_code(0x) > sd 6:0:5:0: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE > sd 6:0:5:0: [sdf] Sense Key : Medium Error [current] > Info fld=0x9af8892 > sd 6:0:5:0: [sdf] Add. Sense: Unrecovered read error > sd 6:0:5:0: [sdf] CDB: Read(10): 28 00 09 af 88 90 00 00 08 00 > end_request: critical medium error, dev sdf, sector 162498704 > {noformat} > Diskchecker would pass as the system allows to create directories and delete > directories without issues. But data being served out can be corrupt and > fetchers fail during CRC verification with unwanted delays and retries. > Ideally node manager should detect such errors and blacklist/remove those > disks from NM. 
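The XML examples in this description were stripped in transit (only the quote markers survived). For illustration only — the {{index}} attribute is hypothetical, named here to match the proposal, and the eventual syntax may differ — an indexed ordering of FairScheduler placement rules could look like:

```xml
<!-- Natural order: rules apply top to bottom, as today -->
<queuePlacementPolicy>
  <rule name="specified"/>
  <rule name="user"/>
  <rule name="default"/>
</queuePlacementPolicy>

<!-- Indexed order: an explicit attribute decides precedence -->
<queuePlacementPolicy>
  <rule name="default" index="3"/>
  <rule name="specified" index="1"/>
  <rule name="user" index="2"/>
</queuePlacementPolicy>
```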
[jira] [Assigned] (YARN-6941) Allow Queue placement policies to be ordered by attribute
[ https://issues.apache.org/jira/browse/YARN-6941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu reassigned YARN-6941: -- Assignee: (was: Yufei Gu) > Allow Queue placement policies to be ordered by attribute > - > > Key: YARN-6941 > URL: https://issues.apache.org/jira/browse/YARN-6941 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Yufei Gu >Priority: Minor > > It would be nice to add a feature that would allow users to provide an > "order" or "index" the placement policies should apply, rather than just the > native policy order as included in the XML. > For instance, the following two examples would be the same: > Natural order: > > > > > > Indexed Order: > > > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6971) Clean up different ways to create resources
[ https://issues.apache.org/jira/browse/YARN-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu reassigned YARN-6971: -- Assignee: (was: Yufei Gu) > Clean up different ways to create resources > --- > > Key: YARN-6971 > URL: https://issues.apache.org/jira/browse/YARN-6971 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Reporter: Yufei Gu >Priority: Minor > Labels: newbie > > There are several ways to create a {{resource}} object, e.g., > BuilderUtils.newResource() and Resources.createResource(). These methods not > only cause confusing but also performance issues, for example > BuilderUtils.newResource() is significant slow than > Resources.createResource(). > We could merge them some how, and replace most BuilderUtils.newResource() > with Resources.createResource(). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6925) FSSchedulerNode could be simplified extracting preemption fields into a class
[ https://issues.apache.org/jira/browse/YARN-6925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu reassigned YARN-6925: -- Assignee: (was: Yufei Gu) > FSSchedulerNode could be simplified extracting preemption fields into a class > - > > Key: YARN-6925 > URL: https://issues.apache.org/jira/browse/YARN-6925 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Miklos Szegedi >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7984) Delete registry entries from ZK on ServiceClient stop and clean up stop/destroy behavior
[ https://issues.apache.org/jira/browse/YARN-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433038#comment-16433038 ] Eric Yang commented on YARN-7984: - [~billie.rinaldi] +1 to commit. Patch 2 looks good to me. Stop and destroy command works better with the new error handling. > Delete registry entries from ZK on ServiceClient stop and clean up > stop/destroy behavior > > > Key: YARN-7984 > URL: https://issues.apache.org/jira/browse/YARN-7984 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Critical > Attachments: YARN-7984.1.patch, YARN-7984.2.patch > > > The service records written to the registry are removed by ServiceClient on a > destroy call, but not on a stop call. The service AM does have some code to > clean up the registry entries when component instances are stopped, but if > the AM is killed before it has a chance to perform the cleanup, these entries > will be left in ZooKeeper. It would be better to clean these up in the stop > call, so that RegistryDNS does not provide lookups for containers that don't > exist. > Additional stop/destroy behavior improvements include fixing errors / > unexpected behavior related to: > * destroying a saved (not launched or started) service > * destroying a stopped service > * destroying a destroyed service > * returning proper exit codes for destroy failures > * performing other client operations on saved services (fixing NPEs) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8140) Improve log message when launch cmd is run for stopped yarn service
Yesha Vora created YARN-8140: Summary: Improve log message when launch cmd is ran for stopped yarn service Key: YARN-8140 URL: https://issues.apache.org/jira/browse/YARN-8140 Project: Hadoop YARN Issue Type: Improvement Components: yarn-native-services Affects Versions: 3.1.0 Reporter: Yesha Vora Steps: 1) Launch sleeper app {code} RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch sleeper2-duplicate-app-stopped /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 18/04/10 21:31:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History server at xx:10200 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History server at xx:10200 18/04/10 21:31:01 INFO client.ApiServiceClient: Loading service definition from local FS: /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json 18/04/10 21:31:03 INFO util.log: Logging initialized @2818ms 18/04/10 21:31:10 INFO client.ApiServiceClient: Application ID: application_1523387473707_0007 Exit Code: 0\{code} 2) Stop the application {code} RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -stop sleeper2-duplicate-app-stopped WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 
18/04/10 21:31:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History server at xx:10200 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History server at xx:10200 18/04/10 21:31:16 INFO util.log: Logging initialized @3034ms 18/04/10 21:31:17 INFO client.ApiServiceClient: Successfully stopped service sleeper2-duplicate-app-stopped Exit Code: 0\{code} 3) Launch the application with same name {code} RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch sleeper2-duplicate-app-stopped /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 18/04/10 21:31:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History server at xx:10200 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History server at xx:10200 18/04/10 21:31:19 INFO client.ApiServiceClient: Loading service definition from local FS: /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json 18/04/10 21:31:22 INFO util.log: Logging initialized @4456ms 18/04/10 21:31:22 ERROR client.ApiServiceClient: Service Instance dir already exists: hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json Exit Code: 56 {code} Here, launch cmd fails with "Service Instance dir already exists: hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json". The log message should be more meaningful. It should return that "sleeper2-duplicate-app-stopped is in stopped state". -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8140) Improve log message when launch cmd is run for stopped yarn service
[ https://issues.apache.org/jira/browse/YARN-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora updated YARN-8140: - Description: Steps: 1) Launch sleeper app {code} RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch sleeper2-duplicate-app-stopped /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 18/04/10 21:31:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History server at xx:10200 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History server at xx:10200 18/04/10 21:31:01 INFO client.ApiServiceClient: Loading service definition from local FS: /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json 18/04/10 21:31:03 INFO util.log: Logging initialized @2818ms 18/04/10 21:31:10 INFO client.ApiServiceClient: Application ID: application_1523387473707_0007 Exit Code: 0{code} 2) Stop the application {code} RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -stop sleeper2-duplicate-app-stopped WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 18/04/10 21:31:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History server at xx:10200 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History server at xx:10200 18/04/10 21:31:16 INFO util.log: Logging initialized @3034ms 18/04/10 21:31:17 INFO client.ApiServiceClient: Successfully stopped service sleeper2-duplicate-app-stopped Exit Code: 0{code} 3) Launch the application with same name {code} RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch sleeper2-duplicate-app-stopped /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 18/04/10 21:31:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History server at xx:10200 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History server at xx:10200 18/04/10 21:31:19 INFO client.ApiServiceClient: Loading service definition from local FS: /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json 18/04/10 21:31:22 INFO util.log: Logging initialized @4456ms 18/04/10 21:31:22 ERROR client.ApiServiceClient: Service Instance dir already exists: hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json Exit Code: 56 {code} Here, launch cmd fails with "Service Instance dir already exists: hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json". The log message should be more meaningful. 
It should return that "sleeper2-duplicate-app-stopped is in stopped state". was: Steps: 1) Launch sleeper app {code} RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch sleeper2-duplicate-app-stopped /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 18/04/10 21:31:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History server at xx:10200 18/04
[jira] [Commented] (YARN-8037) CGroupsResourceCalculator logs excessive warnings on container relaunch
[ https://issues.apache.org/jira/browse/YARN-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433049#comment-16433049 ] Shane Kumpf commented on YARN-8037: --- Thanks [~miklos.szeg...@cloudera.com] - For most applications I only see a single exception for each of the subsystems, like the output above, so I'm not sure that will address a bulk of these. I have a few ideas to test out and I'll report back soon with more detail. > CGroupsResourceCalculator logs excessive warnings on container relaunch > --- > > Key: YARN-8037 > URL: https://issues.apache.org/jira/browse/YARN-8037 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Priority: Major > > When a container is relaunched, the old process no longer exists. When using > the {{CGroupsResourceCalculator}} this results in the warning and exception > below being logged every second until the relaunch occurs, which is excessive > and filling up the logs. > {code:java} > 2018-03-16 14:30:33,438 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator: > Failed to parse 12844 > org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the > interim 12844 > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.readTotalProcessJiffies(CGroupsResourceCalculator.java:252) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:181) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52) > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457) > Caused by: java.io.FileNotFoundException: > /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_e01_1521209613260_0002_01_02/cpuacct.stat > (No such file or directory) > at java.io.FileInputStream.open0(Native Method) > at java.io.FileInputStream.open(FileInputStream.java:195) > at java.io.FileInputStream.(FileInputStream.java:138) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320) > ... 4 more > 2018-03-16 14:30:33,438 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator: > Failed to parse cgroups > /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_02/memory.memsw.usage_in_bytes > org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the > interim 12844 > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.getMemorySize(CGroupsResourceCalculator.java:238) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:187) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457) > Caused by: java.io.FileNotFoundException: > /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_02/memory.usage_in_bytes > (No such file or directory) > at 
java.io.FileInputStream.open0(Native Method) > at java.io.FileInputStream.open(FileInputStream.java:195) > at java.io.FileInputStream.(FileInputStream.java:138) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320) > ... 4 more{code} > We should consider moving the exception to debug to reduce the noise at a > minimum. Alternatively, it may make sense to stop the existing > {{MonitoringThread}} during relaunch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
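One low-risk variant of the "move to debug" suggestion is to log the first miss for a given cgroup file at WARN and every identical repeat at DEBUG until the container is relaunched. A minimal stand-alone sketch of such a once-per-key guard follows; the class and method names are illustrative, not the actual NodeManager code:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class OncePerKeyWarner {
    // Keys (e.g. a cgroup file path) that have already produced a WARN.
    private final Set<String> warned = ConcurrentHashMap.newKeySet();

    /** Returns true if the caller should log at WARN, false to demote to DEBUG. */
    public boolean shouldWarn(String key) {
        // Set.add() returns true only the first time the key is seen.
        return warned.add(key);
    }

    /** Clear the guard for a key, e.g. when the container is relaunched. */
    public void reset(String key) {
        warned.remove(key);
    }

    public static void main(String[] args) {
        OncePerKeyWarner w = new OncePerKeyWarner();
        System.out.println(w.shouldWarn("/sys/fs/cgroup/cpu/.../cpuacct.stat")); // true
        System.out.println(w.shouldWarn("/sys/fs/cgroup/cpu/.../cpuacct.stat")); // false
    }
}
```

The alternative mentioned in the description — stopping the {{MonitoringThread}} during relaunch — avoids the log calls entirely but is a larger behavioral change.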
[jira] [Created] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
Wangda Tan created YARN-8141: Summary: YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec Key: YARN-8141 URL: https://issues.apache.org/jira/browse/YARN-8141 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Reporter: Wangda Tan Existing YARN native service overwrites YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user specified this in service spec or not. It is important to allow user to mount local folders like /etc/passwd, etc. Following logic overwrites the YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment: {code:java} StringBuilder sb = new StringBuilder(); for (Entry mount : mountPaths.entrySet()) { if (sb.length() > 0) { sb.append(","); } sb.append(mount.getKey()); sb.append(":"); sb.append(mount.getValue()); } env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", sb.toString());{code} Inside AbstractLauncher.java -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
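The overwrite happens because the computed mount list is written into the env map unconditionally. A self-contained sketch of the suggested behavior — appending the computed mounts to any user-specified value instead of replacing it — is below; the helper name and plain maps are stand-ins for the actual AbstractLauncher fields, not the real patch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MountsEnvMerge {
    static final String MOUNTS_VAR =
        "YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS";

    // Build the "src:dst,src:dst" list, preserving any value the user
    // already put in the service spec instead of overwriting it.
    static void addMounts(Map<String, String> env, Map<String, String> mountPaths) {
        StringBuilder sb = new StringBuilder();
        String existing = env.get(MOUNTS_VAR);
        if (existing != null && !existing.isEmpty()) {
            sb.append(existing);
        }
        for (Map.Entry<String, String> mount : mountPaths.entrySet()) {
            if (sb.length() > 0) {
                sb.append(",");
            }
            sb.append(mount.getKey()).append(":").append(mount.getValue());
        }
        env.put(MOUNTS_VAR, sb.toString());
    }

    public static void main(String[] args) {
        Map<String, String> env = new LinkedHashMap<>();
        env.put(MOUNTS_VAR, "/etc/passwd:/etc/passwd"); // user-specified in spec
        Map<String, String> mountPaths = new LinkedHashMap<>();
        mountPaths.put("/local/a", "/container/a");     // computed by the launcher
        addMounts(env, mountPaths);
        System.out.println(env.get(MOUNTS_VAR));
        // /etc/passwd:/etc/passwd,/local/a:/container/a
    }
}
```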
[jira] [Commented] (YARN-7984) Delete registry entries from ZK on ServiceClient stop and clean up stop/destroy behavior
[ https://issues.apache.org/jira/browse/YARN-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433062#comment-16433062 ] Hudson commented on YARN-7984: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13959 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13959/]) YARN-7984. Improved YARN service stop/destroy and clean up.(eyang: rev d553799030a5a64df328319aceb35734d0b2de20) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services-api/src/test/java/org/apache/hadoop/yarn/service/ServiceClientTest.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/TestYarnNativeServices.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/client/ServiceClient.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services-api/src/main/java/org/apache/hadoop/yarn/service/webapp/ApiServer.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/ServiceTestUtils.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services-api/src/test/java/org/apache/hadoop/yarn/service/TestApiServer.java > Delete registry entries from ZK on ServiceClient stop and clean up > stop/destroy behavior > > > Key: YARN-7984 > URL: https://issues.apache.org/jira/browse/YARN-7984 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Critical > Attachments: YARN-7984.1.patch, YARN-7984.2.patch > > > The service records written to the registry are removed by ServiceClient on a > destroy call, but not on a stop call. 
The service AM does have some code to > clean up the registry entries when component instances are stopped, but if > the AM is killed before it has a chance to perform the cleanup, these entries > will be left in ZooKeeper. It would be better to clean these up in the stop > call, so that RegistryDNS does not provide lookups for containers that don't > exist. > Additional stop/destroy behavior improvements include fixing errors / > unexpected behavior related to: > * destroying a saved (not launched or started) service > * destroying a stopped service > * destroying a destroyed service > * returning proper exit codes for destroy failures > * performing other client operations on saved services (fixing NPEs) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7974) Allow updating application tracking url after registration
[ https://issues.apache.org/jira/browse/YARN-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433067#comment-16433067 ] Jonathan Hung commented on YARN-7974: - Thanks for the comments, For 1 I think this is possible, initially I didn't want to add more APIs to ApplicationMasterProtocol, but I think we can add a field in AllocateRequestProto, and just update the url on next call to allocate() to avoid overcomplicating the protocol. For 2 I will make this change. > Allow updating application tracking url after registration > -- > > Key: YARN-7974 > URL: https://issues.apache.org/jira/browse/YARN-7974 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-7974.001.patch, YARN-7974.002.patch > > > Normally an application's tracking url is set on AM registration. We have a > use case for updating the tracking url after registration (e.g. the UI is > hosted on one of the containers). > Currently we added a {{updateTrackingUrl}} API to ApplicationClientProtocol. > We'll post the patch soon, assuming there are no issues with this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services
[ https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433071#comment-16433071 ] Eric Yang commented on YARN-7530: - [~leftnoteasy] YARN service has its dependencies set up backward because its precursor, Slider, was designed to run on YARN: the server depends on the client rather than the other way around. ServiceClient depends on hadoop-yarn-services-core and hadoop-yarn-server-common, so it might be problematic to move hadoop-yarn-services-core into yarn common. You are welcome to try, but it would be good to keep part of the YARN Service Application Master as a piece that is built after the YARN client and servers, to avoid circular dependencies. > hadoop-yarn-services-api should be part of hadoop-yarn-services > --- > > Key: YARN-7530 > URL: https://issues.apache.org/jira/browse/YARN-7530 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Chandni Singh >Priority: Trivial > Fix For: yarn-native-services > > Attachments: YARN-7530.001.patch > > > Hadoop-yarn-services-api is currently a parallel project to the > hadoop-yarn-services project. It would be better if hadoop-yarn-services-api > were part of hadoop-yarn-services for correctness. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7221) Add security check for privileged docker container
[ https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433080#comment-16433080 ] genericqa commented on YARN-7221: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 1s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 31m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 39s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 56s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 53s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 87m 33s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | YARN-7221 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918457/YARN-7221.022.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc | | uname | Linux 1152cddcffed 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8ab776d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/20293/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20293/testReport/ | | Max. process+thread count | 301 (vs. ulimit of 1)
[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
[ https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433081#comment-16433081 ] Shane Kumpf commented on YARN-8141: --- Thanks for reporting this, [~leftnoteasy] - {{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} is intended to be used for the purpose you call out. When that variable was added, {{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} was retained due to its existing use in native services. Is there a case where {{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} won't work for your need? Maybe it is time we do look to consolidate these two. > YARN Native Service: Respect > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec > -- > > Key: YARN-8141 > URL: https://issues.apache.org/jira/browse/YARN-8141 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Wangda Tan >Priority: Critical > > Existing YARN native service overwrites > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user > specified this in service spec or not. It is important to allow user to mount > local folders like /etc/passwd, etc. > Following logic overwrites the > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment: > {code:java} > StringBuilder sb = new StringBuilder(); > for (Entry mount : mountPaths.entrySet()) { > if (sb.length() > 0) { > sb.append(","); > } > sb.append(mount.getKey()); > sb.append(":"); > sb.append(mount.getValue()); > } > env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", > sb.toString());{code} > Inside AbstractLauncher.java -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
[ https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433081#comment-16433081 ] Shane Kumpf edited comment on YARN-8141 at 4/10/18 10:19 PM: - Thanks for reporting this, [~leftnoteasy] - {{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} is intended to be used for the purpose you call out. When that variable was added, {{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} was retained due to its existing use in native services. Is there a case where {{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} won't work for your need? Maybe it is time we look to consolidate these two. was (Author: shaneku...@gmail.com): Thanks for reporting this, [~leftnoteasy] - {{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} is intended to be used for the purpose you call out. When that variable was added, {{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} was retained due to its existing use in native services. Is there a case where {{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} won't work for your need? Maybe it is time we do look to consolidate these two. > YARN Native Service: Respect > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec > -- > > Key: YARN-8141 > URL: https://issues.apache.org/jira/browse/YARN-8141 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Wangda Tan >Priority: Critical > > Existing YARN native service overwrites > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user > specified this in service spec or not. It is important to allow user to mount > local folders like /etc/passwd, etc. 
> Following logic overwrites the > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment: > {code:java} > StringBuilder sb = new StringBuilder(); > for (Entry mount : mountPaths.entrySet()) { > if (sb.length() > 0) { > sb.append(","); > } > sb.append(mount.getKey()); > sb.append(":"); > sb.append(mount.getValue()); > } > env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", > sb.toString());{code} > Inside AbstractLauncher.java
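One way to respect a user-specified value, as the report asks, is to append the generated mounts to any existing value of the variable instead of replacing it. The following is a minimal, hypothetical sketch of that idea (the class and method names here are illustrative, not the actual AbstractLauncher code):

```java
import java.util.Map;

// Hypothetical sketch: merge generated local-resource mounts with any value the
// user already specified in the service spec, instead of overwriting it.
class MountEnvMerge {
    static final String KEY = "YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS";

    // Appends "src:dst" pairs to the existing comma-separated value, if any.
    static void mergeMounts(Map<String, String> env, Map<String, String> mountPaths) {
        StringBuilder sb = new StringBuilder();
        String existing = env.get(KEY);
        if (existing != null && !existing.isEmpty()) {
            sb.append(existing); // preserve mounts from the service spec
        }
        for (Map.Entry<String, String> mount : mountPaths.entrySet()) {
            if (sb.length() > 0) {
                sb.append(",");
            }
            sb.append(mount.getKey()).append(":").append(mount.getValue());
        }
        env.put(KEY, sb.toString());
    }
}
```

Whether appending (rather than consolidating into YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS, as discussed in the comments) is the right fix is exactly the open question on this issue.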
[jira] [Commented] (YARN-8140) Improve log message when launch cmd is ran for stopped yarn service
[ https://issues.apache.org/jira/browse/YARN-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433083#comment-16433083 ] Eric Yang commented on YARN-8140: - [~yeshavora] This is because the application has not been destroyed. The subsequent launch fails because the duplicate service name is in use. I think the message is not wrong to indicate to the user that Hadoop can't deploy with the same name. If the error message said that sleeper2-duplicate-app-stopped is in a stopped state, the user might mistakenly believe that a second service of the same name is persisted in Hadoop. This is not the case; therefore, the existing message is more precise in delivering the point. We can change the message to "Service name sleeper2-duplicate-app-stopped is already taken: hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json". Will this work? > Improve log message when launch cmd is ran for stopped yarn service > --- > > Key: YARN-8140 > URL: https://issues.apache.org/jira/browse/YARN-8140 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Priority: Major > > Steps: > 1) Launch sleeper app > {code} > RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch > sleeper2-duplicate-app-stopped > /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/04/10 21:31:01 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History > server at xx:10200 > 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History > server at xx:10200 > 18/04/10 21:31:01 INFO client.ApiServiceClient: Loading service definition > from local FS: > /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json > 18/04/10 21:31:03 INFO util.log: Logging initialized @2818ms > 18/04/10 21:31:10 INFO client.ApiServiceClient: Application ID: > application_1523387473707_0007 > Exit Code: 0{code} > 2) Stop the application > {code} > RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -stop > sleeper2-duplicate-app-stopped > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/04/10 21:31:14 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History > server at xx:10200 > 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History > server at xx:10200 > 18/04/10 21:31:16 INFO util.log: Logging initialized @3034ms > 18/04/10 21:31:17 INFO client.ApiServiceClient: Successfully stopped service > sleeper2-duplicate-app-stopped > Exit Code: 0{code} > 3) Launch the application with same name > {code} > RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch > sleeper2-duplicate-app-stopped > /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. 
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/04/10 21:31:19 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History > server at xx:10200 > 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History > server at xx:10200 > 18/04/10 21:31:19 INFO client.ApiServiceClient: Loading service definition > from local FS: > /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json > 18/04/10 21:31:22 INFO util.log: Logging initialized @4456ms > 18/04/10 21:31
[jira] [Commented] (YARN-4781) Support intra-queue preemption for fairness ordering policy.
[ https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433090#comment-16433090 ] Eric Payne commented on YARN-4781: -- bq. Hence in case we use sizeBasedWeight, we are considering pending as well. So i had this doubt.. I see what you mean. Good catch. I was not considering the {{sizeBasedWeight}} case. My first thought was to just use the {{FairOrderingPolicy#FairComparator}}, but that is for {{SchedulableEntity}}s like {{FiCaSchedulingApp}}, and the {{PriorityQueue}}s in {{FifoIntraQueuePreemptionPlugin}} are sorting {{TempAppPerPartition}}s, so I wouldn't be able to combine this feature with the {{FifoIntraQueuePreemptionPlugin}}. It may be worthwhile to go back to your previous suggestion about splitting out common functionality into an abstract {{AbstractIntraQueuePreemptionPlugin}} class and sub-classing Fifo and Fair plugins. > Support intra-queue preemption for fairness ordering policy. > > > Key: YARN-4781 > URL: https://issues.apache.org/jira/browse/YARN-4781 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Attachments: YARN-4781.001.patch, YARN-4781.002.patch, > YARN-4781.003.patch > > > We introduced fairness queue policy since YARN-3319, which will let large > applications make progresses and not starve small applications. However, if a > large application takes the queue’s resources, and containers of the large > app has long lifespan, small applications could still wait for resources for > long time and SLAs cannot be guaranteed. > Instead of wait for application release resources on their own, we need to > preempt resources of queue with fairness policy enabled.
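The refactoring suggested in the comment above — an abstract base class that owns the queue-building logic while Fifo and Fair subclasses supply only the ordering — could be sketched roughly as follows. All class names and fields here are illustrative stand-ins (TempApp approximates TempAppPerPartition), not the actual YARN classes:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Illustrative stand-in for TempAppPerPartition.
class TempApp {
    final int priority;
    final long usedResources;
    TempApp(int priority, long usedResources) {
        this.priority = priority;
        this.usedResources = usedResources;
    }
}

// Hypothetical abstract plugin: common logic lives here, ordering is abstract.
abstract class AbstractIntraQueuePreemptionPlugin {
    // Subclasses define how candidate apps are ordered for preemption.
    abstract Comparator<TempApp> orderingComparator();

    PriorityQueue<TempApp> buildPreemptionQueue() {
        return new PriorityQueue<>(orderingComparator());
    }
}

// FIFO-style ordering: higher-priority apps surface first.
class FifoPlugin extends AbstractIntraQueuePreemptionPlugin {
    @Override
    Comparator<TempApp> orderingComparator() {
        return Comparator.comparingInt((TempApp a) -> a.priority).reversed();
    }
}

// Fair-style ordering: apps using the most resources are preempted from first.
class FairPlugin extends AbstractIntraQueuePreemptionPlugin {
    @Override
    Comparator<TempApp> orderingComparator() {
        return Comparator.comparingLong((TempApp a) -> a.usedResources).reversed();
    }
}
```

The point of the design is that {{FifoIntraQueuePreemptionPlugin}}'s queue-handling code would not need to know which comparator it is running with.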
[jira] [Commented] (YARN-8140) Improve log message when launch cmd is ran for stopped yarn service
[ https://issues.apache.org/jira/browse/YARN-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433093#comment-16433093 ] Yesha Vora commented on YARN-8140: -- yes [~eyang] this message sounds good. > Improve log message when launch cmd is ran for stopped yarn service > --- > > Key: YARN-8140 > URL: https://issues.apache.org/jira/browse/YARN-8140 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Priority: Major > > Steps: > 1) Launch sleeper app > {code} > RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch > sleeper2-duplicate-app-stopped > /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/04/10 21:31:01 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History > server at xx:10200 > 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History > server at xx:10200 > 18/04/10 21:31:01 INFO client.ApiServiceClient: Loading service definition > from local FS: > /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json > 18/04/10 21:31:03 INFO util.log: Logging initialized @2818ms > 18/04/10 21:31:10 INFO client.ApiServiceClient: Application ID: > application_1523387473707_0007 > Exit Code: 0{code} > 2) Stop the application > {code} > RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -stop > sleeper2-duplicate-app-stopped > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/04/10 21:31:14 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History > server at xx:10200 > 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History > server at xx:10200 > 18/04/10 21:31:16 INFO util.log: Logging initialized @3034ms > 18/04/10 21:31:17 INFO client.ApiServiceClient: Successfully stopped service > sleeper2-duplicate-app-stopped > Exit Code: 0{code} > 3) Launch the application with same name > {code} > RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch > sleeper2-duplicate-app-stopped > /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. 
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/04/10 21:31:19 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History > server at xx:10200 > 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History > server at xx:10200 > 18/04/10 21:31:19 INFO client.ApiServiceClient: Loading service definition > from local FS: > /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json > 18/04/10 21:31:22 INFO util.log: Logging initialized @4456ms > 18/04/10 21:31:22 ERROR client.ApiServiceClient: Service Instance dir already > exists: > hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json > Exit Code: 56 > {code} > > Here, launch cmd fails with "Service Instance dir already exists: > hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json". > > The log message should be more meaningful. It should return that > "sleeper2-duplicate-app-stopped is in stopped state".
[jira] [Updated] (YARN-8142) yarn service application stops when AM is killed
[ https://issues.apache.org/jira/browse/YARN-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora updated YARN-8142: - Description: Steps: 1) Launch sleeper job ( non-docker yarn service) {code} RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch fault-test-am-sleeper /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History server at xxx:10200 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History server at xxx:10200 18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition from local FS: /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json 18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms 18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: application_1522887500374_0010 Exit Code: 0{code} 2) Wait for sleeper component to be up 3) Kill AM process PID Expected behavior: New attempt of AM will be started. The pre-existing container will keep running Actual behavior: Application finishes with State : FINISHED and Final-State : ENDED New attempt was never launched Note: when the AM gets a SIGTERM and gracefully shuts itself down. 
It is shutting the entire app down instead of letting it continue to run for another attempt was: Steps: 1) Launch sleeper job ( non-docker yarn service) {code} RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch fault-test-am-sleeper /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History server at xxx:10200 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History server at xxx:10200 18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition from local FS: /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json 18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms 18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: application_1522887500374_0010 Exit Code: 0\{code} 2) Wait for sleeper component to be up 3) Kill AM process PID Expected behavior: New attempt of AM will be started. The pre-existing container will keep running Actual behavior: Application finishes with State : FINISHED and Final-State : ENDED New attempt was never launched Note: when the AM gets a SIGTERM and gracefully shuts itself down. 
It is shutting the entire app down instead of letting it continue to run for another attempt > yarn service application stops when AM is killed > > > Key: YARN-8142 > URL: https://issues.apache.org/jira/browse/YARN-8142 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Yesha Vora >Priority: Major > > Steps: > 1) Launch sleeper job ( non-docker yarn service) > {code} > RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch > fault-test-am-sleeper > /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History > server at xxx:10200 > 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History > s
[jira] [Created] (YARN-8142) yarn service application stops when AM is killed
Yesha Vora created YARN-8142: Summary: yarn service application stops when AM is killed Key: YARN-8142 URL: https://issues.apache.org/jira/browse/YARN-8142 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Reporter: Yesha Vora Steps: 1) Launch sleeper job ( non-docker yarn service) {code} RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch fault-test-am-sleeper /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History server at xxx:10200 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History server at xxx:10200 18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition from local FS: /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json 18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms 18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: application_1522887500374_0010 Exit Code: 0\{code} 2) Wait for sleeper component to be up 3) Kill AM process PID Expected behavior: New attempt of AM will be started. The pre-existing container will keep running Actual behavior: Application finishes with State : FINISHED and Final-State : ENDED New attempt was never launched Note: when the AM gets a SIGTERM and gracefully shuts itself down. 
It is shutting the entire app down instead of letting it continue to run for another attempt
[jira] [Updated] (YARN-8142) yarn service application stops when AM is killed with SIGTERM
[ https://issues.apache.org/jira/browse/YARN-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora updated YARN-8142: - Summary: yarn service application stops when AM is killed with SIGTERM (was: yarn service application stops when AM is killed) > yarn service application stops when AM is killed with SIGTERM > - > > Key: YARN-8142 > URL: https://issues.apache.org/jira/browse/YARN-8142 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Yesha Vora >Priority: Major > > Steps: > 1) Launch sleeper job ( non-docker yarn service) > {code} > RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch > fault-test-am-sleeper > /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History > server at xxx:10200 > 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History > server at xxx:10200 > 18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition > from local FS: > /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json > 18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms > 18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: > application_1522887500374_0010 > Exit Code: 0{code} > 2) Wait for sleeper component to be up > 3) Kill AM process PID > > Expected behavior: > New attempt of AM will be started. 
The pre-existing container will keep > running > > Actual behavior: > Application finishes with State : FINISHED and Final-State : ENDED > New attempt was never launched > Note: > when the AM gets a SIGTERM and gracefully shuts itself down. It is shutting > the entire app down instead of letting it continue to run for another attempt
[jira] [Updated] (YARN-7189) Container-executor doesn't remove Docker containers that error out early
[ https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7189: -- Attachment: YARN-7189-b3.0.001.patch > Container-executor doesn't remove Docker containers that error out early > > > Key: YARN-7189 > URL: https://issues.apache.org/jira/browse/YARN-7189 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 2.9.0, 2.8.3, 3.0.1 >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-7189-b3.0.001.patch > > > Once the docker run command is executed, the docker container is created > unless the return code is 125 meaning that the run command itself failed > (https://docs.docker.com/engine/reference/run/#exit-status). Any error that > happens after the docker run needs to remove the container during cleanup. > {noformat:title=container-executor.c:launch_docker_container_as_user} > snprintf(docker_command_with_binary, command_size, "%s %s", docker_binary, > docker_command); > fprintf(LOGFILE, "Launching docker container...\n"); > FILE* start_docker = popen(docker_command_with_binary, "r"); > {noformat} > This is fixed by YARN-5366, which changes how we remove containers. However, > that was committed into 3.1.0. 2.8, 2.9, and 3.0 are all affected
[jira] [Commented] (YARN-7189) Container-executor doesn't remove Docker containers that error out early
[ https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433137#comment-16433137 ] Eric Badger commented on YARN-7189: --- Attaching first patch to fix this issue. There is a race in the removal of the docker container where the pid may no longer be valid (no such process), but the docker container is still in the running state. Because of that, I have added exponential backoff to the removal in this patch. It will try for 5 iterations with increasing sleep times and eventually give up after the last one. > Container-executor doesn't remove Docker containers that error out early > > > Key: YARN-7189 > URL: https://issues.apache.org/jira/browse/YARN-7189 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 2.9.0, 2.8.3, 3.0.1 >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-7189-b3.0.001.patch > > > Once the docker run command is executed, the docker container is created > unless the return code is 125 meaning that the run command itself failed > (https://docs.docker.com/engine/reference/run/#exit-status). Any error that > happens after the docker run needs to remove the container during cleanup. > {noformat:title=container-executor.c:launch_docker_container_as_user} > snprintf(docker_command_with_binary, command_size, "%s %s", docker_binary, > docker_command); > fprintf(LOGFILE, "Launching docker container...\n"); > FILE* start_docker = popen(docker_command_with_binary, "r"); > {noformat} > This is fixed by YARN-5366, which changes how we remove containers. However, > that was committed into 3.1.0. 2.8, 2.9, and 3.0 are all affected
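The retry strategy described in the comment above — several removal attempts with exponentially increasing sleeps, giving up after the last — can be sketched generically like this. The actual patch is C code in container-executor; this is only an illustrative Java sketch of the backoff idea:

```java
// Hypothetical sketch of retry-with-exponential-backoff: attempt an operation
// (here, container removal) and double the sleep between failed attempts.
class BackoffRemove {
    // Functional stand-in for "try to remove the container once".
    interface Removal {
        boolean attempt(); // returns true when removal succeeded
    }

    // Tries up to maxAttempts times; sleeps baseMillis, 2*baseMillis, 4*baseMillis, ...
    // between attempts, and gives up after the last one.
    static boolean removeWithBackoff(Removal r, int maxAttempts, long baseMillis)
            throws InterruptedException {
        long sleep = baseMillis;
        for (int i = 0; i < maxAttempts; i++) {
            if (r.attempt()) {
                return true;
            }
            if (i < maxAttempts - 1) {
                Thread.sleep(sleep);
                sleep *= 2; // exponential backoff
            }
        }
        return false; // give up, as the patch does after 5 iterations
    }
}
```

The backoff bounds the total wait while giving a still-running container time to leave the "running" state before removal is retried.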
[jira] [Commented] (YARN-7221) Add security check for privileged docker container
[ https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433162#comment-16433162 ] Eric Yang commented on YARN-7221: - TestContainerSchedulerQueuing unit test failure is not related to changes in this patch. > Add security check for privileged docker container > -- > > Key: YARN-7221 > URL: https://issues.apache.org/jira/browse/YARN-7221 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security >Affects Versions: 3.0.0, 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-7221.001.patch, YARN-7221.002.patch, > YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, > YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, > YARN-7221.009.patch, YARN-7221.010.patch, YARN-7221.011.patch, > YARN-7221.012.patch, YARN-7221.013.patch, YARN-7221.014.patch, > YARN-7221.015.patch, YARN-7221.016.patch, YARN-7221.017.patch, > YARN-7221.018.patch, YARN-7221.019.patch, YARN-7221.020.patch, > YARN-7221.021.patch, YARN-7221.022.patch > > > When a docker is running with privileges, majority of the use case is to have > some program running with root then drop privileges to another user. i.e. > httpd to start with privileged and bind to port 80, then drop privileges to > www user. > # We should add security check for submitting users, to verify they have > "sudo" access to run privileged container. > # We should remove --user=uid:gid for privileged containers. > > Docker can be launched with --privileged=true, and --user=uid:gid flag. With > this parameter combinations, user will not have access to become root user. > All docker exec command will be drop to uid:gid user to run instead of > granting privileges. User can gain root privileges if container file system > contains files that give user extra power, but this type of image is > considered as dangerous. 
A non-privileged user can launch a container with > special bits to acquire the same level of root power. Hence, we lose control of > which images should be run with --privileged, and who has sudo rights to use > privileged container images. As a result, we should check for sudo access > and then decide whether to parameterize --privileged=true OR --user=uid:gid. This will > avoid leading developers down the wrong path.
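The either/or decision described above — verify sudo access, then pass either --privileged=true or --user=uid:gid, never both — can be sketched as follows. The class and method names are hypothetical; the real check is implemented in container-executor and the Docker container runtime.

```java
// Sketch of the flag selection described above. Names are hypothetical
// stand-ins, not the actual YARN implementation.
public class PrivilegedFlagCheck {
    static String selectDockerFlag(boolean privilegedRequested,
                                   boolean submitterHasSudo,
                                   int uid, int gid) {
        if (privilegedRequested) {
            if (!submitterHasSudo) {
                // Reject: only users with sudo access may run privileged
                // containers.
                throw new SecurityException(
                    "submitting user lacks sudo access for privileged containers");
            }
            // Privileged: omit --user so the container can start as root.
            return "--privileged=true";
        }
        // Unprivileged: pin the container to the submitting user's uid:gid.
        return "--user=" + uid + ":" + gid;
    }
}
```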
[jira] [Updated] (YARN-7973) Support ContainerRelaunch for Docker containers
[ https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-7973: Fix Version/s: 3.2.0 > Support ContainerRelaunch for Docker containers > --- > > Key: YARN-7973 > URL: https://issues.apache.org/jira/browse/YARN-7973 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-7973.001.patch, YARN-7973.002.patch, > YARN-7973.003.patch, YARN-7973.004.patch > > > Prior to YARN-5366, {{container-executor}} would remove the Docker container > when it exited. The removal is now handled by the > {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse > the workdir from the previous attempt, and does not call {{cleanupContainer}} > prior to {{launchContainer}}. The container ID is reused as well. As a > result, the previous Docker container still exists, resulting in an error > from Docker indicating that a container by that name already exists.
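Because the container ID (and hence the docker container name) is reused across attempts, a relaunch must dispose of the leftover container before running again. A minimal sketch, with a hypothetical DockerClient interface standing in for the actual docker command executor:

```java
// Sketch: remove a stale docker container before relaunching under the
// same name. DockerClient is a hypothetical stand-in, not the YARN API.
public class RelaunchCleanup {
    interface DockerClient {
        boolean containerExists(String name);
        void removeContainer(String name);
    }

    // ContainerRelaunch reuses the previous attempt's container ID, so a
    // leftover container with that name would make `docker run` fail.
    static void prepareRelaunch(DockerClient docker, String containerName) {
        if (docker.containerExists(containerName)) {
            docker.removeContainer(containerName);
        }
    }
}
```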
[jira] [Updated] (YARN-7984) Delete registry entries from ZK on ServiceClient stop and clean up stop/destroy behavior
[ https://issues.apache.org/jira/browse/YARN-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-7984: Fix Version/s: 3.2.0 > Delete registry entries from ZK on ServiceClient stop and clean up > stop/destroy behavior > > > Key: YARN-7984 > URL: https://issues.apache.org/jira/browse/YARN-7984 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Critical > Fix For: 3.2.0 > > Attachments: YARN-7984.1.patch, YARN-7984.2.patch > > > The service records written to the registry are removed by ServiceClient on a > destroy call, but not on a stop call. The service AM does have some code to > clean up the registry entries when component instances are stopped, but if > the AM is killed before it has a chance to perform the cleanup, these entries > will be left in ZooKeeper. It would be better to clean these up in the stop > call, so that RegistryDNS does not provide lookups for containers that don't > exist. > Additional stop/destroy behavior improvements include fixing errors / > unexpected behavior related to: > * destroying a saved (not launched or started) service > * destroying a stopped service > * destroying a destroyed service > * returning proper exit codes for destroy failures > * performing other client operations on saved services (fixing NPEs) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
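The cleanup behavior described above can be sketched with a hypothetical registry interface; the point is that stop, not just destroy, removes the ZK records so RegistryDNS stops resolving containers that no longer exist, and that a repeated stop/destroy is a harmless no-op rather than an error. The path layout and all names are illustrative assumptions.

```java
// Sketch: delete a service's registry records on stop as well as destroy.
// Registry is a hypothetical stand-in for the YARN registry client, and
// the path layout is illustrative.
public class ServiceStopCleanup {
    interface Registry {
        boolean exists(String path);
        void delete(String path);
    }

    // Returns true if records were removed, false if none existed, so a
    // second stop/destroy succeeds quietly instead of failing.
    static boolean removeServiceRecords(Registry registry, String user,
                                        String serviceName) {
        String path = "/registry/users/" + user + "/services/" + serviceName;
        if (!registry.exists(path)) {
            return false;
        }
        registry.delete(path);
        return true;
    }
}
```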
[jira] [Assigned] (YARN-8142) yarn service application stops when AM is killed with SIGTERM
[ https://issues.apache.org/jira/browse/YARN-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi reassigned YARN-8142: Assignee: Billie Rinaldi > yarn service application stops when AM is killed with SIGTERM > - > > Key: YARN-8142 > URL: https://issues.apache.org/jira/browse/YARN-8142 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Yesha Vora >Assignee: Billie Rinaldi >Priority: Major > > Steps: > 1) Launch sleeper job ( non-docker yarn service) > {code} > RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch > fault-test-am-sleeper > /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History > server at xxx:10200 > 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History > server at xxx:10200 > 18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition > from local FS: > /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json > 18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms > 18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: > application_1522887500374_0010 > Exit Code: 0{code} > 2) Wait for sleeper component to be up > 3) Kill AM process PID > > Expected behavior: > New attempt of AM will be started. 
The pre-existing container will keep > running > > Actual behavior: > Application finishes with State : FINISHED and Final-State : ENDED > New attempt was never launched > Note: > when the AM gets a SIGTERM, it gracefully shuts itself down, shutting > the entire app down instead of letting it continue to run for another attempt >
[jira] [Commented] (YARN-7781) Update YARN-Services-Examples.md to be in sync with the latest code
[ https://issues.apache.org/jira/browse/YARN-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433185#comment-16433185 ] Jian He commented on YARN-7781: --- sure, go ahead. thanks > Update YARN-Services-Examples.md to be in sync with the latest code > --- > > Key: YARN-7781 > URL: https://issues.apache.org/jira/browse/YARN-7781 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Jian He >Priority: Major > Attachments: YARN-7781.01.patch, YARN-7781.02.patch, > YARN-7781.03.patch > > > Update YARN-Services-Examples.md to make the following additions/changes: > 1. Add an additional URL and PUT Request JSON to support flex: > Update to flex up/down the no of containers (instances) of a component of a > service > PUT URL – http://localhost:8088/app/v1/services/hello-world > PUT Request JSON > {code} > { > "components" : [ { > "name" : "hello", > "number_of_containers" : 3 > } ] > } > {code} > 2. Modify all occurrences of /ws/ to /app/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers
[ https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433188#comment-16433188 ] Hudson commented on YARN-7973: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13962 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13962/]) YARN-7973. Added ContainerRelaunch feature for Docker containers. (eyang: rev c467f311d0c7155c09052d93fac12045af925583) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerCommandExecutor.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerRelaunch.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitorResourceChange.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerStartCommand.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DefaultLinuxContainerRuntime.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c * (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/TestDockerStartCommand.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DelegatingLinuxContainerRuntime.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/utils/test_docker_util.cc * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.h * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/runtime/ContainerRuntime.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerRelaunch.java > Support ContainerRelaunch for Docker containers > --- > > Key: YARN-7973 > URL: https://issues.apache.org/jira/browse/YARN-7973 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Assignee: Shane Kumpf >
[jira] [Commented] (YARN-8104) Add API to fetch node to attribute mapping
[ https://issues.apache.org/jira/browse/YARN-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433193#comment-16433193 ] Naganarasimha G R commented on YARN-8104: - Thanks for the patch [~bibinchundatt] At high level * Can you inform why NodeToAttributesProto is moved from yarn_server_resourcemanager_service_protos.proto to yarn_protos.proto ? * Also would it make sense to provide overloaded method(getNodesToAttributes) here supporting for NodeID ? and few other comments : yarn_protos.protos * ln no 391: node => hostname ... similar to earlier naming convention yarn_service_protos.protos * ln no 282: nodeToAttributes => nodesToAttributes ... based on convention followed in other places GetNodesToAttributesRequest.java * line no 55 & 64 : setNodes & getNodes -> setHostNames & getHostNames GetNodesToAttributesRequestPBImpl * line no 120: initNodeAttributes => initNodesToAttributesRequest or just init * line no 126: nodeLabelsList => hostNamesList TestPBImplRecords * We need to invoke generateByNewInstance for all the new PB's in setup. can you please check. NodeAttributesManagerImpl * ln no 454-457: Here we do not have a mapping we are setting a hostname with empty set, is that better or just pass for the ones which have attributes is better? IMO for the ones having mapping is better so that we not bloating the response and its a map. Also if nonexistent of erroneous hostnames are given it still shows empty set TestClientRMService * ln no 2053: can we have a separate method to test node to attributes API ? or document what all api's will be tested. and rename it to testNodeAttributesQueryAPI... i would still prefer the former option itself though.. Can you also check the new findbug issue reported, checktyle and as well as the javadoc issues reported, as it seems related to patch and fixable ? 
> Add API to fetch node to attribute mapping > -- > > Key: YARN-8104 > URL: https://issues.apache.org/jira/browse/YARN-8104 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-8104-YARN-3409.001.patch, > YARN-8104-YARN-3409.002.patch, YARN-8104-YARN-3409.003.patch > > > Add node/host to attribute mapping in yarn client API. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7984) Delete registry entries from ZK on ServiceClient stop and clean up stop/destroy behavior
[ https://issues.apache.org/jira/browse/YARN-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433207#comment-16433207 ] Billie Rinaldi commented on YARN-7984: -- Thanks, [~eyang]! I plan to cherry-pick this to branch-3.1 as well. > Delete registry entries from ZK on ServiceClient stop and clean up > stop/destroy behavior > > > Key: YARN-7984 > URL: https://issues.apache.org/jira/browse/YARN-7984 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Critical > Fix For: 3.2.0 > > Attachments: YARN-7984.1.patch, YARN-7984.2.patch > > > The service records written to the registry are removed by ServiceClient on a > destroy call, but not on a stop call. The service AM does have some code to > clean up the registry entries when component instances are stopped, but if > the AM is killed before it has a chance to perform the cleanup, these entries > will be left in ZooKeeper. It would be better to clean these up in the stop > call, so that RegistryDNS does not provide lookups for containers that don't > exist. > Additional stop/destroy behavior improvements include fixing errors / > unexpected behavior related to: > * destroying a saved (not launched or started) service > * destroying a stopped service > * destroying a destroyed service > * returning proper exit codes for destroy failures > * performing other client operations on saved services (fixing NPEs) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8126) [Follow up] Support auto-spawning of admin configured services during bootstrap of rm
[ https://issues.apache.org/jira/browse/YARN-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433208#comment-16433208 ] Gour Saha commented on YARN-8126: - I think this deserves to be a new sub-topic "System Services" on the left panel under "Service Discovery" (in the "YARN Service" section). It might seem that there is not much information for it to go to a new page, but there are primarily 2 reasons I am inclining towards it - # An ordinary end-user cannot create or add services to be started as system-services. So it should not be in the existing pages which focuses on what ordinary end-users can do. Hence in this new page, we should specifically call out that this is a cluster admin feature. # This is a pretty handy feature and going forward this page might grow as we add more system-service related features or add helpful system-services to the framework itself and would also need documentation to go with it. What do you think? > [Follow up] Support auto-spawning of admin configured services during > bootstrap of rm > - > > Key: YARN-8126 > URL: https://issues.apache.org/jira/browse/YARN-8126 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Major > Attachments: YARN-8126.001.patch > > > YARN-8048 adds support auto-spawning of admin configured services during > bootstrap of rm. > This JIRA is to follow up some of the comments discussed in YARN-8048. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8142) yarn service application stops when AM is killed with SIGTERM
[ https://issues.apache.org/jira/browse/YARN-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433210#comment-16433210 ] Eric Yang commented on YARN-8142: - In Unix terms, SIGTERM is used for terminating application. My impression this is correct behavior rather than start another instance. If other signal is used, then spawning another instance might be the right thing to do. > yarn service application stops when AM is killed with SIGTERM > - > > Key: YARN-8142 > URL: https://issues.apache.org/jira/browse/YARN-8142 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Yesha Vora >Assignee: Billie Rinaldi >Priority: Major > > Steps: > 1) Launch sleeper job ( non-docker yarn service) > {code} > RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch > fault-test-am-sleeper > /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History > server at xxx:10200 > 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History > server at xxx:10200 > 18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition > from local FS: > /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json > 18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms > 18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: > application_1522887500374_0010 > Exit Code: 0{code} > 2) Wait for sleeper component to be up > 3) Kill AM process PID > > Expected behavior: > New attempt of AM will be started. The pre-existing container will keep > running > > Actual behavior: > Application finishes with State : FINISHED and Final-State : ENDED > New attempt was never launched > Note: > when the AM gets a SIGTERM and gracefully shuts itself down. It is shutting > the entire app down instead of letting it continue to run for another attempt > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8142) yarn service application stops when AM is killed with SIGTERM
[ https://issues.apache.org/jira/browse/YARN-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433210#comment-16433210 ] Eric Yang edited comment on YARN-8142 at 4/11/18 12:07 AM: --- In Unix terms, SIGTERM is used for terminating application. My impression this is correct behavior rather than start another instance. If other signal is used (besides SIGKILL and SIGTERM), then spawning another instance might be the right thing to do. was (Author: eyang): In Unix terms, SIGTERM is used for terminating application. My impression this is correct behavior rather than start another instance. If other signal is used, then spawning another instance might be the right thing to do. > yarn service application stops when AM is killed with SIGTERM > - > > Key: YARN-8142 > URL: https://issues.apache.org/jira/browse/YARN-8142 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Yesha Vora >Assignee: Billie Rinaldi >Priority: Major > > Steps: > 1) Launch sleeper job ( non-docker yarn service) > {code} > RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch > fault-test-am-sleeper > /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History > server at xxx:10200 > 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History > server at xxx:10200 > 18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition > from local FS: > /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json > 18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms > 18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: > application_1522887500374_0010 > Exit Code: 0{code} > 2) Wait for sleeper component to be up > 3) Kill AM process PID > > Expected behavior: > New attempt of AM will be started. The pre-existing container will keep > running > > Actual behavior: > Application finishes with State : FINISHED and Final-State : ENDED > New attempt was never launched > Note: > when the AM gets a SIGTERM and gracefully shuts itself down. It is shutting > the entire app down instead of letting it continue to run for another attempt > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8140) Improve log message when launch cmd is run for stopped yarn service
[ https://issues.apache.org/jira/browse/YARN-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang reassigned YARN-8140: --- Assignee: Eric Yang > Improve log message when launch cmd is ran for stopped yarn service > --- > > Key: YARN-8140 > URL: https://issues.apache.org/jira/browse/YARN-8140 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Eric Yang >Priority: Major > > Steps: > 1) Launch sleeper app > {code} > RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch > sleeper2-duplicate-app-stopped > /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/04/10 21:31:01 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History > server at xx:10200 > 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History > server at xx:10200 > 18/04/10 21:31:01 INFO client.ApiServiceClient: Loading service definition > from local FS: > /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json > 18/04/10 21:31:03 INFO util.log: Logging initialized @2818ms > 18/04/10 21:31:10 INFO client.ApiServiceClient: Application ID: > application_1523387473707_0007 > Exit Code: 0{code} > 2) Stop the application > {code} > RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -stop > sleeper2-duplicate-app-stopped > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. 
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/04/10 21:31:14 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History > server at xx:10200 > 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History > server at xx:10200 > 18/04/10 21:31:16 INFO util.log: Logging initialized @3034ms > 18/04/10 21:31:17 INFO client.ApiServiceClient: Successfully stopped service > sleeper2-duplicate-app-stopped > Exit Code: 0{code} > 3) Launch the application with same name > {code} > RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch > sleeper2-duplicate-app-stopped > /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/04/10 21:31:19 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History > server at xx:10200 > 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History > server at xx:10200 > 18/04/10 21:31:19 INFO client.ApiServiceClient: Loading service definition > from local FS: > /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json > 18/04/10 21:31:22 INFO util.log: Logging initialized @4456ms > 18/04/10 21:31:22 ERROR client.ApiServiceClient: Service Instance dir already > exists: > hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json > Exit Code: 56 > {code} > > Here, launch cmd fails with "Service Instance dir already exists: > hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json". > > The log message should be more meaningful. It should return that > "sleeper2-duplicate-app-stopped is in stopped state". -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional c
[jira] [Commented] (YARN-8133) Doc link broken for yarn-service from overview page.
[ https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433225#comment-16433225 ] Gour Saha commented on YARN-8133: - Thanks [~rohithsharma]. 002 patch looks good. +1 for commit. > Doc link broken for yarn-service from overview page. > > > Key: YARN-8133 > URL: https://issues.apache.org/jira/browse/YARN-8133 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Blocker > Attachments: YARN-8133.01.patch, YARN-8133.02.patch > > > I see that the documentation links are broken on the overview page. > Clicking any link on > http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html > causes an error. > It looks like the Overview page redirects to .md pages, which don't exist. > It should redirect to *.html pages
[jira] [Comment Edited] (YARN-8133) Doc link broken for yarn-service from overview page.
[ https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433225#comment-16433225 ] Gour Saha edited comment on YARN-8133 at 4/11/18 12:28 AM: --- Thanks [~rohithsharma]. 02 patch looks good. +1 for commit. was (Author: gsaha): Thanks [~rohithsharma]. 002 patch looks good. +1 for commit. > Doc link broken for yarn-service from overview page. > > > Key: YARN-8133 > URL: https://issues.apache.org/jira/browse/YARN-8133 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Blocker > Attachments: YARN-8133.01.patch, YARN-8133.02.patch > > > I see that the documentation links are broken on the overview page. > Clicking any link on > http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html > causes an error. > It looks like the Overview page redirects to .md pages, which don't exist. > It should redirect to *.html pages
[jira] [Updated] (YARN-8133) Doc link broken for yarn-service from overview page.
[ https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8133: - Fix Version/s: 3.2.0 > Doc link broken for yarn-service from overview page. > > > Key: YARN-8133 > URL: https://issues.apache.org/jira/browse/YARN-8133 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Blocker > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8133.01.patch, YARN-8133.02.patch > > > I see that the documentation links are broken on the overview page. > Clicking any link on > http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html > causes an error. > It looks like the Overview page redirects to .md pages, which don't exist. > It should redirect to *.html pages
[jira] [Created] (YARN-8143) Improve log message when Capacity Scheduler request allocation on node
Zian Chen created YARN-8143:
---
Summary: Improve log message when Capacity Scheduler request allocation on node
Key: YARN-8143
URL: https://issues.apache.org/jira/browse/YARN-8143
Project: Hadoop YARN
Issue Type: Improvement
Components: yarn
Reporter: Zian Chen
Assignee: Zian Chen

When the scheduler allocates a container on a node that has reserved containers on it, the log message below is printed very frequently; it should be guarded with additional condition checks.

{code:java}
2018-02-02 11:41:13,105 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
2018-02-02 11:41:13,115 INFO capacity.CapacityScheduler (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to fulfill reservation for application application_1517571510094_0003 on node: ctr-e137-1514896590304-52728-01-07.hwx.site:25454
2018-02-02 11:41:13,115 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - Reserved container application=application_1517571510094_0003 resource= queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e cluster=
{code}
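One common way to add the "condition checks" this issue asks for is to throttle a hot log statement per key (for example, per application), emitting at most one message per interval. This is a minimal sketch of that idea, not the actual YARN-8143 patch; the class and method names are hypothetical.

```java
// A minimal per-key log throttle: shouldLog() returns true at most once per
// interval for a given key. A scheduler could use this to demote the
// "Trying to fulfill reservation" message to once-per-minute per application.
// Hypothetical helper, not the real YARN-8143 change.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThrottledLog {
    private final long intervalMs;
    private final Map<String, Long> lastLogged = new ConcurrentHashMap<>();

    public ThrottledLog(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    /** Returns true if a message for this key should be logged now. */
    public boolean shouldLog(String key, long nowMs) {
        Long last = lastLogged.get(key);
        if (last == null || nowMs - last >= intervalMs) {
            lastLogged.put(key, nowMs);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ThrottledLog log = new ThrottledLog(60_000);
        // The first call for a key always logs.
        System.out.println(log.shouldLog("application_1517571510094_0003", System.currentTimeMillis())); // prints true
    }
}
```

The clock is passed in explicitly so the behavior is deterministic and testable; a caller would normally pass `System.currentTimeMillis()`.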
[jira] [Commented] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""
[ https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433249#comment-16433249 ]

Hudson commented on YARN-8116:
--
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13963 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13963/])
YARN-8116. Nodemanager fails with NumberFormatException: For input (wangda: rev 2bf9cc2c73944c9f7cde56714b8cf6995cfa539b)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java

> Nodemanager fails with NumberFormatException: For input string: ""
> --
>
> Key: YARN-8116
> URL: https://issues.apache.org/jira/browse/YARN-8116
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Yesha Vora
> Assignee: Chandni Singh
> Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8116.001.patch, YARN-8116.002.patch
>
> Steps followed:
> 1) Update the nodemanager debug delay config:
> {code}
> <property>
>   <name>yarn.nodemanager.delete.debug-delay-sec</name>
>   <value>350</value>
> </property>
> {code}
> 2) Launch a distributed shell application multiple times:
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn jar hadoop-yarn-applications-distributedshell-*.jar -shell_command "sleep 120" -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -shell_env YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar hadoop-yarn-applications-distributedshell-*.jar
> {code}
> 3) Restart the NM.
> The nodemanager then fails to start with the error below.
> {code:title=NM log}
> 2018-03-23 21:32:14,437 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:serviceInit(181)) - ContainersMonitor enabled: true
> 2018-03-23 21:32:14,439 INFO logaggregation.LogAggregationService (LogAggregationService.java:serviceInit(130)) - rollingMonitorInterval is set as 3600. The logs will be aggregated every 3600 seconds
> 2018-03-23 21:32:14,455 INFO service.AbstractService (AbstractService.java:noteFailure(267)) - Service org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl failed in state INITED
> java.lang.NumberFormatException: For input string: ""
> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Long.parseLong(Long.java:601)
> at java.lang.Long.parseLong(Long.java:631)
> at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350)
> at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:899)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:960)
> 2018-03-23 21:32:14,458 INFO logaggregation.LogAggregationService (LogAggregationService.java:serviceStop(148)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService waiting for pending aggregation during exit
> 2018-03-23 21:32:14,460 INFO service.AbstractService (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state INITED
> java.lang.NumberFormatException: For input string: ""
> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Long.parseLong(Long.java:601)
> at java.lang.Long.parseLong(Long.java:631)
> at org.apache.hadoop.yarn.se
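The stack trace in YARN-8116 shows `Long.parseLong("")` throwing while the NM recovers container state from its leveldb store, which aborts NM startup. A defensive pattern for this failure mode is to treat a blank persisted value as a default rather than letting the exception propagate. The sketch below illustrates that pattern only; it is not the committed YARN-8116 patch, and the helper name is hypothetical.

```java
// Defensive parsing of a persisted numeric value: fall back to a default
// when the stored string is null or blank, instead of throwing
// NumberFormatException during state recovery. Illustrative only; the real
// YARN-8116 fix lives in NMLeveldbStateStoreService/ContainerImpl.
public class SafeParse {

    /** Parse a persisted long value, falling back to a default when blank. */
    public static long parseLongOrDefault(String value, long defaultValue) {
        if (value == null || value.trim().isEmpty()) {
            // An empty record (e.g. a partially written key) yields the default.
            return defaultValue;
        }
        return Long.parseLong(value.trim());
    }

    public static void main(String[] args) {
        // The failing input from the bug report was the empty string "".
        System.out.println(parseLongOrDefault("", -1L)); // prints -1
    }
}
```

Whether a default is safe (versus failing fast with a clearer error) depends on what the value represents; for recovery paths a sentinel plus a warning log is a common compromise.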