[jira] [Updated] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miklos Szegedi updated YARN-4599:
---------------------------------
    Attachment: YARN-4599.015.patch

> Set OOM control for memory cgroups
> ----------------------------------
>
>                 Key: YARN-4599
>                 URL: https://issues.apache.org/jira/browse/YARN-4599
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.9.0
>            Reporter: Karthik Kambatla
>            Assignee: Miklos Szegedi
>            Priority: Major
>              Labels: oct16-medium
>         Attachments: Elastic Memory Control in YARN.pdf, YARN-4599.000.patch,
> YARN-4599.001.patch, YARN-4599.002.patch, YARN-4599.003.patch,
> YARN-4599.004.patch, YARN-4599.005.patch, YARN-4599.006.patch,
> YARN-4599.007.patch, YARN-4599.008.patch, YARN-4599.009.patch,
> YARN-4599.010.patch, YARN-4599.011.patch, YARN-4599.012.patch,
> YARN-4599.013.patch, YARN-4599.014.patch, YARN-4599.015.patch,
> YARN-4599.sandflee.patch, yarn-4599-not-so-useful.patch
>
> YARN-1856 adds memory cgroup enforcement support. We should also explicitly
> set OOM control so that containers are not killed as soon as they go over
> their usage. Today, one could set the swappiness to control this, but
> clusters with swap turned off exist.
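For context on the mechanism being proposed: in the cgroups v1 memory controller, each cgroup exposes a memory.oom_control file, and writing "1" to it disables the kernel OOM killer for that cgroup, so tasks that hit the memory limit are paused rather than killed and an external agent (here, the NodeManager) can decide which container to sacrifice. A minimal sketch of that step, assuming an illustrative cgroup mount point and hierarchy name rather than the exact layout the patch uses:

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Sketch: disable the kernel OOM killer for a container's memory cgroup so
 * the container is frozen instead of killed when it exceeds its limit.
 * The mount point and hierarchy name below are assumptions for illustration.
 */
public class OomControlSketch {
  private static final Path CGROUP_MEMORY_ROOT =
      Paths.get("/sys/fs/cgroup/memory/hadoop-yarn"); // assumed location

  public static void disableOomKiller(String containerId) throws IOException {
    Path oomControl =
        CGROUP_MEMORY_ROOT.resolve(containerId).resolve("memory.oom_control");
    // "1" tells the kernel to pause, not kill, tasks in this cgroup when the
    // memory limit is reached; a supervisor must then resolve the OOM.
    Files.write(oomControl, "1".getBytes(StandardCharsets.UTF_8));
  }
}
{code}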
[jira] [Resolved] (YARN-8302) ATS v2 should handle HBase connection issue properly
[ https://issues.apache.org/jira/browse/YARN-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith Sharma K S resolved YARN-8302.
-------------------------------------
    Resolution: Won't Fix

Closing this JIRA as Won't Fix, since it is a configuration issue. The HBase client timeout can be reduced by tuning the configurations discussed in the comments above.

> ATS v2 should handle HBase connection issue properly
> ----------------------------------------------------
>
>                 Key: YARN-8302
>                 URL: https://issues.apache.org/jira/browse/YARN-8302
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: ATSv2
>    Affects Versions: 3.1.0
>            Reporter: Yesha Vora
>            Priority: Major
>
> An ATS v2 call times out with the below error when it can't connect to the HBase instance.
> {code}
> bash-4.2$ curl -i -k -s -1 -H 'Content-Type: application/json' -H 'Accept: application/json' --max-time 5 --negotiate -u : 'https://xxx:8199/ws/v2/timeline/apps/application_1526357251888_0022/entities/YARN_CONTAINER?fields=ALL&_=1526425686092'
> curl: (28) Operation timed out after 5002 milliseconds with 0 bytes received
> {code}
> {code:title=ATS log}
> 2018-05-15 23:10:03,623 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=7, retries=7, started=8165 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow, ,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
> 2018-05-15 23:10:13,651 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=8, retries=8, started=18192 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow, ,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
> 2018-05-15 23:10:23,730 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=9, retries=9, started=28272 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow, ,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
> 2018-05-15 23:10:33,788 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=10, retries=10, started=38330 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow, ,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
> {code}
> There are two issues here:
> 1) Check why ATS can't connect to HBase.
> 2) In case of a connection error, the ATS call should not simply time out; it should fail with a proper error.
[jira] [Commented] (YARN-8302) ATS v2 should handle HBase connection issue properly
[ https://issues.apache.org/jira/browse/YARN-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478532#comment-16478532 ]

Rohith Sharma K S commented on YARN-8302:
-----------------------------------------

If HBase is down for any reason, the HBase client retries for about 20 minutes with the default configuration. Reducing the default value of *hbase.client.retries.number* from 15 to 7 shortens this drastically, from 20 minutes to about 1.5 minutes.
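For concreteness, a sketch of the tuning described above. hbase.client.retries.number is the standard HBase client key; the companion operation-timeout value is an illustrative assumption, and in an ATSv2 deployment these settings would normally live in the hbase-site.xml read by the timeline reader rather than be set in code:

{code:java}
import org.apache.hadoop.conf.Configuration;

/**
 * Sketch: cap how long a timeline reader blocks on an unreachable HBase.
 * With the default hbase.client.retries.number of 15, the retry loop runs
 * for roughly 20 minutes; 7 retries brings it down to about 1.5 minutes.
 */
public class TimelineHBaseRetrySketch {
  public static Configuration tunedClientConf() {
    Configuration conf = new Configuration();
    conf.setInt("hbase.client.retries.number", 7);
    // Illustrative companion knob (assumption): bound each operation too.
    conf.setInt("hbase.client.operation.timeout", 120_000);
    return conf;
  }
}
{code}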
[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478490#comment-16478490 ]

genericqa commented on YARN-4599:
---------------------------------

(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 32s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 7 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 19s | Maven dependency ordering for branch |
| +1 | mvninstall | 25m 54s | trunk passed |
| +1 | compile | 29m 43s | trunk passed |
| +1 | checkstyle | 3m 15s | trunk passed |
| +1 | mvnsite | 20m 13s | trunk passed |
| +1 | shadedclient | 33m 49s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . |
| +1 | findbugs | 3m 44s | trunk passed |
| +1 | javadoc | 6m 0s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 21s | Maven dependency ordering for patch |
| +1 | mvninstall | 29m 4s | the patch passed |
| +1 | compile | 28m 56s | the patch passed |
| +1 | cc | 28m 56s | the patch passed |
| +1 | javac | 28m 56s | the patch passed |
| +1 | checkstyle | 3m 25s | root: The patch generated 0 new + 235 unchanged - 1 fixed = 235 total (was 236) |
| +1 | mvnsite | 19m 1s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 10m 11s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . |
| +1 | findbugs | 3m 51s | the patch passed |
| +1 | javadoc | 5m 14s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 174m 38s | root in the patch failed. |
| +1 | asflicense | 0m 39s | The patch does not generate ASF License warnings. |
| | | 377m 5s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestPread |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-4599 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923724/YAR
[jira] [Updated] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miklos Szegedi updated YARN-4599:
---------------------------------
    Attachment: YARN-4599.014.patch
[jira] [Updated] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miklos Szegedi updated YARN-4599:
---------------------------------
    Attachment: YARN-4599.013.patch
[jira] [Commented] (YARN-8234) Improve RM system metrics publisher's performance by pushing events to timeline server in batch
[ https://issues.apache.org/jira/browse/YARN-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478467#comment-16478467 ]

Hu Ziqian commented on YARN-8234:
---------------------------------

Hi [~leftnoteasy],

hadoop.yarn.server.resourcemanager.TestClientRMTokens
hadoop.yarn.server.resourcemanager.TestAMAuthorization
hadoop.yarn.server.resourcemanager.scheduler.fair.TestSchedulingPolicy

These tests pass on my laptop; I have no idea why they failed in genericqa. I checked them, and none of them uses SystemMetricsPublisher, which is what my patch changes. The other test and findbugs issues are all fixed.

> Improve RM system metrics publisher's performance by pushing events to
> timeline server in batch
> -----------------------------------------------------------------------
>
>                 Key: YARN-8234
>                 URL: https://issues.apache.org/jira/browse/YARN-8234
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager, timelineserver
>    Affects Versions: 2.8.3
>            Reporter: Hu Ziqian
>            Assignee: Hu Ziqian
>            Priority: Major
>         Attachments: YARN-8234-branch-2.8.3.001.patch,
> YARN-8234-branch-2.8.3.002.patch, YARN-8234-branch-2.8.3.003.patch
>
> When the system metrics publisher is enabled, the RM pushes events to the
> timeline server via its RESTful API. If the cluster load is heavy, many
> events are sent to the timeline server and the timeline server's event
> handler thread gets locked. YARN-7266 discusses the details of this problem.
> Because of the lock, the timeline server can't receive events as fast as the
> RM generates them, and many timeline events pile up in the RM's memory.
> Eventually those events consume all of the RM's memory, and the RM starts a
> full GC (which causes a JVM stop-the-world pause and a timeout from the RM
> to ZooKeeper) or even hits an OOM.
> The main problem is that the timeline server can't receive events as fast as
> they are generated. Today, the RM system metrics publisher puts only one
> event in each request, so most of the time on the timeline side is spent
> handling HTTP headers and network-connection overhead; only a small fraction
> is spent processing the timeline event itself, which is the truly valuable
> work.
> In this issue, we add a buffer to the system metrics publisher and let the
> publisher send events to the timeline server in batches via one request.
> With the batch size set to 1000, the rate at which the timeline server
> receives events improved 100x in our experiments. We have deployed this
> function in our production environment, which accepts 2 apps in one hour,
> and it works fine.
> We add the following configuration:
> * yarn.resourcemanager.system-metrics-publisher.batch-size: the number of
>   events the system metrics publisher sends in one request. Default: 1000.
> * yarn.resourcemanager.system-metrics-publisher.buffer-size: the size of the
>   event buffer in the system metrics publisher.
> * yarn.resourcemanager.system-metrics-publisher.interval-seconds: when batch
>   publishing is enabled, we must avoid the publisher waiting for a batch to
>   fill up and holding events in the buffer for a long time. So we add
>   another thread which sends the events in the buffer periodically. This
>   config sets the interval of that periodic sending thread. Default: 60s.
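The buffer-plus-periodic-flush design described above can be sketched with a bounded queue and a daemon drainer thread. Everything below is a hedged illustration of the idea: the class, the sendBatch stub, and the use of String as a stand-in for timeline entities are assumptions, not the classes in the patch.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Sketch of batching: flush when the batch fills or an interval elapses. */
public class BatchingPublisherSketch {
  private final BlockingQueue<String> buffer; // stand-in for timeline events
  private final int batchSize;
  private final long flushIntervalMs;

  public BatchingPublisherSketch(int bufferSize, int batchSize,
      long flushIntervalMs) {
    this.buffer = new LinkedBlockingQueue<>(bufferSize);
    this.batchSize = batchSize;
    this.flushIntervalMs = flushIntervalMs;
    Thread drainer = new Thread(this::drainLoop, "smp-batch-drainer");
    drainer.setDaemon(true);
    drainer.start();
  }

  public void publish(String event) throws InterruptedException {
    buffer.put(event); // bounded: back-pressure instead of unbounded RM heap
  }

  private void drainLoop() {
    List<String> batch = new ArrayList<>(batchSize);
    long deadline = System.currentTimeMillis() + flushIntervalMs;
    while (true) {
      try {
        long wait = Math.max(1, deadline - System.currentTimeMillis());
        String event = buffer.poll(wait, TimeUnit.MILLISECONDS);
        if (event != null) {
          batch.add(event);
        }
        // Flush when the batch is full or the interval has elapsed.
        if (batch.size() >= batchSize
            || (System.currentTimeMillis() >= deadline && !batch.isEmpty())) {
          sendBatch(batch);
          batch.clear();
        }
        if (System.currentTimeMillis() >= deadline) {
          deadline = System.currentTimeMillis() + flushIntervalMs;
        }
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  private void sendBatch(List<String> batch) {
    // Placeholder: in the real patch this would be one REST call to the
    // timeline server carrying the whole batch in the request body.
    System.out.println("flushing " + batch.size() + " events");
  }
}
{code}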
[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478460#comment-16478460 ]

genericqa commented on YARN-4599:
---------------------------------

(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 23s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 7 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 41s | Maven dependency ordering for branch |
| +1 | mvninstall | 23m 30s | trunk passed |
| +1 | compile | 27m 25s | trunk passed |
| +1 | checkstyle | 2m 46s | trunk passed |
| +1 | mvnsite | 18m 21s | trunk passed |
| +1 | shadedclient | 30m 45s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . |
| +1 | findbugs | 3m 18s | trunk passed |
| +1 | javadoc | 5m 1s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 17s | Maven dependency ordering for patch |
| +1 | mvninstall | 24m 53s | the patch passed |
| +1 | compile | 26m 16s | the patch passed |
| +1 | cc | 26m 16s | the patch passed |
| +1 | javac | 26m 16s | the patch passed |
| -0 | checkstyle | 2m 35s | root: The patch generated 1 new + 235 unchanged - 1 fixed = 236 total (was 236) |
| +1 | mvnsite | 17m 57s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 8m 36s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . |
| +1 | findbugs | 3m 22s | the patch passed |
| +1 | javadoc | 4m 46s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 14m 9s | root in the patch failed. |
| +1 | asflicense | 0m 31s | The patch does not generate ASF License warnings. |
| | | 196m 46s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.util.TestBasicDiskValidator |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-4599 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attach
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478425#comment-16478425 ]

genericqa commented on YARN-8292:
---------------------------------

(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 9s | Maven dependency ordering for branch |
| +1 | mvninstall | 26m 44s | trunk passed |
| +1 | compile | 9m 17s | trunk passed |
| +1 | checkstyle | 1m 28s | trunk passed |
| +1 | mvnsite | 1m 50s | trunk passed |
| +1 | shadedclient | 14m 21s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 11s | trunk passed |
| +1 | javadoc | 1m 26s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 35s | the patch passed |
| +1 | compile | 8m 13s | the patch passed |
| +1 | javac | 8m 13s | the patch passed |
| -0 | checkstyle | 1m 25s | hadoop-yarn-project/hadoop-yarn: The patch generated 9 new + 38 unchanged - 0 fixed = 47 total (was 38) |
| +1 | mvnsite | 1m 46s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedclient | 11m 38s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 10s | the patch passed |
| +1 | javadoc | 1m 48s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 3m 32s | hadoop-yarn-common in the patch passed. |
| -1 | unit | 64m 16s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 38s | The patch does not generate ASF License warnings. |
| | | 156m 47s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8292 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923819/YARN-8292.003.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 290f3268becf 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/persona
[jira] [Commented] (YARN-8290) Yarn application failed to recover with "Error Launching job : User is not set in the application report" error after RM restart
[ https://issues.apache.org/jira/browse/YARN-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478394#comment-16478394 ]

genericqa commented on YARN-8290:
---------------------------------

(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 26m 29s | trunk passed |
| +1 | compile | 0m 42s | trunk passed |
| +1 | checkstyle | 0m 37s | trunk passed |
| +1 | mvnsite | 0m 44s | trunk passed |
| +1 | shadedclient | 11m 29s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 9s | trunk passed |
| +1 | javadoc | 0m 27s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 43s | the patch passed |
| +1 | compile | 0m 39s | the patch passed |
| +1 | javac | 0m 39s | the patch passed |
| +1 | checkstyle | 0m 33s | the patch passed |
| +1 | mvnsite | 0m 41s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedclient | 11m 24s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 18s | the patch passed |
| +1 | javadoc | 0m 24s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 62m 32s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 120m 9s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8290 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923820/YARN-8290.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux f6bb0e1a6c56 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / be53969 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/20759/artifact/out/whitespace-eol.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20759/testReport/ |
| Max. process+thread count | 799 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/Pre
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478387#comment-16478387 ]

genericqa commented on YARN-8292:
---------------------------------

(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 25s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 14s | Maven dependency ordering for branch |
| +1 | mvninstall | 26m 45s | trunk passed |
| +1 | compile | 8m 38s | trunk passed |
| +1 | checkstyle | 1m 33s | trunk passed |
| +1 | mvnsite | 1m 41s | trunk passed |
| +1 | shadedclient | 14m 7s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 48s | trunk passed |
| +1 | javadoc | 1m 20s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 33s | the patch passed |
| +1 | compile | 7m 20s | the patch passed |
| +1 | javac | 7m 20s | the patch passed |
| -0 | checkstyle | 1m 19s | hadoop-yarn-project/hadoop-yarn: The patch generated 9 new + 42 unchanged - 0 fixed = 51 total (was 42) |
| +1 | mvnsite | 1m 34s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 18s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 0s | the patch passed |
| +1 | javadoc | 1m 16s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 3m 10s | hadoop-yarn-common in the patch passed. |
| -1 | unit | 67m 2s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 155m 25s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAutoCreatedQueuePreemption |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8292 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923810/YARN-8292.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 49598110aae6 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 19:09:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Per
[jira] [Commented] (YARN-8310) Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats
[ https://issues.apache.org/jira/browse/YARN-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478333#comment-16478333 ]

genericqa commented on YARN-8310:
---------------------------------

(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 37s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 23m 23s | trunk passed |
| +1 | compile | 0m 33s | trunk passed |
| +1 | checkstyle | 0m 23s | trunk passed |
| +1 | mvnsite | 0m 35s | trunk passed |
| +1 | shadedclient | 9m 50s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 11s | trunk passed |
| +1 | javadoc | 0m 38s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 34s | the patch passed |
| +1 | compile | 0m 30s | the patch passed |
| -1 | javac | 0m 30s | hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common generated 1 new + 33 unchanged - 0 fixed = 34 total (was 33) |
| -0 | checkstyle | 0m 21s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: The patch generated 5 new + 35 unchanged - 0 fixed = 40 total (was 35) |
| +1 | mvnsite | 0m 32s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 9m 56s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 15s | the patch passed |
| +1 | javadoc | 0m 34s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 3m 11s | hadoop-yarn-common in the patch passed. |
| -1 | asflicense | 0m 20s | The patch generated 1 ASF License warnings. |
| | | 54m 21s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8310 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923816/YARN-8310.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 4d6cc1cdb009 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / be53969 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| javac | https://builds.apache.org/job/PreCommit-YARN-Build/20758/artifact/out/diff-compile-javac-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/20758/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20758/testReport/ |
| asflicense
[jira] [Commented] (YARN-8080) YARN native service should support component restart policy
[ https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478317#comment-16478317 ]

genericqa commented on YARN-8080:
---------------------------------

(/) *+1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 34s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 12s | Maven dependency ordering for branch |
| +1 | mvninstall | 26m 43s | trunk passed |
| +1 | compile | 9m 28s | trunk passed |
| +1 | checkstyle | 1m 24s | trunk passed |
| +1 | mvnsite | 1m 39s | trunk passed |
| +1 | shadedclient | 13m 38s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 1m 48s | trunk passed |
| +1 | javadoc | 1m 6s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 17s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 10s | the patch passed |
| +1 | compile | 7m 35s | the patch passed |
| +1 | javac | 7m 35s | the patch passed |
| -0 | checkstyle | 1m 22s | hadoop-yarn-project/hadoop-yarn: The patch generated 15 new + 121 unchanged - 2 fixed = 136 total (was 123) |
| +1 | mvnsite | 1m 31s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 15s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 1m 37s | the patch passed |
| +1 | javadoc | 1m 4s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 11m 9s | hadoop-yarn-services-core in the patch passed. |
| +1 | unit | 0m 40s | hadoop-yarn-services-api in the patch passed. |
| +1 | unit | 0m 20s | hadoop-yarn-site in the patch passed. |
| +1 | asflicense | 0m 36s | The patch does not generate ASF License warnings. |
| | | 94m 6s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8080 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attac
[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
[ https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478311#comment-16478311 ]

Eric Yang commented on YARN-8141:
---------------------------------

[~csingh] Thank you for the patch. A few nits: the FindAbsoluteMount method can be skipped; container-executor already looks up the source mount location and compares the absolute path to the white-listed mounts (YARN-5534). DockerContainers.md still has a reference to YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS; this can be removed as well.

> YARN Native Service: Respect
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> ------------------------------------------------------------------------------
>
>                 Key: YARN-8141
>                 URL: https://issues.apache.org/jira/browse/YARN-8141
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn-native-services
>            Reporter: Wangda Tan
>            Assignee: Chandni Singh
>            Priority: Critical
>              Labels: Docker
>         Attachments: YARN-8141.001.patch, YARN-8141.002.patch,
> YARN-8141.003.patch, YARN-8141.004.patch
>
> The existing YARN native service overwrites
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether
> the user specified it in the service spec. It is important to allow users to
> mount local files like /etc/passwd.
> The following logic inside AbstractLauncher.java overwrites the
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
>     sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS",
>     sb.toString());{code}
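One straightforward way to respect a user-supplied value is to merge the framework-computed mounts into any existing environment entry instead of replacing it. A minimal sketch of that idea, assuming a standalone class and an addMounts helper that do not exist in AbstractLauncher; whether the real patch merges or prefers the user value outright is the patch's own design choice:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class MountEnvMergeSketch {
  static final String ENV_DOCKER_MOUNTS =
      "YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS";

  /** Append framework-computed mounts after any user-specified ones. */
  static void addMounts(Map<String, String> env,
      Map<String, String> mountPaths) {
    StringBuilder sb = new StringBuilder();
    String existing = env.get(ENV_DOCKER_MOUNTS);
    if (existing != null && !existing.isEmpty()) {
      sb.append(existing); // keep what the user asked for in the spec
    }
    for (Map.Entry<String, String> mount : mountPaths.entrySet()) {
      if (sb.length() > 0) {
        sb.append(",");
      }
      sb.append(mount.getKey()).append(":").append(mount.getValue());
    }
    env.put(ENV_DOCKER_MOUNTS, sb.toString());
  }

  public static void main(String[] args) {
    Map<String, String> env = new LinkedHashMap<>();
    env.put(ENV_DOCKER_MOUNTS, "/etc/passwd:/etc/passwd"); // user-specified
    Map<String, String> computed = new LinkedHashMap<>();
    computed.put("/hadoop/conf", "/etc/hadoop/conf"); // illustrative mount
    addMounts(env, computed);
    // Prints both the user mount and the computed mount, comma-separated.
    System.out.println(env.get(ENV_DOCKER_MOUNTS));
  }
}
{code}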
[jira] [Assigned] (YARN-8290) Yarn application failed to recover with "Error Launching job : User is not set in the application report" error after RM restart
[ https://issues.apache.org/jira/browse/YARN-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang reassigned YARN-8290:
-------------------------------
             Assignee: Eric Yang
    Affects Version/s: 3.1.1

[~leftnoteasy] According to your suggestion, the ACL information is set too late, and killing the AM before the ACL information has propagated can cause RM recovery to load a partial application record. The suggested change is to move the ACL setup into ApplicationToSchedulerTransition; the patch moves the block of code accordingly. Let me know if this is the correct fix. Thanks.

> Yarn application failed to recover with "Error Launching job : User is not
> set in the application report" error after RM restart
> ---------------------------------------------------------------------------
>
>                 Key: YARN-8290
>                 URL: https://issues.apache.org/jira/browse/YARN-8290
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.1.1
>            Reporter: Yesha Vora
>            Assignee: Eric Yang
>            Priority: Major
>         Attachments: YARN-8290.001.patch
>
> Scenario:
> 1) Start 5 streaming applications in the background.
> 2) Kill the active RM and cause an RM failover.
> After the RM failover, the applications failed with the below error.
> {code}
> 18/02/01 21:24:29 WARN client.RequestHedgingRMFailoverProxyProvider: Invocation returned exception on [rm2] : org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1517520038847_0003' doesn't exist in RM. Please check that the job submission was successful.
> at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:338)
> at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175)
> at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
> , so propagating back to caller.
> 18/02/01 21:24:29 INFO impl.YarnClientImpl: Submitted application application_1517520038847_0003
> 18/02/01 21:24:30 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/hrt_qa/.staging/job_1517520038847_0003
> 18/02/01 21:24:30 ERROR streaming.StreamJob: Error Launching job : User is not set in the application report
> Streaming Command Failed!
> {code}
[jira] [Updated] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miklos Szegedi updated YARN-4599:
---------------------------------
    Attachment: YARN-4599.012.patch
[jira] [Updated] (YARN-8290) Yarn application failed to recover with "Error Launching job : User is not set in the application report" error after RM restart
[ https://issues.apache.org/jira/browse/YARN-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8290: Attachment: YARN-8290.001.patch > Yarn application failed to recover with "Error Launching job : User is not > set in the application report" error after RM restart > > > Key: YARN-8290 > URL: https://issues.apache.org/jira/browse/YARN-8290 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Priority: Major > Attachments: YARN-8290.001.patch > > > Scenario: > 1) Start 5 streaming application in background > 2) Kill Active RM and cause RM failover > After RM failover, The application failed with below error. > {code}18/02/01 21:24:29 WARN client.RequestHedgingRMFailoverProxyProvider: > Invocation returned exception on [rm2] : > org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application > with id 'application_1517520038847_0003' doesn't exist in RM. Please check > that the job submission was successful. > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:338) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347) > , so propagating back to caller. > 18/02/01 21:24:29 INFO impl.YarnClientImpl: Submitted application > application_1517520038847_0003 > 18/02/01 21:24:30 INFO mapreduce.JobSubmitter: Cleaning up the staging area > /user/hrt_qa/.staging/job_1517520038847_0003 > 18/02/01 21:24:30 ERROR streaming.StreamJob: Error Launching job : User is > not set in the application report > Streaming Command Failed!{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478266#comment-16478266 ] Wangda Tan commented on YARN-8292: -- [~jlowe], I think you're correct :). I take my word back. My previous assumption: {code} Σ_{selected-container} (selected-container.resource) <= Σ_{queue} (queue.to-be-obtain), for all resource types {code} can break the case in which one starving queue needs to preempt containers from two over-utilized queues. For example: {code} queue-A, guaranteed: <30,50>, used: <40,60>. queue-B, guaranteed: <30,50>, used: <40,60> {code} Assume we have a queue C that wants 20:20 resources. In this case, the resource to obtain from each of queue-A/queue-B = 10:10. If the containers running on the system all have the same size = 20:30, then under my existing approach nothing can be preempted. This is also why some UTs failed. I just used your approach: bq. I think the check for a zero resource can be dropped and it simplifies to the toObtainAfterPreemption component-wise max'd with zero is less than the amount to obtain from the partition (after being max'd with zero). With the 0 resource type check I commented above: {code} // If a toObtain resource type == 0, set it to -1 to avoid 0 resource // type affect following doPreemption check: isAnyMajorResourceZero for (ResourceInformation ri : toObtainByPartition.getResources()) { if (ri.getValue() == 0) { ri.setValue(-1); } } {code} Now everything works. Please check the attached patch (ver.3) to see if it works. > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch, YARN-8292.002.patch, > YARN-8292.003.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
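For illustration, here is a minimal standalone sketch of the quoted check under one plausible reading, using plain int vectors instead of the actual YARN {{Resource}}/{{Resources}} API; all names are illustrative and this is not the patch code:
{code:java}
public class RelaxedPreemptionCheckSketch {
  // Preempt a container only if the clipped "still to obtain" vector strictly
  // shrinks in at least one dimension. Since container sizes are
  // non-negative, max(toObtain - c, 0) <= max(toObtain, 0) always holds
  // component-wise, so "less than" reduces to progress in some resource type.
  static boolean shouldPreempt(int[] toObtain, int[] container) {
    boolean shrinks = false;
    for (int i = 0; i < toObtain.length; i++) {
      int before = Math.max(toObtain[i], 0);
      int after = Math.max(toObtain[i] - container[i], 0);
      if (after < before) {
        shrinks = true; // frees a resource the starving queue still needs
      }
    }
    return shrinks;
  }

  public static void main(String[] args) {
    // queue-A/queue-B each still need 10:10; every container is 20:30
    System.out.println(shouldPreempt(new int[] {10, 10}, new int[] {20, 30}));
  }
}
{code}
With the numbers from the example above, shouldPreempt({10, 10}, {20, 30}) returns true, so a 20:30 container can now be selected even though it is larger than the 10:10 to-obtain amount of either queue.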
[jira] [Updated] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8292: - Attachment: YARN-8292.003.patch > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch, YARN-8292.002.patch, > YARN-8292.003.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8310) Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats
[ https://issues.apache.org/jira/browse/YARN-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478258#comment-16478258 ] Robert Kanter commented on YARN-8310: - The patch adds back code very similar to the original reading code for each of the 3 tokens. It will get called if an {{InvalidProtocolBufferException}} is thrown when trying to parse it as a protobuf. Also added tests. I had to add additional code to handle the case where the old {{ContainerTokenIdentifier}} format ends early because YARN-2581 added a {{LogAggregationContext}} field to the end, which might not exist. > Handle old NMTokenIdentifier, AMRMTokenIdentifier, and > ContainerTokenIdentifier formats > --- > > Key: YARN-8310 > URL: https://issues.apache.org/jira/browse/YARN-8310 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Robert Kanter >Assignee: Robert Kanter >Priority: Major > Attachments: YARN-8310.001.patch, YARN-8310.branch-2.001.patch > > > In some recent upgrade testing, we saw this error causing the NodeManager to > fail to startup afterwards: > {noformat} > org.apache.hadoop.service.ServiceStateException: > com.google.protobuf.InvalidProtocolBufferException: Protocol message > contained an invalid tag (zero). > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:173) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:441) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:895) > Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol > message contained an invalid tag (zero). 
> at > com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89) > at > com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108) > at > org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.(YarnSecurityTokenProtos.java:1860) > at > org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.(YarnSecurityTokenProtos.java:1824) > at > org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2016) > at > org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2011) > at > com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200) > at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217) > at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223) > at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49) > at > org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.parseFrom(YarnSecurityTokenProtos.java:2686) > at > org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:254) > at > org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:177) > at > org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:322) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:455) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:373) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > ... 5 more > {noformat} > The NodeManager fails because it's trying to read a > {{ContainerTokenIdentifier}} in the "old" format before we changed them to > protobufs (YARN-668). This is very similar to YARN-5594 where we ran into a > similar problem with the ResourceManager and RM Delegation Tokens. > To provide a better experience, we should make the code able to read the old > format if it's unable to read it using the new format. We didn't run into > any errors with the other two types of tokens that YARN-668 incompatibly > changed (NMTokenIdentifier and AMRMTokenIdentifier), but we may as well fix > those while we're at it.
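For illustration, a hedged sketch of the fallback pattern described in the comment above; the method and field names are taken loosely from the discussion and may not match the actual patch:
{code:java}
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import com.google.protobuf.InvalidProtocolBufferException;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos.ContainerTokenIdentifierProto;

public class TokenReadSketch {
  private ContainerTokenIdentifierProto proto;

  // Try the protobuf format first; on parse failure, fall back to the
  // pre-YARN-668 Writable layout.
  public void readFields(DataInput in) throws IOException {
    byte[] data = IOUtils.readFullyToByteArray(in); // drain remaining bytes
    try {
      proto = ContainerTokenIdentifierProto.parseFrom(data);
    } catch (InvalidProtocolBufferException e) {
      // Old format: plain Writable fields written before protobufs were used.
      // Trailing fields added later (e.g. the LogAggregationContext from
      // YARN-2581) may be absent, so the old reader must tolerate hitting EOF.
      readFieldsInOldFormat(
          new DataInputStream(new ByteArrayInputStream(data)));
    }
  }

  private void readFieldsInOldFormat(DataInputStream in) throws IOException {
    // hypothetical: read the pre-protobuf Writable fields one by one
  }
}
{code}
Buffering the identifier bytes up front is what makes the retry possible: the same byte array can be handed to either parser without re-reading the input.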
[jira] [Updated] (YARN-8310) Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats
[ https://issues.apache.org/jira/browse/YARN-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-8310: Attachment: YARN-8310.001.patch YARN-8310.branch-2.001.patch > Handle old NMTokenIdentifier, AMRMTokenIdentifier, and > ContainerTokenIdentifier formats > --- > > Key: YARN-8310 > URL: https://issues.apache.org/jira/browse/YARN-8310 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Robert Kanter >Assignee: Robert Kanter >Priority: Major > Attachments: YARN-8310.001.patch, YARN-8310.branch-2.001.patch > > > In some recent upgrade testing, we saw this error causing the NodeManager to > fail to startup afterwards: > {noformat} > org.apache.hadoop.service.ServiceStateException: > com.google.protobuf.InvalidProtocolBufferException: Protocol message > contained an invalid tag (zero). > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:173) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:441) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:895) > Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol > message contained an invalid tag (zero). > at > com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89) > at > com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108) > at > org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.(YarnSecurityTokenProtos.java:1860) > at > org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.(YarnSecurityTokenProtos.java:1824) > at > org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2016) > at > org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2011) > at > com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200) > at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217) > at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223) > at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49) > at > org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.parseFrom(YarnSecurityTokenProtos.java:2686) > at > org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:254) > at > org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:177) > at > org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:322) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:455) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:373) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) 
> ... 5 more > {noformat} > The NodeManager fails because it's trying to read a > {{ContainerTokenIdentifier}} in the "old" format before we changed them to > protobufs (YARN-668). This is very similar to YARN-5594 where we ran into a > similar problem with the ResourceManager and RM Delegation Tokens. > To provide a better experience, we should make the code able to read the old > format if it's unable to read it using the new format. We didn't run into > any errors with the other two types of tokens that YARN-668 incompatibly > changed (NMTokenIdentifier and AMRMTokenIdentifier), but we may as well fix > those while we're at it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8310) Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats
Robert Kanter created YARN-8310: --- Summary: Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats Key: YARN-8310 URL: https://issues.apache.org/jira/browse/YARN-8310 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter In some recent upgrade testing, we saw this error causing the NodeManager to fail to startup afterwards: {noformat} org.apache.hadoop.service.ServiceStateException: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero). at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:441) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:895) Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero). at com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89) at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108) at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.(YarnSecurityTokenProtos.java:1860) at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.(YarnSecurityTokenProtos.java:1824) at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2016) at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2011) at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49) at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.parseFrom(YarnSecurityTokenProtos.java:2686) at org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:254) at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:177) at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:322) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:455) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:373) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) ... 5 more {noformat} The NodeManager fails because it's trying to read a {{ContainerTokenIdentifier}} in the "old" format before we changed them to protobufs (YARN-668). This is very similar to YARN-5594 where we ran into a similar problem with the ResourceManager and RM Delegation Tokens. 
To provide a better experience, we should make the code able to read the old format if it's unable to read it using the new format. We didn't run into any errors with the other two types of tokens that YARN-668 incompatibly changed (NMTokenIdentifier and AMRMTokenIdentifier), but we may as well fix those while we're at it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478246#comment-16478246 ] Wangda Tan commented on YARN-8292: -- Attached ver.2 patch, which fixed 0 value rare-resource problem mentioned by Jason. > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch, YARN-8292.002.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8292: - Attachment: YARN-8292.002.patch > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch, YARN-8292.002.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478244#comment-16478244 ] Wangda Tan commented on YARN-8292: -- [~eepayne], It is actually enabled in {{setup()}}. :) [~jlowe], I understand your suggestion now, but simply dropping the check as you mentioned: bq. I think the check for a zero resource can be dropped and it simplifies to the toObtainAfterPreemption component-wise max'd with zero is less than the amount to obtain from the partition (after being max'd with zero). is not enough. The reason is that we want to make sure no over-preemption happens. For example, if res-to-obtain = (3, 0, 0) and a container has size = (4, 1, 0) (the 3rd type is 0 for both), we don't want the preemption to happen, because it will leave the queue under-utilized and can preempt more containers than required. We need to make sure that: {code} Σ_{selected-container} (selected-container.resource) <= Σ_{queue} (queue.to-be-obtain), for all resource types {code} In my previous patch, as you mentioned, if some resource type is always 0, it will invalidate the check. So I added a check: {code} // If a toObtain resource type == 0, set it to -1 to avoid 0 resource // type affect following doPreemption check: isAnyMajorResourceZero for (ResourceInformation ri : toObtainByPartition.getResources()) { if (ri.getValue() == 0) { ri.setValue(-1); } } {code} before {code} if (Resources.greaterThan(rc, clusterResource, toObtainByPartition, Resources.none()) {code} It looks like this solves the problem. Please let me know if you think differently. > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
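For illustration, a standalone sketch of the effect of that guard, again with plain int vectors rather than the actual {{Resource}} API (illustrative only, under the assumption that zero-valued to-obtain types have been rewritten to -1 as in the snippet above):
{code:java}
public class ZeroResourceGuardSketch {
  // A resource type with nothing left to obtain blocks selecting any
  // container that uses that type, preventing over-preemption in dimensions
  // the starving queue does not need (the -1 rewrite above makes the real
  // Resource comparison behave this way).
  static boolean mayPreempt(int[] toObtain, int[] container) {
    for (int i = 0; i < toObtain.length; i++) {
      if (toObtain[i] <= 0 && container[i] > 0) {
        return false; // would take a resource nobody is asking for
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // res-to-obtain = (3, 0, 0), container = (4, 1, 0): must not be preempted
    System.out.println(mayPreempt(new int[] {3, 0, 0}, new int[] {4, 1, 0}));
  }
}
{code}
With res-to-obtain = (3, 0, 0) and a container of size (4, 1, 0), mayPreempt returns false, matching the example in the comment above.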
[jira] [Commented] (YARN-8080) YARN native service should support component restart policy
[ https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478235#comment-16478235 ] Suma Shivaprasad commented on YARN-8080: Thanks [~eyang]. I have addressed the review comments for terminating according to the restart policy and fixed most of the CS issues. > YARN native service should support component restart policy > --- > > Key: YARN-8080 > URL: https://issues.apache.org/jira/browse/YARN-8080 > Project: Hadoop YARN > Issue Type: Task >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8080.001.patch, YARN-8080.002.patch, > YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, > YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, > YARN-8080.011.patch, YARN-8080.012.patch, YARN-8080.013.patch, > YARN-8080.014.patch, YARN-8080.015.patch, YARN-8080.016.patch > > > Existing native service assumes the service is long running and never > finishes. Containers will be restarted even if exit code == 0. > To support boarder use cases, we need to allow restart policy of component > specified by users. Propose to have following policies: > 1) Always: containers always restarted by framework regardless of container > exit status. This is existing/default behavior. > 2) Never: Do not restart containers in any cases after container finishes: To > support job-like workload (for example Tensorflow training job). If a task > exit with code == 0, we should not restart the task. This can be used by > services which is not restart/recovery-able. > 3) On-failure: Similar to above, only restart task with exitcode != 0. > Behaviors after component *instance* finalize (Succeeded or Failed when > restart_policy != ALWAYS): > 1) For single component, single instance: complete service. > 2) For single component, multiple instance: other running instances from the > same component won't be affected by the finalized component instance. Service > will be terminated once all instances finalized. > 3) For multiple components: Service will be terminated once all components > finalized. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8309) Diagnostic message for yarn service app failure due token renewal should be improved
Yesha Vora created YARN-8309: Summary: Diagnostic message for yarn service app failure due token renewal should be improved Key: YARN-8309 URL: https://issues.apache.org/jira/browse/YARN-8309 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora When a Yarn service application failed due to a token renewal issue, the diagnostic message was unclear. {code:java} Application application_1526413043392_0002 failed 20 times due to AM Container for appattempt_1526413043392_0002_20 exited with exitCode: 1 Failing this attempt.Diagnostics: [2018-05-15 23:15:28.779]Exception from container-launch. Container id: container_e04_1526413043392_0002_20_01 Exit code: 1 Exception message: Launch container failed Shell output: main : command provided 1 main : run as user is hbase main : requested yarn user is hbase Getting exit code file... Creating script paths... Writing pid file... Writing to tmp file /grid/0/hadoop/yarn/local/nmPrivate/application_1526413043392_0002/container_e04_1526413043392_0002_20_01/container_e04_1526413043392_0002_20_01.pid.tmp Writing to cgroup task files... Creating local dirs... Launching container... Getting exit code file... Creating script paths... [2018-05-15 23:15:28.806]Container exited with a non-zero exit code 1. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : [2018-05-15 23:15:28.807]Container exited with a non-zero exit code 1. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : For more detailed output, check the application tracking page: https://xxx:8090/cluster/app/application_1526413043392_0002 Then click on links to logs of each attempt. . Failing the application.{code} Here, the diagnostic message should be improved to specify that the AM is failing due to token renewal issues. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8080) YARN native service should support component restart policy
[ https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-8080: --- Attachment: YARN-8080.016.patch > YARN native service should support component restart policy > --- > > Key: YARN-8080 > URL: https://issues.apache.org/jira/browse/YARN-8080 > Project: Hadoop YARN > Issue Type: Task >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8080.001.patch, YARN-8080.002.patch, > YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, > YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, > YARN-8080.011.patch, YARN-8080.012.patch, YARN-8080.013.patch, > YARN-8080.014.patch, YARN-8080.015.patch, YARN-8080.016.patch > > > Existing native service assumes the service is long running and never > finishes. Containers will be restarted even if exit code == 0. > To support boarder use cases, we need to allow restart policy of component > specified by users. Propose to have following policies: > 1) Always: containers always restarted by framework regardless of container > exit status. This is existing/default behavior. > 2) Never: Do not restart containers in any cases after container finishes: To > support job-like workload (for example Tensorflow training job). If a task > exit with code == 0, we should not restart the task. This can be used by > services which is not restart/recovery-able. > 3) On-failure: Similar to above, only restart task with exitcode != 0. > Behaviors after component *instance* finalize (Succeeded or Failed when > restart_policy != ALWAYS): > 1) For single component, single instance: complete service. > 2) For single component, multiple instance: other running instances from the > same component won't be affected by the finalized component instance. Service > will be terminated once all instances finalized. > 3) For multiple components: Service will be terminated once all components > finalized. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8308) Yarn service app fails due to issues with Renew Token
Yesha Vora created YARN-8308: Summary: Yarn service app fails due to issues with Renew Token Key: YARN-8308 URL: https://issues.apache.org/jira/browse/YARN-8308 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Affects Versions: 3.1.0 Reporter: Yesha Vora Run Yarn service application beyond dfs.namenode.delegation.token.max-lifetime. Here, yarn service application fails with below error. {code} 2018-05-15 23:14:35,652 [main] WARN ipc.Client - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ 2018-05-15 23:14:35,654 [main] INFO service.AbstractService - Service Service Master failed in state INITED org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491) at org.apache.hadoop.ipc.Client.call(Client.java:1437) at org.apache.hadoop.ipc.Client.call(Client.java:1347) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:883) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654) at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1569) at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1566) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1581) at org.apache.hadoop.yarn.service.utils.JsonSerDeser.load(JsonSerDeser.java:182) at org.apache.hadoop.yarn.service.utils.ServiceApiUtil.loadServiceFrom(ServiceApiUtil.java:337) at org.apache.hadoop.yarn.service.ServiceMaster.loadApplicationJson(ServiceMaster.java:242) at org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:91) at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316) 2018-05-15 23:14:35,659 [main] INFO service.ServiceMaster - Stopping app master 2018-05-15 23:14:35,660 [main] ERROR service.ServiceMaster - Error starting service master org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173) at org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316) Caus
[jira] [Commented] (YARN-8103) Add CLI interface to query node attributes
[ https://issues.apache.org/jira/browse/YARN-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478211#comment-16478211 ] genericqa commented on YARN-8103: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 36s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} YARN-3409 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 6m 53s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 30s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 33s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 39s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 7m 1s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 20s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 45s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 45s{color} | {color:green} YARN-3409 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 21s{color} | {color:red} hadoop-sls in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 25m 7s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 25m 7s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 25m 7s{color} | {color:red} root in the patch failed. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 46s{color} | {color:orange} root: The patch generated 14 new + 187 unchanged - 8 fixed = 201 total (was 195) {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 27s{color} | {color:red} hadoop-sls in the patch failed. {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 23s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 14s{color} | {color:green} There were no new shelldocs issues. 
{color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 2 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 21s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 47s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 24s{color} | {color:red} hadoop-sls in the patch failed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 16s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 51s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:
[jira] [Commented] (YARN-8273) Log aggregation does not warn if HDFS quota in target directory is exceeded
[ https://issues.apache.org/jira/browse/YARN-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478207#comment-16478207 ] Robert Kanter commented on YARN-8273: - Thanks for the patch. One question: - Why make {{LogAggregationDFSException}} an undeclared exception? Why not make it a subclass of {{YarnException}} and declare it? -- Also, if we do keep it as undeclared, it should be a subclass of {{YarnRuntimeException}} instead of {{RuntimeException}}. I was going to suggest we catch the {{DSQuotaExceededException}} when closing {{writer}}, but it turns out that {{TFile#close}} does _not_ close the underlying {{FSDataOutputStream}}. That's probably not what most people are expecting, but there's nothing we can do about that now. :/ > Log aggregation does not warn if HDFS quota in target directory is exceeded > --- > > Key: YARN-8273 > URL: https://issues.apache.org/jira/browse/YARN-8273 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.1.0 >Reporter: Gergo Repas >Assignee: Gergo Repas >Priority: Major > Attachments: YARN-8273.000.patch, YARN-8273.001.patch, > YARN-8273.002.patch > > > It appears that if an HDFS space quota is set on a target directory for log > aggregation and the quota is already exceeded when log aggregation is > attempted, zero-byte log files will be written to the HDFS directory, however > NodeManager logs do not reflect a failure to write the files successfully > (i.e. there are no ERROR or WARN messages to this effect). > An improvement may be worth investigating to alert users to this scenario, as > otherwise logs for a YARN application may be missing both on HDFS and locally > (after local log cleanup is done) and the user may not otherwise be informed. > Steps to reproduce: > * Set a small HDFS space quota on /tmp/logs/username/logs (e.g. 2MB) > * Write files to HDFS such that /tmp/logs/username/logs is almost 2MB full > * Run a Spark or MR job in the cluster > * Observe that zero byte files are written to HDFS after job completion > * Observe that YARN container logs are also not present on the NM hosts (or > are deleted after yarn.nodemanager.delete.debug-delay-sec) > * Observe that no ERROR or WARN messages appear to be logged in the NM role > log -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
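To make the last point concrete, a rough sketch of where a quota error would have to be caught, given that {{TFile#close}} leaves the underlying stream open; the method shape is invented for the example, and {{LogAggregationDFSException}} is stubbed in as a stand-in for the wrapper type under discussion:
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;
import org.apache.hadoop.io.file.tfile.TFile;

public class LogCloseSketch {
  // TFile.Writer#close does not close the FSDataOutputStream it wraps, so a
  // quota violation typically surfaces only when the underlying stream is
  // closed and its buffered data is flushed out to HDFS.
  static void closeAggregatedLog(TFile.Writer writer, FSDataOutputStream out)
      throws IOException {
    writer.close(); // closes the TFile structures only
    try {
      out.close(); // flush to HDFS; the DSQuotaExceededException shows up here
    } catch (DSQuotaExceededException e) {
      throw new LogAggregationDFSException(e);
    }
  }

  // Stand-in for the patch's exception type; the thread above debates whether
  // it should instead extend YarnException or YarnRuntimeException.
  static class LogAggregationDFSException extends RuntimeException {
    LogAggregationDFSException(Throwable cause) {
      super(cause);
    }
  }
}
{code}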
[jira] [Commented] (YARN-4781) Support intra-queue preemption for fairness ordering policy.
[ https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478184#comment-16478184 ] Eric Payne commented on YARN-4781: -- Hi [~sunilg]. Will you have an opportunity to review the latest patch? > Support intra-queue preemption for fairness ordering policy. > > > Key: YARN-4781 > URL: https://issues.apache.org/jira/browse/YARN-4781 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Wangda Tan >Assignee: Eric Payne >Priority: Major > Attachments: YARN-4781.001.patch, YARN-4781.002.patch, > YARN-4781.003.patch, YARN-4781.004.patch, YARN-4781.005.patch > > > We introduced fairness queue policy since YARN-3319, which will let large > applications make progresses and not starve small applications. However, if a > large application takes the queue’s resources, and containers of the large > app has long lifespan, small applications could still wait for resources for > long time and SLAs cannot be guaranteed. > Instead of wait for application release resources on their own, we need to > preempt resources of queue with fairness policy enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478178#comment-16478178 ] genericqa commented on YARN-4599: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 7 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 11s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 31m 25s{color} | {color:red} root in trunk failed. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 18m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 30m 31s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 3s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 0s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 27m 0s{color} | {color:red} root generated 5 new + 6 unchanged - 0 fixed = 11 total (was 6) {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 27m 0s{color} | {color:red} root generated 188 new + 1277 unchanged - 0 fixed = 1465 total (was 1277) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 41s{color} | {color:green} root: The patch generated 0 new + 235 unchanged - 1 fixed = 235 total (was 236) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 17m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 8m 25s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 52s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}150m 24s{color} | {color:red} root in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}335m 9s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.hdfs.TestDFSInotifyEventInputStre
[jira] [Commented] (YARN-8071) Add ability to specify nodemanager environment variables individually
[ https://issues.apache.org/jira/browse/YARN-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478163#comment-16478163 ] Hudson commented on YARN-8071: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14213 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14213/]) YARN-8071. Add ability to specify nodemanager environment variables (jlowe: rev be539690477f7fee8f836bf3612cbe7ff6a3506e) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Apps.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java > Add ability to specify nodemanager environment variables individually > - > > Key: YARN-8071 > URL: https://issues.apache.org/jira/browse/YARN-8071 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8071.001.patch, YARN-8071.002.patch, > YARN-8071.003.patch > > > YARN-6830 describes a problem where environment variables that contain commas > cannot be specified via {{-Dmapreduce.map.env}}. > For example: > {{-Dmapreduce.map.env="MODE=bar,IMAGE_NAME=foo,MOUNTS=/tmp/foo,/tmp/bar"}} > will set {{MOUNTS}} to {{/tmp/foo}} > In that Jira, [~aw] suggested that we change the API to provide a way to > specify environment variables individually, the same way that Spark does. > {quote}Rather than fight with a regex why not redefine the API instead? > > -Dmapreduce.map.env.MODE=bar > -Dmapreduce.map.env.IMAGE_NAME=foo > -Dmapreduce.map.env.MOUNTS=/tmp/foo,/tmp/bar > ... > e.g, mapreduce.map.env.[foo]=bar gets turned into foo=bar > This greatly simplifies the input validation needed and makes it clear what > is actually being defined. > {quote} > The mapreduce properties were dealt with in [MAPREDUCE-7069]. This Jira will > address the YARN properties. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478157#comment-16478157 ] Eric Payne commented on YARN-8292: -- bq. you can check org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyInterQueueWithDRF#test3ResourceTypesInterQueuePremption test for details. This test is not actually enabling DRF. You need to add the 5th argument to {{buildEnv()}}: {code} -buildEnv(labelsConfig, nodesConfig, queuesConfig, appsConfig); +buildEnv(labelsConfig, nodesConfig, queuesConfig, appsConfig, true); {code} > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8307) [atsv2 read acls] Coprocessor for reader authorization check
[ https://issues.apache.org/jira/browse/YARN-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-8307: - Issue Type: Sub-task (was: Bug) Parent: YARN-7055 > [atsv2 read acls] Coprocessor for reader authorization check > > > Key: YARN-8307 > URL: https://issues.apache.org/jira/browse/YARN-8307 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vrushali C >Assignee: Vrushali C >Priority: Major > > Jira to track coprocessor creation for reader authorization check when > security is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8307) [atsv2 read acls] Coprocessor for reader authorization check
Vrushali C created YARN-8307: Summary: [atsv2 read acls] Coprocessor for reader authorization check Key: YARN-8307 URL: https://issues.apache.org/jira/browse/YARN-8307 Project: Hadoop YARN Issue Type: Bug Reporter: Vrushali C Assignee: Vrushali C Jira to track coprocessor creation for reader authorization check when security is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8071) Add ability to specify nodemanager environment variables individually
[ https://issues.apache.org/jira/browse/YARN-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478114#comment-16478114 ] Jason Lowe commented on YARN-8071: -- bq. I assume this windows-only test must be failing without this fix. Ah, my bad. I missed the fact that this test was Windows-only, so that's why it was "passing" even without the change. +1 for the latest patch. Committing this. > Add ability to specify nodemanager environment variables individually > - > > Key: YARN-8071 > URL: https://issues.apache.org/jira/browse/YARN-8071 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-8071.001.patch, YARN-8071.002.patch, > YARN-8071.003.patch > > > YARN-6830 describes a problem where environment variables that contain commas > cannot be specified via {{-Dmapreduce.map.env}}. > For example: > {{-Dmapreduce.map.env="MODE=bar,IMAGE_NAME=foo,MOUNTS=/tmp/foo,/tmp/bar"}} > will set {{MOUNTS}} to {{/tmp/foo}} > In that Jira, [~aw] suggested that we change the API to provide a way to > specify environment variables individually, the same way that Spark does. > {quote}Rather than fight with a regex why not redefine the API instead? > > -Dmapreduce.map.env.MODE=bar > -Dmapreduce.map.env.IMAGE_NAME=foo > -Dmapreduce.map.env.MOUNTS=/tmp/foo,/tmp/bar > ... > e.g, mapreduce.map.env.[foo]=bar gets turned into foo=bar > This greatly simplifies the input validation needed and makes it clear what > is actually being defined. > {quote} > The mapreduce properties were dealt with in [MAPREDUCE-7069]. This Jira will > address the YARN properties. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478104#comment-16478104 ] Jason Lowe commented on YARN-8292: -- bq. After preemption, there're at least one 0 major resources (which indicates that the queue is still satisfied after preemption). I'm still confused by this point. How is that not going to be always true when the cluster has a rarely-used resource dimension? For example, let's say GPU is one of the dimensions, and all the apps that want to use GPUs are all running in only one of many queues on the cluster. The other queues will all have zero for their GPU usage, and any cross-queue preemptions between those other queues will all have zero in the GPU resource for toObtainFromPartition and toObtainAfterPreemption. In other words, it effectively disabled the less than Resources.none check when comparing preemptions between these non-GPU-using queues because GPU will always be zero so isAnyMajorResourceZero will always be true. Or am I missing something? For the case of not wanting to kill a container that is (4, 1, 1) when the ask is only (3, -1, -1), the comparison against Resources.none should cover that. What is an example scenario where the additional check if any resource dimension is zero is needed to do the right thing? From the scenario I described above, I can see where it can (incorrectly?) override the comparison against Resources.none and preempt a (4, 1, 0) container when the ask is only (3, -1, 0). > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
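To make the zero-dimension concern concrete, here is a toy model that uses plain long[] vectors in place of the real Resource/ResourceCalculator classes (a simplification, not the actual scheduler code). With GPU as a third, unused dimension, an "any resource is zero" check stays true even when the preemption overshoots on every dimension that was actually requested.
{code:java}
// Toy model of the doPreempt discussion above; vectors are (memory, vcores, gpu).
public class ZeroDimensionSketch {
  static boolean isAnyResourceZero(long[] r) {
    for (long v : r) {
      if (v == 0) {
        return true;
      }
    }
    return false;
  }

  static long[] subtract(long[] a, long[] b) {
    long[] out = new long[a.length];
    for (int i = 0; i < a.length; i++) {
      out[i] = a[i] - b[i];
    }
    return out;
  }

  public static void main(String[] args) {
    long[] toObtain = {3, -1, 0};  // ask from a non-GPU-using queue
    long[] container = {4, 1, 0};  // candidate container to preempt
    long[] after = subtract(toObtain, container); // (-1, -2, 0)

    // Memory and vcores both went negative (over-preemption), yet the
    // zero GPU component alone keeps the check true.
    System.out.println(isAnyResourceZero(after)); // prints: true
  }
}
{code}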
[jira] [Commented] (YARN-8286) Add NMClient callback on container relaunch
[ https://issues.apache.org/jira/browse/YARN-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478085#comment-16478085 ] Jason Lowe commented on YARN-8286: -- This could be implemented as something in the AM/NM client connection, but it would require the AM to keep a long-lived connection to every NM that has containers running on it. I think a simpler approach for the AM is to get this information via the same channel it gets other container notifications like allocated, completed, etc., and that's the AM-RM heartbeat (i.e.: ApplicationMasterProtocol#allocate). Currently when a container completes on the NM side, the NM lets the RM know via an out-of-band heartbeat, and the RM in turn lets the AM know on the next AM heartbeat. I think it would be relatively straightforward to have the NM notify the RM of any container relaunches, just like it already does for container launches and completions. The RM can then relay this information to the AM. Then the AM wouldn't need to stay connected to every NM for relaunch status, and the container relaunch events would arrive at the AM just like container completion events do today, without any new connections required. Thoughts? > Add NMClient callback on container relaunch > --- > > Key: YARN-8286 > URL: https://issues.apache.org/jira/browse/YARN-8286 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Billie Rinaldi >Priority: Critical > > The AM may need to perform actions when a container has been relaunched. For > example, the service AM would want to change the state it has recorded for > the container and retrieve new container status for the container, in case > the container IP has changed. (The NM would also need to remove the IP it has > stored for the container, so container status calls don't return an IP for a > container that is not currently running.) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
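As a sketch of what the AM-side consumption could look like under this proposal, mirroring how completion events are consumed today: note that both the onContainersRelaunched callback and a relaunch list on the allocate response are hypothetical, invented here purely for illustration.
{code:java}
import java.util.List;
import org.apache.hadoop.yarn.api.records.ContainerId;

// Hypothetical sketch only: nothing below is an existing YARN interface.
interface RelaunchCallbackHandler {
  // Would be invoked when an allocate response reports NM-side relaunches.
  void onContainersRelaunched(List<ContainerId> relaunched);
}

class ServiceMasterHandler implements RelaunchCallbackHandler {
  @Override
  public void onContainersRelaunched(List<ContainerId> relaunched) {
    for (ContainerId id : relaunched) {
      // Re-fetch the container status, since e.g. the container IP may
      // have changed across the relaunch (the scenario this issue describes).
      refreshStatus(id);
    }
  }

  private void refreshStatus(ContainerId id) {
    // Placeholder for an NMClient#getContainerStatus call or equivalent.
  }
}
{code}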
[jira] [Commented] (YARN-7933) [atsv2 read acls] Add TimelineWriter#writeDomain
[ https://issues.apache.org/jira/browse/YARN-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478053#comment-16478053 ] Hudson commented on YARN-7933: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14212 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14212/]) YARN-7933. [atsv2 read acls] Add TimelineWriter#writeDomain. (Rohith (haibochen: rev e3b7d7ac1694b8766ae11bc7e8ecf09763bb26db) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/TimelineWriter.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/collector/TestTimelineCollector.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollector.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/storage/TestHBaseTimelineStorageDomain.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/domain/DomainTableRW.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/HBaseTimelineWriterImpl.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timelineservice/TimelineDomain.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollectorWebService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/FileSystemTimelineWriterImpl.java > [atsv2 read acls] Add TimelineWriter#writeDomain > - > > Key: YARN-7933 > URL: https://issues.apache.org/jira/browse/YARN-7933 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vrushali C >Assignee: Rohith Sharma K S >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-7933.01.patch, YARN-7933.02.patch, > YARN-7933.03.patch, YARN-7933.04.patch, YARN-7933.05.patch, YARN-7933.06.patch > > > > Add an API TimelineWriter#writeDomain for writing the domain info -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8305) [UI2] No information available per container about the memory/vcores
[ https://issues.apache.org/jira/browse/YARN-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8305: - Reporter: Sumana Sathish (was: Gergely Novák) > [UI2] No information available per container about the memory/vcores > > > Key: YARN-8305 > URL: https://issues.apache.org/jira/browse/YARN-8305 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn-ui-v2 >Reporter: Sumana Sathish >Assignee: Gergely Novák >Priority: Major > Attachments: YARN-8305.001.patch > > > In the Applications > App > Attempts > Attempt page, the Containers panel > shows information about container start time, status, etc., but not about > container memory/vcores used. > Discovered by [~ssath...@hortonworks.com]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8292: - Description: This is an example of the problem: {code} // guaranteed, max,used, pending "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c {code} There're 3 resource types. Total resource of the cluster is 30:18:6 For both of a/b, there're 3 containers running, each of container is 2:2:1. Queue c uses 0 resource, and have 1:1:1 pending resource. Under existing logic, preemption cannot happen. was: This is an example of the problem: (Same if we have more than 2 resources) Let's say we have 3 queues A/B/C. All containers with equal size <2,3> ||Queue||Guaranteed||Used ||Pending|| |A|<20, 10>|<20,30>| | |B|<20, 10>|0|0| |C|<20, 10>|0|<20, 30>| | | | | | Under current logic, A's calculated to-preempt (how much resource other queue can preempt) will be <0, 20>. The preemption will not happen. However, under the context of DRC, queue A is using more resource than guaranteed, so queue C will be starved > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8292: - Fix Version/s: (was: 3.1.1) (was: 3.2.0) > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8292: - Target Version/s: 3.2.0, 3.1.1 > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478026#comment-16478026 ] Wangda Tan edited comment on YARN-8292 at 5/16/18 8:16 PM: --- Thanks [~eepayne], Let me revise the example a bit, I was trying to simplify it to avoid confusing to people, I think the example is wrong. Here's one I was using in test: {code} // guaranteed, max,used, pending "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c {code} There're 3 resource types. Total resource of the cluster is 30:18:6 For both of a/b, there're 3 containers running, each of container is 2:2:1. Queue c uses 0 resource, and have 1:1:1 pending resource. Prior to the attached patch. The preemption cannot happen, you can check org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyInterQueueWithDRF#test3ResourceTypesInterQueuePreemption test for details. was (Author: leftnoteasy): Thanks [~eepayne], Let me revise the example a bit, I was trying to simplify it to avoid confusing to people, I think the example is wrong. Here's one I was using in test: {code} // guaranteed, max,used, pending "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c {code} There're 3 resource types. Total resource of the cluster is 30:18:6 For both of a/b, there're 3 containers running, each of container is 2:2:1 Prior to the attached patch. The preemption cannot happen, you can check org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyInterQueueWithDRF#test3ResourceTypesInterQueuePreemption test for details. > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch > > > > This is an example of the problem: (Same if we have more than 2 resources) > > Let's say we have 3 queues A/B/C. All containers with equal size <2,3> > > ||Queue||Guaranteed||Used ||Pending|| > |A|<20, 10>|<20,30>| | > |B|<20, 10>|0|0| > |C|<20, 10>|0|<20, 30>| > | | | | | > > Under current logic, A's calculated to-preempt (how much resource other queue > can preempt) will be <0, 20>. The preemption will not happen. However, under > the context of DRC, queue A is using more resource than guaranteed, so queue > C will be starved -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478026#comment-16478026 ] Wangda Tan commented on YARN-8292: -- Thanks [~eepayne], Let me revise the example a bit. I was trying to simplify it to avoid confusing people, but I think the example was wrong. Here's the one I was using in the test: {code} // guaranteed, max,used, pending "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c {code} There're 3 resource types. Total resource of the cluster is 30:18:6. For both of a/b, there're 3 containers running, and each container is 2:2:1. Prior to the attached patch, the preemption could not happen; you can check the org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyInterQueueWithDRF#test3ResourceTypesInterQueuePreemption test for details. > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8292.001.patch > > > > This is an example of the problem: (Same if we have more than 2 resources) > > Let's say we have 3 queues A/B/C. All containers with equal size <2,3> > > ||Queue||Guaranteed||Used ||Pending|| > |A|<20, 10>|<20,30>| | > |B|<20, 10>|0|0| > |C|<20, 10>|0|<20, 30>| > | | | | | > > Under current logic, A's calculated to-preempt (how much resource other queue > can preempt) will be <0, 20>. The preemption will not happen. However, under > the context of DRC, queue A is using more resource than guaranteed, so queue > C will be starved -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478004#comment-16478004 ] Miklos Szegedi commented on YARN-4599: -- [~snemeth], thanks for the review. oomHandlerTemp is necessary since we may throw an exception after we set it. External code optimized by the JVM (which may publish a reference before the constructor has finished) could observe a partially constructed copy of this object if the exception is thrown. It is better to set the fields only once no exception can be thrown anymore. 3) This code has to be very fast, since it may do the same conversion for 1000s of containers, so I will keep the plain multiplication instead of calling into external code that deals with strings. > Set OOM control for memory cgroups > -- > > Key: YARN-4599 > URL: https://issues.apache.org/jira/browse/YARN-4599 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Karthik Kambatla >Assignee: Miklos Szegedi >Priority: Major > Labels: oct16-medium > Attachments: Elastic Memory Control in YARN.pdf, YARN-4599.000.patch, > YARN-4599.001.patch, YARN-4599.002.patch, YARN-4599.003.patch, > YARN-4599.004.patch, YARN-4599.005.patch, YARN-4599.006.patch, > YARN-4599.007.patch, YARN-4599.008.patch, YARN-4599.009.patch, > YARN-4599.010.patch, YARN-4599.011.patch, YARN-4599.sandflee.patch, > yarn-4599-not-so-useful.patch > > > YARN-1856 adds memory cgroups enforcing support. We should also explicitly > set OOM control so that containers are not killed as soon as they go over > their usage. Today, one could set the swappiness to control this, but > clusters with swap turned off exist. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
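The construct-then-publish pattern defended above looks roughly like the following minimal sketch; the class and member names are made up for illustration and are not the actual patch code.
{code:java}
// Illustrative only: shows why building into a temp and assigning the field
// last keeps a partially initialized object from ever being published.
class OomListenerHolder {
  private Thread oomHandler; // only ever holds a fully initialized handler

  void initialize(String cgroupPath) {
    // Build into a local first. If the validation below throws, the field
    // is never assigned, so no other code (even with JVM write reordering
    // around construction) can observe a half-built handler.
    Thread oomHandlerTemp = new Thread(() -> handleOom(cgroupPath));
    validate(cgroupPath); // may throw
    oomHandler = oomHandlerTemp; // publish only on the success path
  }

  private void validate(String path) {
    if (path == null || path.isEmpty()) {
      throw new IllegalArgumentException("empty cgroup path");
    }
  }

  private void handleOom(String path) {
    // OOM handling would go here.
  }
}
{code}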
[jira] [Comment Edited] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477999#comment-16477999 ] Wangda Tan edited comment on YARN-8292 at 5/16/18 8:08 PM: --- [~jlowe], thanks for your review, bq. I think the check for a zero resource can be dropped and it simplifies to the toObtainAfterPreemption component-wise max'd with zero is less than the amount to obtain from the partition (after being max'd with zero). In other words, we want to preempt as long as we have some resources we want to obtain from the partition and preempting the container makes progress on at least one of the resource dimensions being requested from the partition. The second part is correct {{..at least one of the resource dimensions being requested from the partition}}, that's why we added following check: {code} 198 doPreempt = doPreempt && (Resources.lessThan(rc, clusterResource, 199 Resources 200 .componentwiseMax(toObtainAfterPreemption, Resources.none()), 201 Resources.componentwiseMax(toObtainByPartition, Resources.none(; {code} The check of {{toObtainAfterPreemption}} is to make sure we will not do over-preemption. For example, if a queue's res-to-obtain = (3,-1,-1), and a container is (4,1,1). Even if preempt the container can make positive contribution, we will not do this because after preemption, the queue becomes an under-utilized queue and it may preempt resources from other queues. Following logics are mostly to cover two cases to avoid over-preemption: {code} 195 doPreempt = Resources.greaterThanOrEqual(rc, clusterResource, 196 toObtainAfterPreemption, Resources.none()) || Resources 197 .isAnyMajorResourceZero(rc, toObtainAfterPreemption); {code} a. After preemption, there're some positive major resources. b. After preemption, there're at least one 0 major resources (which indicates that the queue is still satisfied after preemption). Please let me know if you still have any other questions. was (Author: leftnoteasy): [~jlowe], thanks for your review, bq. I think the check for a zero resource can be dropped and it simplifies to the toObtainAfterPreemption component-wise max'd with zero is less than the amount to obtain from the partition (after being max'd with zero). In other words, we want to preempt as long as we have some resources we want to obtain from the partition and preempting the container makes progress on at least one of the resource dimensions being requested from the partition. The second part is correct, that's why we added following check: {code} 198 doPreempt = doPreempt && (Resources.lessThan(rc, clusterResource, 199 Resources 200 .componentwiseMax(toObtainAfterPreemption, Resources.none()), 201 Resources.componentwiseMax(toObtainByPartition, Resources.none(; {code} The check of {{toObtainAfterPreemption}} is to make sure we will not do over-preemption. For example, if a queue's res-to-obtain = (3,-1,-1), and a container is (4,1,1). Even if preempt the container can make positive contribution, we will not do this because after preemption, the queue becomes an under-utilized queue and it may preempt resources from other queues. Following logics are mostly to cover two cases to avoid over-preemption: {code} 195 doPreempt = Resources.greaterThanOrEqual(rc, clusterResource, 196 toObtainAfterPreemption, Resources.none()) || Resources 197 .isAnyMajorResourceZero(rc, toObtainAfterPreemption); {code} a. After preemption, there're some positive major resources. b. 
After preemption, there're at least one 0 major resources (which indicates that the queue is still satisfied after preemption). Please let me know if you still have any other questions. > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8292.001.patch > > > > This is an example of the problem: (Same if we have more than 2 resources) > > Let's say we have 3 queues A/B/C. All containers with equal size <2,3> > > ||Queue||Guaranteed||Used ||Pending|| > |A|<20, 10>|<20,30>| | > |B|<20, 10>|0|0| > |C|<20, 10>|0|<20, 30>| > | | | | | > > Under current logic, A's calculated to-preempt (how much resource other queue > can preempt) will be <0, 20>. The preemption will not happen. However, under > the context of DRC, queue A is using more resource than guaranteed, so queue > C will be starved -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477999#comment-16477999 ] Wangda Tan commented on YARN-8292: -- [~jlowe], thanks for your review, bq. I think the check for a zero resource can be dropped and it simplifies to the toObtainAfterPreemption component-wise max'd with zero is less than the amount to obtain from the partition (after being max'd with zero). In other words, we want to preempt as long as we have some resources we want to obtain from the partition and preempting the container makes progress on at least one of the resource dimensions being requested from the partition. The second part is correct, that's why we added following check: {code} 198 doPreempt = doPreempt && (Resources.lessThan(rc, clusterResource, 199 Resources 200 .componentwiseMax(toObtainAfterPreemption, Resources.none()), 201 Resources.componentwiseMax(toObtainByPartition, Resources.none(; {code} The check of {{toObtainAfterPreemption}} is to make sure we will not do over-preemption. For example, if a queue's res-to-obtain = (3,-1,-1), and a container is (4,1,1). Even if preempt the container can make positive contribution, we will not do this because after preemption, the queue becomes an under-utilized queue and it may preempt resources from other queues. Following logics are mostly to cover two cases to avoid over-preemption: {code} 195 doPreempt = Resources.greaterThanOrEqual(rc, clusterResource, 196 toObtainAfterPreemption, Resources.none()) || Resources 197 .isAnyMajorResourceZero(rc, toObtainAfterPreemption); {code} a. After preemption, there're some positive major resources. b. After preemption, there're at least one 0 major resources (which indicates that the queue is still satisfied after preemption). Please let me know if you still have any other questions. > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8292.001.patch > > > > This is an example of the problem: (Same if we have more than 2 resources) > > Let's say we have 3 queues A/B/C. All containers with equal size <2,3> > > ||Queue||Guaranteed||Used ||Pending|| > |A|<20, 10>|<20,30>| | > |B|<20, 10>|0|0| > |C|<20, 10>|0|<20, 30>| > | | | | | > > Under current logic, A's calculated to-preempt (how much resource other queue > can preempt) will be <0, 20>. The preemption will not happen. However, under > the context of DRC, queue A is using more resource than guaranteed, so queue > C will be starved -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478000#comment-16478000 ] Eric Payne commented on YARN-8292: -- {quote} ||Queue||Guaranteed||Used ||Pending|| |A|<20, 10>|<20,30>| | |B|<20, 10>|0|0| |C|<20, 10>|0|<20, 30>| Under current logic, A's calculated to-preempt (how much resource other queue can preempt) will be <0, 20>. The preemption will not happen. {quote} I want to challenge the original example. The above does cause preemption. I have tested this scenario, and it does preempt. In my tests, the first resource is memory and the second is vcores. I think the reason is that the dominant resource calculator will determine that vcores is a higher percentage of the available resources than memory, so vcores is dominant. > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8292.001.patch > > > > This is an example of the problem: (Same if we have more than 2 resources) > > Let's say we have 3 queues A/B/C. All containers with equal size <2,3> > > ||Queue||Guaranteed||Used ||Pending|| > |A|<20, 10>|<20,30>| | > |B|<20, 10>|0|0| > |C|<20, 10>|0|<20, 30>| > | | | | | > > Under current logic, A's calculated to-preempt (how much resource other queue > can preempt) will be <0, 20>. The preemption will not happen. However, under > the context of DRC, queue A is using more resource than guaranteed, so queue > C will be starved -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
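The dominance argument can be checked with quick arithmetic, assuming the cluster total is the sum of the three queue guarantees, i.e. <60 memory, 30 vcores> (an assumption for illustration; the example does not state the cluster size).
{code:java}
public class DominantShareCheck {
  public static void main(String[] args) {
    // Assumes cluster = <60 memory, 30 vcores>, the sum of the three
    // <20, 10> queue guarantees.
    double memShare = 20.0 / 60.0;   // queue A's memory share ~= 0.33
    double vcoreShare = 30.0 / 30.0; // queue A's vcore share   = 1.00
    // DRC treats the larger share as dominant: vcores here. A's 30 used
    // vcores exceed its 10-vcore guarantee, so A is over its guarantee on
    // its dominant resource and its containers are preemptable.
    System.out.println(vcoreShare > memShare ? "vcores dominant" : "memory dominant");
  }
}
{code}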
[jira] [Commented] (YARN-7933) [atsv2 read acls] Add TimelineWriter#writeDomain
[ https://issues.apache.org/jira/browse/YARN-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477991#comment-16477991 ] Haibo Chen commented on YARN-7933: -- Thanks [~rohithsharma] for the patch! I have checked it in trunk. > [atsv2 read acls] Add TimelineWriter#writeDomain > - > > Key: YARN-7933 > URL: https://issues.apache.org/jira/browse/YARN-7933 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vrushali C >Assignee: Rohith Sharma K S >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-7933.01.patch, YARN-7933.02.patch, > YARN-7933.03.patch, YARN-7933.04.patch, YARN-7933.05.patch, YARN-7933.06.patch > > > > Add an API TimelineWriter#writeDomain for writing the domain info -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7278) LinuxContainer in docker mode will be failed when nodemanager restart, because timeout for docker is too slow.
[ https://issues.apache.org/jira/browse/YARN-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7278: -- Labels: Docker (was: ) > LinuxContainer in docker mode will be failed when nodemanager restart, > because timeout for docker is too slow. > -- > > Key: YARN-7278 > URL: https://issues.apache.org/jira/browse/YARN-7278 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.8.0 > Environment: CentOS >Reporter: zhengchenyu >Priority: Major > Labels: Docker > Original Estimate: 1m > Remaining Estimate: 1m > > In our cluster, nodemanager recovery is turned on, and we use LinuxContainer > in docker mode. > Containers may fail when the nodemanager restarts; the exception is below: > {code} > [2017-09-29T15:47:14.433+08:00] [INFO] > containermanager.monitor.ContainersMonitorImpl.run(ContainersMonitorImpl.java > 472) [Container Monitor] : Memory usage of ProcessTree 120523 for > container-id container_1506600355508_0023_01_04: -1B of 10 GB physical > memory used; -1B of 31 GB virtual memory used > [2017-09-29T15:47:15.219+08:00] [ERROR] > containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java > 93) [ContainersLauncher #1] : Unable to recover container > container_1506600355508_0023_01_04 > java.io.IOException: Timeout while waiting for exit code from > container_1506600355508_0023_01_04 > [2017-09-29T15:47:15.220+08:00] [INFO] > containermanager.container.ContainerImpl.handle(ContainerImpl.java 1142) > [AsyncDispatcher event handler] : Container > container_1506600355508_0023_01_04 transitioned from RUNNING to > EXITED_WITH_FAILURE > [2017-09-29T15:47:15.221+08:00] [INFO] > containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java > 440) [AsyncDispatcher event handler] : Cleaning up container > container_1506600355508_0023_01_04 > {code} > I guess the process is done, but 2 seconds later (the variable is msecLeft) the > *.pid.exitcode still wasn't created. Then I changed the variable to 2ms, and the > container succeeded when the nodemanager restarted. > So I think the timeout is too short for the docker container to complete the work. > In docker mode of LinuxContainer, the NM monitors the real task, which is launched > by the "docker run" command. Then the "docker wait" command waits for the exit code, > and then "docker rm" deletes the docker container. Lastly, container-executor > writes the exit code. So if some docker command is slow enough, the NM > can't monitor the container. In fact, docker rm is always slow. > I think the exit code of docker rm doesn't matter to the real task, so I > think we could move the operation of writing "*.pid.exitcode" before the > "docker rm" command. Or we could monitor the "docker wait" process rather than the real > task. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
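The reordering proposed above, writing the exit code before the slow "docker rm", might look like the following. This is an illustrative Java sketch only (the real logic lives in the native container-executor and the generated session script), and the container name and exit-code file path are made up.
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Illustrative only: docker run -> docker wait -> write exit code -> docker rm,
// with the exit-code write moved ahead of the potentially slow cleanup.
public class ExitCodeOrderingSketch {
  static int run(String... cmd) throws IOException, InterruptedException {
    return new ProcessBuilder(cmd).inheritIO().start().waitFor();
  }

  public static void main(String[] args) throws Exception {
    String cid = "container_example_0001_01_000002"; // made-up name
    run("docker", "run", "-d", "--name", cid, "busybox", "true");

    // "docker wait" prints the container's exit code on stdout.
    Process wait = new ProcessBuilder("docker", "wait", cid).start();
    String exitCode = new String(wait.getInputStream().readAllBytes()).trim();
    wait.waitFor();

    // Write the exit code file *before* "docker rm", so a recovering
    // NodeManager is not held up waiting on slow container removal.
    Files.write(Paths.get(cid + ".pid.exitcode"), exitCode.getBytes());

    run("docker", "rm", cid);
  }
}
{code}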
[jira] [Updated] (YARN-7246) Fix the default docker binary path
[ https://issues.apache.org/jira/browse/YARN-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7246: -- Labels: Docker (was: ) > Fix the default docker binary path > -- > > Key: YARN-7246 > URL: https://issues.apache.org/jira/browse/YARN-7246 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Blocker > Labels: Docker > Fix For: 2.8.2 > > Attachments: YARN-7246-branch-2.8.2.001.patch, > YARN-7246-branch-2.8.2.002.patch, YARN-7246-branch-2.8.2.003.patch, > YARN-7246-branch-2.8.2.004.patch, YARN-7246-branch-2.8.2.005.patch, > YARN-7246-branch-2.8.2.006.patch, YARN-7246-branch-2.8.2.007.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
[ https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8141: -- Labels: Docker (was: ) > YARN Native Service: Respect > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec > -- > > Key: YARN-8141 > URL: https://issues.apache.org/jira/browse/YARN-8141 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > Labels: Docker > Attachments: YARN-8141.001.patch, YARN-8141.002.patch, > YARN-8141.003.patch, YARN-8141.004.patch > > > Existing YARN native service overwrites > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user > specified this in service spec or not. It is important to allow user to mount > local folders like /etc/passwd, etc. > Following logic overwrites the > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment: > {code:java} > StringBuilder sb = new StringBuilder(); > for (Entry mount : mountPaths.entrySet()) { > if (sb.length() > 0) { > sb.append(","); > } > sb.append(mount.getKey()); > sb.append(":"); > sb.append(mount.getValue()); > } > env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", > sb.toString());{code} > Inside AbstractLauncher.java -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4018) correct docker image name is rejected by DockerContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-4018: -- Labels: Docker (was: ) > correct docker image name is rejected by DockerContainerExecutor > > > Key: YARN-4018 > URL: https://issues.apache.org/jira/browse/YARN-4018 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Major > Labels: Docker > Attachments: YARN-4018.patch > > > For example: > "www.dockerbase.net/library/mongo" > "www.dockerbase.net:5000/library/mongo:latest" > leads to error: > Image: www.dockerbase.net/library/mongo is not a proper docker image > Image: www.dockerbase.net:5000/library/mongo:latest is not a proper docker > image -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8274) Docker command error during container relaunch
[ https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8274: -- Labels: Docker (was: ) > Docker command error during container relaunch > -- > > Key: YARN-8274 > URL: https://issues.apache.org/jira/browse/YARN-8274 > Project: Hadoop YARN > Issue Type: Task >Reporter: Billie Rinaldi >Assignee: Jason Lowe >Priority: Critical > Labels: Docker > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8274.001.patch, YARN-8274.002.patch > > > I initiated container relaunch with a "sleep 60; exit 1" launch command and > saw a "not a docker command" error on relaunch. Haven't figured out why this > is happening, but it seems like it has been introduced recently to > trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger] > {noformat} > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Relaunch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from > container-launch. > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: > container_1525897486447_0003_01_02 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception > message: Relaunch container failed > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error > output: docker: 'container_1525897486447_0003_01_02' is not a docker > command. > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4016) docker container is still running when app is killed
[ https://issues.apache.org/jira/browse/YARN-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-4016: -- Labels: Docker (was: ) > docker container is still running when app is killed > > > Key: YARN-4016 > URL: https://issues.apache.org/jira/browse/YARN-4016 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Major > Labels: Docker > > The docker_container_executor_session.sh is generated like below: > {code} > ### get the pid of docker container by "docker inspect" > echo `/usr/bin/docker inspect --format {{.State.Pid}} > container_1438681002528_0001_01_02` > > .../container_1438681002528_0001_01_02.pid.tmp > ### rename *.pid.tmp to *.pid > /bin/mv -f .../container_1438681002528_0001_01_02.pid.tmp > .../container_1438681002528_0001_01_02.pid > ### launch the docker container > /usr/bin/docker run --rm --net=host --name > container_1438681002528_0001_01_02 -v ... library/mysql > /container_1438681002528_0001_01_02/launch_container.sh" > {code} > This is obviously wrong because you cannot get the pid of a docker container > before starting it. When the NodeManager tries to kill the container, a pid of zero is > always read from the pid file. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3201) add args for DistributedShell to specify a image for tasks that will run on docker
[ https://issues.apache.org/jira/browse/YARN-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-3201: -- Labels: Docker (was: ) > add args for DistributedShell to specify a image for tasks that will run on > docker > -- > > Key: YARN-3201 > URL: https://issues.apache.org/jira/browse/YARN-3201 > Project: Hadoop YARN > Issue Type: Wish > Components: applications/distributed-shell >Reporter: zhangwei >Assignee: yarntime >Priority: Major > Labels: Docker > > It's very useful to execute a script on docker to do some tests, but the > distributedshell has no args to set the image. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2976) Invalid docs for specifying yarn.nodemanager.docker-container-executor.exec-name
[ https://issues.apache.org/jira/browse/YARN-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-2976: -- Labels: Docker (was: ) > Invalid docs for specifying > yarn.nodemanager.docker-container-executor.exec-name > > > Key: YARN-2976 > URL: https://issues.apache.org/jira/browse/YARN-2976 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 2.6.0 >Reporter: Hitesh Shah >Assignee: Vijay Bhat >Priority: Minor > Labels: Docker > > Docs on > http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html > mention setting "docker -H=tcp://0.0.0.0:4243" for > yarn.nodemanager.docker-container-executor.exec-name. > However, the actual implementation does a fileExists for the specified value. > Either the docs need to be fixed or the impl changed to allow relative paths > or commands with additional args -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place
[ https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-3302: -- Labels: Docker (was: ) > TestDockerContainerExecutor should run automatically if it can detect docker > in the usual place > --- > > Key: YARN-3302 > URL: https://issues.apache.org/jira/browse/YARN-3302 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.6.0 >Reporter: Ravi Prakash >Assignee: Ravindra Kumar Naik >Priority: Major > Labels: Docker > Attachments: YARN-3302-trunk.001.patch, YARN-3302-trunk.002.patch, > YARN-3302-trunk.003.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5426) Enable distributed shell to launch docker containers
[ https://issues.apache.org/jira/browse/YARN-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-5426: -- Labels: Docker (was: ) > Enable distributed shell to launch docker containers > > > Key: YARN-5426 > URL: https://issues.apache.org/jira/browse/YARN-5426 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Priority: Major > Labels: Docker > > This could be the easiest way to bring up docker containers on YARN. > An option like -docker , or with the ability to run different types of > images with locality requirement altogether. In short, I think making docker > container first-class thing in distributed shell is useful for testing > docker-based services on YARN in the long term. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6387) Provide a flag in Rest API GET response to notify if the app launch delay is due to docker image download.
[ https://issues.apache.org/jira/browse/YARN-6387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-6387: -- Labels: Docker (was: ) > Provide a flag in Rest API GET response to notify if the app launch delay is > due to docker image download. > -- > > Key: YARN-6387 > URL: https://issues.apache.org/jira/browse/YARN-6387 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: sriharsha devineni >Priority: Major > Labels: Docker > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3289) Docker images should be downloaded during localization
[ https://issues.apache.org/jira/browse/YARN-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-3289: -- Labels: Docker (was: ) > Docker images should be downloaded during localization > -- > > Key: YARN-3289 > URL: https://issues.apache.org/jira/browse/YARN-3289 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Ravi Prakash >Priority: Major > Labels: Docker > > We currently call docker run on images while launching containers. If the > image size is sufficiently big, the task will time out. We should download the > image we want to run during localization (if possible) to prevent this -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6454) Add support for setting environment variables for docker containers
[ https://issues.apache.org/jira/browse/YARN-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-6454: -- Labels: Docker (was: ) > Add support for setting environment variables for docker containers > --- > > Key: YARN-6454 > URL: https://issues.apache.org/jira/browse/YARN-6454 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Jaeboo Jeong >Priority: Major > Labels: Docker > Attachments: YARN-6454.001.patch > > > Docker allows you to set environment variables for your containers with the -e > flag. > You can set environment variables like below. > {code} > YARN_CONTAINER_RUNTIME_DOCKER_ENVIRONMENT_VARIABLES="HADOOP_CONF_DIR=$HADOOP_CONF_DIR,HADOOP_HDFS_HOME=/opt/hadoop" > {code} > If you want to set a list of values, apply YARN-6434 first. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7996) Allow user supplied Docker client configurations with YARN native services
[ https://issues.apache.org/jira/browse/YARN-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7996: -- Labels: Docker (was: ) > Allow user supplied Docker client configurations with YARN native services > -- > > Key: YARN-7996 > URL: https://issues.apache.org/jira/browse/YARN-7996 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Labels: Docker > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-7996.001.patch, YARN-7996.002.patch, > YARN-7996.003.patch, YARN-7996.004.patch, YARN-7996.005.patch, > YARN-7996.006.patch > > > YARN-5428 added support to distributed shell for supplying a Docker client > configuration at application submission time. The auth tokens within the > client configuration are then used to pull images from private Docker > repositories/registries. Add the same support to the YARN Native Services > framework. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-1964: -- Labels: Docker (was: ) > Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Abin Shahab >Priority: Major > Labels: Docker > Fix For: 2.6.0 > > Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch > > > *This alpha feature has been deprecated in branch-2 and removed from trunk* > Please see https://issues.apache.org/jira/browse/YARN-5388 > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5505) Create an agent-less docker provider in the native-services framework
[ https://issues.apache.org/jira/browse/YARN-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-5505: -- Labels: Docker (was: ) > Create an agent-less docker provider in the native-services framework > - > > Key: YARN-5505 > URL: https://issues.apache.org/jira/browse/YARN-5505 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Labels: Docker > Fix For: yarn-native-services > > Attachments: YARN-5505-yarn-native-services.001.patch, > YARN-5505-yarn-native-services.002.patch > > > The Slider AM has a pluggable portion called a provider. Currently the only > provider implementation is the agent provider which contains the bulk of the > agent-related Java code. We can implement a docker provider that does not use > the agent and gets information it needs directly from the NM. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3988) DockerContainerExecutor should allow users to specify "docker run" parameters
[ https://issues.apache.org/jira/browse/YARN-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-3988: -- Labels: Docker (was: ) > DockerContainerExecutor should allow users to specify "docker run" parameters > - > > Key: YARN-3988 > URL: https://issues.apache.org/jira/browse/YARN-3988 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Chen He >Assignee: Chen He >Priority: Major > Labels: Docker > > In the current DockerContainerExecutor, the "docker run" command has fixed > parameters: > String commandStr = commands.append(dockerExecutor) > .append(" ") > .append("run") > .append(" ") > .append("--rm --net=host") > .append(" ") > .append(" --name " + containerIdStr) > .append(localDirMount) > .append(logDirMount) > .append(containerWorkDirMount) > .append(" ") > .append(containerImageName) > .toString(); > This is not flexible enough for users who want to start a docker container > with extra volume(s) attached or with other "docker run" parameters. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
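A minimal sketch of the flexibility being asked for, assuming a new configuration property whose value is appended to the hard-coded arguments (the property name below is hypothetical, not an actual YARN key):
{code:java}
import org.apache.hadoop.conf.Configuration;

public class DockerRunCommand {
  // Hypothetical property; not an actual YARN configuration key.
  static final String EXTRA_RUN_ARGS =
      "yarn.nodemanager.docker-container-executor.extra-run-args";

  static String buildRunCommand(Configuration conf, String dockerExecutor,
      String containerIdStr, String mounts, String image) {
    String extra = conf.getTrimmed(EXTRA_RUN_ARGS, "");
    return new StringBuilder()
        .append(dockerExecutor)
        .append(" run --rm --net=host")
        .append(" --name ").append(containerIdStr)
        .append(extra.isEmpty() ? "" : " " + extra) // e.g. "-v /data:/data"
        .append(mounts)
        .append(' ').append(image)
        .toString();
  }
}
{code}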
[jira] [Updated] (YARN-8181) Docker container run_time
[ https://issues.apache.org/jira/browse/YARN-8181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8181: -- Labels: Docker (was: ) > Docker container run_time > - > > Key: YARN-8181 > URL: https://issues.apache.org/jira/browse/YARN-8181 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Seyyed Ahmad Javadi >Priority: Major > Labels: Docker > > Hi All, > I want to use the Docker container runtime but could not solve the problem I am > facing. I am following the guide below, and the NM log is shown below. I cannot > see any docker containers being created. It works when I use the default LCE. > Please also find how I submit a job at the end as well. > Do you have any guide on how I can make the Docker run_time work? > Could you please let me know how I can use the LCE binary to make sure my docker > setup is correct? > I confirmed that "docker run" works fine. I really like this developing > feature and would like to contribute to it. Many thanks in advance. > [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/DockerContainers.html] > {code:java} > NM LOG: > ... > 2018-04-19 11:49:24,568 INFO SecurityLogger.org.apache.hadoop.ipc.Server: > Auth successful for appattempt_1524151293356_0005_01 (auth:SIMPLE) > 2018-04-19 11:49:24,580 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Start request for container_1524151293356_0005_01_01 by user ubuntu > 2018-04-19 11:49:24,584 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Creating a new application reference for app application_1524151293356_0005 > 2018-04-19 11:49:24,584 INFO > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=ubuntu > IP=130.245.127.176 OPERATION=Start Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1524151293356_0005 > CONTAINERID=container_1524151293356_0005_01_01 > 2018-04-19 11:49:24,585 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Application application_1524151293356_0005 transitioned from NEW to INITING > 2018-04-19 11:49:24,585 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Adding container_1524151293356_0005_01_01 to application > application_1524151293356_0005 > 2018-04-19 11:49:24,585 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Application application_1524151293356_0005 transitioned from INITING to > RUNNING > 2018-04-19 11:49:24,588 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1524151293356_0005_01_01 transitioned from NEW to > LOCALIZING > 2018-04-19 11:49:24,588 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got > event CONTAINER_INIT for appId application_1524151293356_0005 > 2018-04-19 11:49:24,589 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Created localizer for container_1524151293356_0005_01_01 > 2018-04-19 11:49:24,616 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Writing credentials to the nmPrivate file > /tmp/hadoop-ubuntu/nm-local-dir/nmPrivate/container_1524151293356_0005_01_01.tokens > 2018-04-19 11:49:28,090 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1524151293356_0005_01_01 transitioned from > 
LOCALIZING to SCHEDULED > 2018-04-19 11:49:28,090 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: > Starting container [container_1524151293356_0005_01_01] > 2018-04-19 11:49:28,212 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1524151293356_0005_01_01 transitioned from SCHEDULED > to RUNNING > 2018-04-19 11:49:28,212 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: > Starting resource-monitoring for container_1524151293356_0005_01_01 > 2018-04-19 11:49:29,401 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: > Container container_1524151293356_0005_01_01 succeeded > 2018-04-19 11:49:29,401 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1524151293356_0005_01_01 transitioned from RUNNING > to EXITED_WITH_SUCCESS > 2018-04-19 11:49:29,401 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: > Cleaning up container container_1524151293356_0005_01_000
[jira] [Updated] (YARN-6160) Create an agent-less docker-less provider in the native services framework
[ https://issues.apache.org/jira/browse/YARN-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-6160: -- Labels: Docker (was: ) > Create an agent-less docker-less provider in the native services framework > -- > > Key: YARN-6160 > URL: https://issues.apache.org/jira/browse/YARN-6160 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Labels: Docker > Fix For: yarn-native-services > > Attachments: YARN-6160-yarn-native-services.001.patch, > YARN-6160-yarn-native-services.002.patch, > YARN-6160-yarn-native-services.003.patch > > > The goal of the agent-less docker-less provider is to be able to use the YARN > native services framework when Docker is not installed or other methods of > app resource installation are preferable. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8231) Dshell application fails when one of the docker containers gets killed
[ https://issues.apache.org/jira/browse/YARN-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8231: -- Labels: Docker (was: ) > Dshell application fails when one of the docker containers gets killed > - > > Key: YARN-8231 > URL: https://issues.apache.org/jira/browse/YARN-8231 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Priority: Critical > Labels: Docker > > 1) Launch dshell application > {code} > yarn jar hadoop-yarn-applications-distributedshell-*.jar -shell_command > "sleep 300" -num_containers 2 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker > -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest > -keep_containers_across_application_attempts -jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code} > 2) Kill container_1524681858728_0012_01_02 > Expected behavior: > The application should start a new instance and finish successfully > Actual behavior: > The application failed as soon as the container was killed > {code:title=AM log} > 18/04/27 23:05:12 INFO distributedshell.ApplicationMaster: Got response from > RM for container ask, completedCnt=1 > 18/04/27 23:05:12 INFO distributedshell.ApplicationMaster: > appattempt_1524681858728_0012_01 got container status for > containerID=container_1524681858728_0012_01_02, state=COMPLETE, > exitStatus=137, diagnostics=[2018-04-27 23:05:09.310]Container killed on > request. Exit code is 137 > [2018-04-27 23:05:09.331]Container exited with a non-zero exit code 137. > [2018-04-27 23:05:09.332]Killed by external signal > 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Got response from > RM for container ask, completedCnt=1 > 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: > appattempt_1524681858728_0012_01 got container status for > containerID=container_1524681858728_0012_01_03, state=COMPLETE, > exitStatus=0, diagnostics= > 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Container > completed successfully., containerId=container_1524681858728_0012_01_03 > 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Application > completed. Stopping running containers > 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Application > completed. Signalling finish to RM > 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Diagnostics., > total=2, completed=2, allocated=2, failed=1 > 18/04/27 23:08:46 INFO impl.AMRMClientImpl: Waiting for application to be > successfully unregistered.{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7416) Use "docker volume inspect" to make sure that volumes for GPU drivers/libs are properly mounted.
[ https://issues.apache.org/jira/browse/YARN-7416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7416: -- Labels: Docker (was: ) > Use "docker volume inspect" to make sure that volumes for GPU drivers/libs > are properly mounted. > - > > Key: YARN-7416 > URL: https://issues.apache.org/jira/browse/YARN-7416 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Labels: Docker > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
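The check itself is straightforward: "docker volume inspect" exits non-zero when the named volume does not exist. A sketch of the idea (illustrative only; the real check is driven from the NM/container-executor path):
{code:java}
import java.io.IOException;

public class VolumeCheck {
  // "docker volume inspect <name>" exits non-zero if the volume is missing.
  static boolean volumeExists(String volumeName)
      throws IOException, InterruptedException {
    Process p = new ProcessBuilder("docker", "volume", "inspect", volumeName)
        .redirectErrorStream(true)
        .start();
    return p.waitFor() == 0;
  }
}
{code}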
[jira] [Updated] (YARN-6804) Allow custom hostname for docker containers in native services
[ https://issues.apache.org/jira/browse/YARN-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-6804: -- Labels: Docker (was: ) > Allow custom hostname for docker containers in native services > -- > > Key: YARN-6804 > URL: https://issues.apache.org/jira/browse/YARN-6804 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Labels: Docker > Fix For: 2.9.0, yarn-native-services, 3.0.0-beta1 > > Attachments: YARN-6804-branch-2.8.01.patch, > YARN-6804-trunk.004.patch, YARN-6804-trunk.005.patch, > YARN-6804-yarn-native-services.001.patch, > YARN-6804-yarn-native-services.002.patch, > YARN-6804-yarn-native-services.003.patch, > YARN-6804-yarn-native-services.004.patch, > YARN-6804-yarn-native-services.005.patch > > > Instead of the default random docker container hostname, we could set a more > user-friendly hostname for the container. The default could be a hostname > based on the container ID, with an option for the AM to provide a different > hostname. In the case of the native services AM, we could provide the > hostname that would be created by the registry DNS server. Regardless of > whether or not registry DNS is enabled, this would be a more useful hostname > for the docker container. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
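A sketch of the container-ID-based default described above; the naming scheme and domain handling here are assumptions for illustration, not the committed implementation:
{code:java}
public class ContainerHostname {
  // e.g. container_e02_1517516543573_0012_01_000002
  //   -> ctr-e02-1517516543573-0012-01-000002.example.domain
  static String hostnameFor(String containerIdStr, String domain) {
    String host = containerIdStr.replaceFirst("^container_", "ctr-")
        .replace('_', '-');
    return (domain == null || domain.isEmpty()) ? host : host + "." + domain;
  }
}
{code}
The AM could then override this default with the registry DNS hostname when one is available.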
[jira] [Updated] (YARN-2466) Umbrella issue for Yarn launched Docker Containers
[ https://issues.apache.org/jira/browse/YARN-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-2466: -- Labels: Docker (was: ) > Umbrella issue for Yarn launched Docker Containers > -- > > Key: YARN-2466 > URL: https://issues.apache.org/jira/browse/YARN-2466 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.4.1 >Reporter: Abin Shahab >Priority: Major > Labels: Docker > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to package their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). > In addition to the software isolation mentioned above, Docker containers will > provide resource, network, and user-namespace isolation. > Docker provides resource isolation through cgroups, similar to > LinuxContainerExecutor. This prevents one job from taking other jobs' > resources (memory and CPU) on the same hadoop cluster. > User-namespace isolation will ensure that the root on the container is mapped > to an unprivileged user on the host. This is currently being added to Docker. > Network isolation will ensure that one user’s network traffic is completely > isolated from another user’s network traffic. > Last but not least, the interaction of Docker and Kerberos will have to > be worked out. These Docker containers must work in a secure hadoop > environment. > Additional details are here: > https://wiki.apache.org/hadoop/dineshs/IsolatingYarnAppsInDockerContainers -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8208) Add log statement for Docker client configuration file at INFO level
[ https://issues.apache.org/jira/browse/YARN-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8208: -- Labels: Docker (was: ) > Add log statement for Docker client configuration file at INFO level > > > Key: YARN-8208 > URL: https://issues.apache.org/jira/browse/YARN-8208 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Minor > Labels: Docker > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8208.001.patch, YARN-8208.002.patch, > YARN-8208.003.patch > > > The log statement indicating the source of the Docker client configuration > file should be moved to INFO level. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6622) Document Docker work as experimental
[ https://issues.apache.org/jira/browse/YARN-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-6622: -- Labels: Docker (was: ) > Document Docker work as experimental > > > Key: YARN-6622 > URL: https://issues.apache.org/jira/browse/YARN-6622 > Project: Hadoop YARN > Issue Type: Task > Components: documentation >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Blocker > Labels: Docker > Fix For: 2.9.0, 3.0.0-beta1 > > Attachments: YARN-6622.001.patch > > > We should update the Docker support documentation calling out the Docker work > as experimental. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3324) TestDockerContainerExecutor should clean test docker image from local repository after test is done
[ https://issues.apache.org/jira/browse/YARN-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-3324: -- Labels: BB2015-05-TBR Docker (was: BB2015-05-TBR) > TestDockerContainerExecutor should clean test docker image from local > repository after test is done > --- > > Key: YARN-3324 > URL: https://issues.apache.org/jira/browse/YARN-3324 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.6.0 >Reporter: Chen He >Priority: Major > Labels: BB2015-05-TBR, Docker > Attachments: YARN-3324-branch-2.6.0.002.patch, > YARN-3324-trunk.002.patch > > > Currently, TestDockerContainerExecutor only cleans the temp directory in the > local file system but leaves the test docker image in the local docker > repository. The image should be cleaned up as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6091) The AppMaster registration failed when using Docker on LinuxContainer
[ https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-6091: -- Labels: Docker (was: ) > The AppMaster registration failed when using Docker on LinuxContainer > > > Key: YARN-6091 > URL: https://issues.apache.org/jira/browse/YARN-6091 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, yarn >Affects Versions: 2.8.1 > Environment: CentOS >Reporter: zhengchenyu >Assignee: Eric Badger >Priority: Critical > Labels: Docker > Attachments: YARN-6091.001.patch, YARN-6091.002.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > On some servers, when I use Docker on LinuxContainer, I found that the > AppMaster's registration with the ResourceManager failed, but this did not > happen on other servers. > I found that pclose (in container-executor.c) returns different values on > different servers, even though the process launched by popen is running > normally. Some servers return 0, and others return 13. > Because yarn regards the application as failed when pclose returns nonzero, > yarn removes the AMRMToken, and the AppMaster registration then fails because > the ResourceManager has already removed this application's token. > In container-executor.c, the judgement condition is whether the return code > is zero. But according to the pclose man page, only a return value of -1 > indicates an error. So I changed the judgement condition, which solves this > problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7361) Improve the docker container runtime documentation
[ https://issues.apache.org/jira/browse/YARN-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7361: -- Labels: Docker (was: ) > Improve the docker container runtime documentation > -- > > Key: YARN-7361 > URL: https://issues.apache.org/jira/browse/YARN-7361 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Labels: Docker > Fix For: 2.8.3, 3.1.0, 2.9.1, 3.0.1 > > Attachments: YARN-7361.001.patch, YARN-7361.002.patch > > > During review of YARN-7230, it was found that > yarn.nodemanager.runtime.linux.docker.capabilities is missing from the docker > containers documentation in most of the active branches. We can also improve > the warning that was introduced in YARN-6622. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8285) Remove unused environment variables from the Docker runtime
[ https://issues.apache.org/jira/browse/YARN-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8285: -- Labels: Docker (was: ) > Remove unused environment variables from the Docker runtime > --- > > Key: YARN-8285 > URL: https://issues.apache.org/jira/browse/YARN-8285 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Assignee: Eric Badger >Priority: Trivial > Labels: Docker > Attachments: YARN-8285.001.patch > > > YARN-7430 enabled user remapping for Docker containers by default. As a > result, YARN_CONTAINER_RUNTIME_DOCKER_RUN_ENABLE_USER_REMAPPING is no longer > used and can be removed. > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE_FILE was added in the original > implementation, but was never used and can be removed. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7224) Support GPU isolation for docker container
[ https://issues.apache.org/jira/browse/YARN-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7224: -- Labels: Docker (was: ) > Support GPU isolation for docker container > -- > > Key: YARN-7224 > URL: https://issues.apache.org/jira/browse/YARN-7224 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Labels: Docker > Fix For: 3.1.0 > > Attachments: YARN-7224.001.patch, YARN-7224.002-wip.patch, > YARN-7224.003.patch, YARN-7224.004.patch, YARN-7224.005.patch, > YARN-7224.006.patch, YARN-7224.007.patch, YARN-7224.008.patch, > YARN-7224.009.patch > > > This patch addresses the following issues when a docker container is used: > 1. GPU driver and NVIDIA libraries: If GPU drivers and NVIDIA libraries are > pre-packaged inside the docker image, they can conflict with the driver and > NVIDIA libraries installed on the host OS. An alternative solution is to detect > the host OS's installed drivers and devices and mount them when launching the > docker container. Please refer to \[1\] for more details. > 2. Image detection: > From \[2\], the challenge is: > bq. Mounting user-level driver libraries and device files clobbers the > environment of the container, it should be done only when the container is > running a GPU application. The challenge here is to determine if a given > image will be using the GPU or not. We should also prevent launching > containers based on a Docker image that is incompatible with the host NVIDIA > driver version, you can find more details on this wiki page. > 3. GPU isolation. > *Proposed solution*: > a. Use nvidia-docker-plugin \[3\] to address issue #1; this is the same > solution used by K8S \[4\]. Issue #2 could be addressed in a separate JIRA. > We won't ship nvidia-docker-plugin with our releases, and we require the cluster > admin to preinstall nvidia-docker-plugin to use GPU+docker support on YARN. > "nvidia-docker" is a wrapper around the docker binary which could address #3 as > well; however, "nvidia-docker" does not provide the same semantics as docker, and > it needs additional environment setup such as PATH/LD_LIBRARY_PATH to use > it. To avoid introducing additional issues, we plan to use the > nvidia-docker-plugin + docker binary approach. > b. To handle the GPU driver and NVIDIA libraries, we use nvidia-docker-plugin > \[3\] to create a volume which includes the GPU-related libraries and mount it > when the docker container is launched. Changes include: > - Instead of using {{volume-driver}}, this patch added a {{docker volume > create}} command to c-e and the NM Java side. The reason is that {{volume-driver}} > can only use a single volume driver for each launched docker container. > - Updated {{c-e}} and the Java side: if a mounted volume is a named volume in > docker, skip checking file existence. (Named volumes still need to be added to > the permitted list in container-executor.cfg.) > c. To address the isolation issue: > We found that cgroup + docker doesn't work under newer docker versions, which > use {{runc}} as the default runtime. Setting {{--cgroup-parent}} to a cgroup > which includes any {{devices.deny}} rule prevents the docker container from > launching. > Instead, this patch passes the allowed GPU devices via {{--device}} to the > docker launch command. 
> References: > \[1\] https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-driver > \[2\] https://github.com/NVIDIA/nvidia-docker/wiki/Image-inspection > \[3\] https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker-plugin > \[4\] https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
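A sketch of point (c), passing the allowed GPU devices explicitly instead of relying on a devices cgroup; the device paths assume standard NVIDIA device nodes, and names here are illustrative:
{code:java}
import java.util.List;

public class GpuDeviceArgs {
  static void addGpuDevices(List<Integer> allowedMinorNumbers,
      List<String> dockerRunArgs) {
    // One --device flag per allowed GPU, e.g. /dev/nvidia0, /dev/nvidia1.
    for (int minor : allowedMinorNumbers) {
      dockerRunArgs.add("--device=/dev/nvidia" + minor);
    }
    // Control devices required by the driver (assumption: standard NVIDIA setup).
    dockerRunArgs.add("--device=/dev/nvidiactl");
    dockerRunArgs.add("--device=/dev/nvidia-uvm");
  }
}
{code}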
[jira] [Updated] (YARN-7811) Service AM should use configured default docker network
[ https://issues.apache.org/jira/browse/YARN-7811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7811: -- Labels: Docker (was: ) > Service AM should use configured default docker network > --- > > Key: YARN-7811 > URL: https://issues.apache.org/jira/browse/YARN-7811 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Labels: Docker > Fix For: 3.1.0 > > Attachments: YARN-7811.01.patch > > > Currently the DockerProviderService used by the Service AM hardcodes a > default of bridge for the docker network. We already have a YARN > configuration property for default network, so the Service AM should honor > that. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5879) Correctly handle docker.image and launch command when unique_component_support is specified
[ https://issues.apache.org/jira/browse/YARN-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-5879: -- Labels: Docker (was: ) > Correctly handle docker.image and launch command when > unique_component_support is specified > --- > > Key: YARN-5879 > URL: https://issues.apache.org/jira/browse/YARN-5879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Reporter: Wangda Tan >Priority: Major > Labels: Docker > Attachments: YARN-5879-yarn-native-services.poc.1.patch > > > Found two issues. When {{unique_component_support}} is set to true, the server > returns an error message like: > {code} > { > "diagnostics": "Property docker.image not specified for > {component-name}-{component-id}" > } > {code} > In addition, the launch command cannot handle patterns like > {{COMPONENT_ID}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8287) Update documentation and yarn-default related to the Docker runtime
[ https://issues.apache.org/jira/browse/YARN-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8287: -- Labels: Docker (was: ) > Update documentation and yarn-default related to the Docker runtime > --- > > Key: YARN-8287 > URL: https://issues.apache.org/jira/browse/YARN-8287 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Priority: Minor > Labels: Docker > > There are a few typos and omissions in the documentation and yarn-default wrt > running Docker containers on YARN. Below is what I noticed, but a more > thorough review is still needed: > * docker.allowed.volume-drivers is not documented > * None of the GPU or FPGA related items are in the Docker docs. > * "To run without any capabilites," - typo in yarn-default.xml > * remove from yarn-default.xml > * yarn.nodemanager.runtime.linux.docker.delayed-removal.allowed missing from > docs > * yarn.nodemanager.runtime.linux.docker.stop.grace-period missing from docs > * The user remapping features are missing from the docs, we should > explicitly call this out. > * The privileged container section could use a bit of rework to outline the > risks of the feature. > * Is it time to remove the security warnings? The community has made many > improvements since that warning was added. > * "path within the contatiner" typo -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers
[ https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8160: -- Labels: Docker (was: ) > Yarn Service Upgrade: Support upgrade of service that use docker containers > > > Key: YARN-8160 > URL: https://issues.apache.org/jira/browse/YARN-8160 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Labels: Docker > > Ability to upgrade dockerized yarn native services. > Ref: YARN-5637 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8284) get_docker_command refactoring
[ https://issues.apache.org/jira/browse/YARN-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8284: -- Labels: Docker (was: ) > get_docker_command refactoring > -- > > Key: YARN-8284 > URL: https://issues.apache.org/jira/browse/YARN-8284 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.2.0, 3.1.1 >Reporter: Jason Lowe >Assignee: Eric Badger >Priority: Minor > Labels: Docker > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8284.001.patch, YARN-8284.002.patch > > > YARN-8274 occurred because get_docker_command's helper functions each have to > remember to put the docker binary as the first argument. This is error prone > and causes code duplication for each of the helper functions. It would be > safer and simpler if get_docker_command initialized the docker binary > argument in one place and each of the helper functions only added the > arguments specific to their particular docker sub-command. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
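The real get_docker_command lives in the container-executor C code; the refactoring pattern itself is language-agnostic. A sketch, shown in Java purely for illustration:
{code:java}
import java.util.ArrayList;
import java.util.List;

public class DockerCommand {
  static List<String> getDockerCommand(String dockerBinary,
      String subCommand, List<String> subCommandArgs) {
    List<String> cmd = new ArrayList<>();
    cmd.add(dockerBinary);      // the binary is added in exactly one place
    cmd.add(subCommand);        // e.g. "run", "inspect", "rm"
    cmd.addAll(subCommandArgs); // helpers contribute only their own arguments
    return cmd;
  }
}
{code}
With this shape, a helper that forgets nothing about the binary cannot reproduce the YARN-8274 class of bug.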
[jira] [Updated] (YARN-7878) Docker container IP detail missing when service is in STABLE state
[ https://issues.apache.org/jira/browse/YARN-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7878: -- Labels: Docker (was: ) > Docker container IP detail missing when service is in STABLE state > -- > > Key: YARN-7878 > URL: https://issues.apache.org/jira/browse/YARN-7878 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Yesha Vora >Priority: Critical > Labels: Docker > > Scenario > 1) Launch Hbase on docker app > 2) Validate yarn service status using cli > {code:java} > {"name":"hbase-app-with-docker","id":"application_1517516543573_0012","artifact":{"id":"hbase-centos","type":"DOCKER"},"lifetime":3519,"components":[{"name":"hbasemaster","dependencies":[],"artifact":{"id":"hbase-centos","type":"DOCKER"},"resource":{"cpus":1,"memory":"2048"},"state":"STABLE","configuration":{"properties":{"docker.network":"hadoop"},"env":{"HBASE_MASTER_OPTS":"-Xmx2048m > > -Xms1024m","HBASE_LOG_DIR":""},"files":[{"type":"XML","properties":{"hbase.zookeeper.quorum":"${CLUSTER_ZK_QUORUM}","zookeeper.znode.parent":"${SERVICE_ZK_PATH}","hbase.rootdir":"${SERVICE_HDFS_DIR}/hbase","hbase.master.hostname":"hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.master.info.port":"16010","hbase.cluster.distributed":"true"},"dest_file":"/etc/hbase/conf/hbase-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/core-site.xml","src_file":"core-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/hdfs-site.xml","src_file":"hdfs-site.xml"}]},"quicklinks":[],"containers":[{"id":"container_e02_1517516543573_0012_01_02","ip":"10.0.0.9","hostname":"hbasemaster-0.hbase-app-with-docker.hrt-qa.test.com","state":"READY","launch_time":1517533029963,"bare_host":"xxx","component_name":"hbasemaster-0"}],"launch_command":"sleep > 15; /usr/hdp/current/hbase-master/bin/hbase master > start","number_of_containers":1,"run_privileged_container":false},{"name":"regionserver","dependencies":["hbasemaster"],"artifact":{"id":"hbase-centos","type":"DOCKER"},"resource":{"cpus":1,"memory":"2048"},"state":"STABLE","configuration":{"properties":{"docker.network":"hadoop"},"env":{"HBASE_REGIONSERVER_OPTS":"-XX:CMSInitiatingOccupancyFraction=70 > -Xmx2048m > -Xms1024m","HBASE_LOG_DIR":""},"files":[{"type":"XML","properties":{"hbase.regionserver.hostname":"${COMPONENT_INSTANCE_NAME}.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.zookeeper.quorum":"${CLUSTER_ZK_QUORUM}","zookeeper.znode.parent":"${SERVICE_ZK_PATH}","hbase.rootdir":"${SERVICE_HDFS_DIR}/hbase","hbase.master.hostname":"hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.master.info.port":"16010","hbase.cluster.distributed":"true"},"dest_file":"/etc/hbase/conf/hbase-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/core-site.xml","src_file":"core-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/hdfs-site.xml","src_file":"hdfs-site.xml"}]},"quicklinks":[],"containers":[{"id":"container_e02_1517516543573_0012_01_05","state":"READY","launch_time":1517533059022,"bare_host":"xxx","component_name":"regionserver-0"}],"launch_command":"sleep > 15; /usr/hdp/current/hbase-regionserver/bin/hbase regionserver > 
start","number_of_containers":1,"run_privileged_container":false},{"name":"hbaseclient","dependencies":[],"artifact":{"id":"hbase-centos","type":"DOCKER"},"resource":{"cpus":1,"memory":"1024"},"state":"STABLE","configuration":{"properties":{"docker.network":"hadoop"},"env":{"HBASE_LOG_DIR":""},"files":[{"type":"XML","properties":{"hbase.zookeeper.quorum":"${CLUSTER_ZK_QUORUM}","zookeeper.znode.parent":"${SERVICE_ZK_PATH}","hbase.rootdir":"${SERVICE_HDFS_DIR}/hbase","hbase.master.hostname":"hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.master.info.port":"16010","hbase.cluster.distributed":"true"},"dest_file":"/etc/hbase/conf/hbase-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/core-site.xml","src_file":"core-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/hdfs-site.xml","src_file":"hdfs-site.xml"}]},"quicklinks":[],"containers":[{"id":"container_e02_1517516543573_0012_01_03","ip":"10.0.0.8","hostname":"hbaseclient-0.hbase-app-with-docker.hrt-qa.test.com","state":"READY","launch_time":1517533029964,"bare_host":"xxx","component_name":"hbaseclient-0"}],"launch_command":"sleep > > infinity","number_of_containers":1,"run_privileged_container":false}],"configuration":{"properties":{"docker.network":"hadoop"},"env":{"HBASE_LOG_DIR":""},"files":[{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/core-site.xml","src_file":"core-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/hdfs-site.xml","src_file":"hd
[jira] [Updated] (YARN-7805) Yarn should update container as failed on docker container failure
[ https://issues.apache.org/jira/browse/YARN-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7805: -- Labels: Docker (was: ) > Yarn should update container as failed on docker container failure > -- > > Key: YARN-7805 > URL: https://issues.apache.org/jira/browse/YARN-7805 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Priority: Major > Labels: Docker > > Steps: > Start the hbase yarn service example on docker. > When the Hbase master fails, it leads the master daemon's docker container to > fail. > {code} > [root@xx bin]# docker ps -a > CONTAINER IDIMAGE > COMMAND CREATED STATUS > PORTS NAMES > a57303b1a736x/xxxhbase:x.x.x.x.0.0.0 "bash /grid/0/hadoop/" 5 > minutes ago Exited (1) 4 minutes ago > container_e07_1516734339938_0018_01_02 > [root@xxx bin]# docker exec -it a57303b1a736 bash > Error response from daemon: Container > a57303b1a7364a733428ec76581368253e5a701560a510204b8c302e3bbeed26 is not > running > {code} > Expected behavior: > Yarn should mark this container as failed and start a new docker container. > Actual behavior: > Yarn did not detect that the container had failed. It kept showing the > container status as Running. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7412) test_docker_util.test_check_mount_permitted() is failing
[ https://issues.apache.org/jira/browse/YARN-7412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7412: -- Labels: Docker (was: ) > test_docker_util.test_check_mount_permitted() is failing > > > Key: YARN-7412 > URL: https://issues.apache.org/jira/browse/YARN-7412 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0-alpha4 >Reporter: Haibo Chen >Assignee: Eric Badger >Priority: Critical > Labels: Docker > Fix For: 3.0.0 > > Attachments: YARN-7412.001.patch > > > Test output > classname="TestDockerUtil"> >message="/home/haibochen/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/utils/test_docker_util.cc:444 > Expected: itr->second Which is: 1 To be equal to: > ret Which is: 0 for inp > ut /usr/bin/touch" > type=""> > > classname="TestDockerUtil"> >message="/home/haibochen/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/utils/test_docker_util.cc:462 > Expected: expected[i] Which is: > "/usr/bin/touch" To be equal to: ptr[i] > Which is: "/bin/touch"" > type=""> > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7717) Add configuration consistency for module.enabled and docker.privileged-containers.enabled
[ https://issues.apache.org/jira/browse/YARN-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7717: -- Labels: Docker (was: ) > Add configuration consistency for module.enabled and > docker.privileged-containers.enabled > - > > Key: YARN-7717 > URL: https://issues.apache.org/jira/browse/YARN-7717 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Yesha Vora >Assignee: Eric Badger >Priority: Major > Labels: Docker > Fix For: 3.1.0 > > Attachments: YARN-7717.001.patch, YARN-7717.002.patch, > YARN-7717.003.patch, YARN-7717.004.patch > > > container-executor.cfg has two properties related to dockerization: > 1) module.enabled = true/false > 2) docker.privileged-containers.enabled = 1/0 > Here, the two properties take different values to enable / disable their > features: module.enabled takes a true/false string, while > docker.privileged-containers.enabled takes a 1/0 integer value. > The behavior of these properties should be consistent. Both properties should > take a true or false string as the value to enable or disable the feature. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
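One backward-compatible way to converge the two formats is to accept both spellings during a transition. The real parsing happens in container-executor's C code; this Java sketch just shows the acceptance logic:
{code:java}
import java.util.Locale;

public class FlagParser {
  static boolean parseEnabled(String value, boolean defaultValue) {
    if (value == null) {
      return defaultValue;
    }
    switch (value.trim().toLowerCase(Locale.ROOT)) {
      case "true": case "1": return true;   // legacy 1/0 still accepted
      case "false": case "0": return false;
      default:
        throw new IllegalArgumentException("Invalid boolean flag: " + value);
    }
  }
}
{code}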
[jira] [Updated] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image
[ https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-2981: -- Labels: Docker oct16-easy (was: oct16-easy) > DockerContainerExecutor must support a Cluster-wide default Docker image > > > Key: YARN-2981 > URL: https://issues.apache.org/jira/browse/YARN-2981 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Abin Shahab >Assignee: Abin Shahab >Priority: Major > Labels: Docker, oct16-easy > Attachments: YARN-2981.patch, YARN-2981.patch, YARN-2981.patch, > YARN-2981.patch > > > This allows the yarn administrator to add a cluster-wide default docker image > that will be used when there is no per-job override of the docker image. With > this feature, it would be convenient for newer applications like slider to > launch inside a cluster-default docker container. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7677) Docker image cannot set HADOOP_CONF_DIR
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7677: -- Labels: Docker (was: ) > Docker image cannot set HADOOP_CONF_DIR > --- > > Key: YARN-7677 > URL: https://issues.apache.org/jira/browse/YARN-7677 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Eric Badger >Assignee: Jim Brennan >Priority: Major > Labels: Docker > Fix For: 3.1.0 > > Attachments: YARN-7677.001.patch, YARN-7677.002.patch, > YARN-7677.003.patch, YARN-7677.004.patch, YARN-7677.005.patch, > YARN-7677.006.patch, YARN-7677.007.patch > > > Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether > it's set by the user or not. It completely bypasses the whitelist and so > there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes > problems in the Docker use case where Docker containers will set up their own > environment and have their own {{HADOOP_CONF_DIR}} preset in the image > itself. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
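A sketch of whitelist semantics that would fix this: emit a shell default-expansion for whitelisted variables so a value baked into the Docker image survives. The exact mechanism in the committed patch may differ; the default path below is illustrative:
{code:java}
public class WhitelistEnv {
  // Produces e.g.: export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}
  // so a value already set in the container (e.g. by the Docker image) wins,
  // and the NM's value is only a fallback.
  static String exportWhitelisted(String name, String nmDefault) {
    return "export " + name + "=${" + name + ":-" + nmDefault + "}";
  }
}
{code}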
[jira] [Updated] (YARN-5689) Update native services REST API to use agentless docker provider
[ https://issues.apache.org/jira/browse/YARN-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-5689: -- Labels: Docker (was: ) > Update native services REST API to use agentless docker provider > > > Key: YARN-5689 > URL: https://issues.apache.org/jira/browse/YARN-5689 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Billie Rinaldi >Assignee: Gour Saha >Priority: Major > Labels: Docker > Fix For: yarn-native-services > > Attachments: YARN-5689-yarn-native-services.001.patch, > YARN-5689-yarn-native-services.002.patch > > > The initial version of the native services REST API uses the agent provider. > It should be converted to use the new docker provider instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM
[ https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8265: -- Labels: Docker (was: ) > Service AM should retrieve new IP for docker container relaunched by NM > --- > > Key: YARN-8265 > URL: https://issues.apache.org/jira/browse/YARN-8265 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Billie Rinaldi >Priority: Critical > Labels: Docker > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8265.001.patch, YARN-8265.002.patch, > YARN-8265.003.patch > > > When a docker container is restarted, it gets a new IP, but the service AM > only retrieves one IP for a container and then cancels the container status > retriever. I suspect the issue would be solved by restarting the retriever > (if it has been canceled) when the onContainerRestart callback is received, > but we'll have to do some testing to make sure this works. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
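A sketch of the proposed fix, assuming per-container bookkeeping of status retrievers; the field and helper names below are hypothetical, not the Service AM's actual API:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ContainerIpTracking {
  interface StatusRetriever { boolean isCancelled(); }

  private final Map<String, StatusRetriever> retrievers =
      new ConcurrentHashMap<>();

  // Invoked from the NM client's onContainerRestart callback.
  void onContainerRestart(String containerId) {
    StatusRetriever r = retrievers.get(containerId);
    if (r == null || r.isCancelled()) {
      // Restart polling so the relaunched container's new IP is retrieved.
      retrievers.put(containerId, scheduleStatusRetriever(containerId));
    }
  }

  private StatusRetriever scheduleStatusRetriever(String containerId) {
    // Hypothetical: kick off periodic getContainerStatus polling here.
    return () -> false;
  }
}
{code}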
[jira] [Updated] (YARN-8284) get_docker_command refactoring
[ https://issues.apache.org/jira/browse/YARN-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8284: -- Issue Type: Sub-task (was: Improvement) Parent: YARN-3611 > get_docker_command refactoring > -- > > Key: YARN-8284 > URL: https://issues.apache.org/jira/browse/YARN-8284 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.2.0, 3.1.1 >Reporter: Jason Lowe >Assignee: Eric Badger >Priority: Minor > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8284.001.patch, YARN-8284.002.patch > > > YARN-8274 occurred because get_docker_command's helper functions each have to > remember to put the docker binary as the first argument. This is error prone > and causes code duplication for each of the helper functions. It would be > safer and simpler if get_docker_command initialized the docker binary > argument in one place and each of the helper functions only added the > arguments specific to their particular docker sub-command. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477941#comment-16477941 ] Jason Lowe commented on YARN-8292: -- I'm still confused about the Resources.isAnyMajorResourceZero(rc, toObtainAfterPreemption) clause in the doPreempt conditional. If we add a rarely-requested resource dimension, it is likely to be often zero in a queue's usage and therefore zero in toObtainAfterPreemption. Resources.isAnyMajorResourceZero(rc, toObtainAfterPreemption) will then always be true, and that seems irrelevant to whether we want to keep preempting or not. If I understand the proposal correctly, I think the check for a zero resource can be dropped, and it simplifies to: the toObtainAfterPreemption vector, component-wise max'd with zero, is less than the amount to obtain from the partition (also max'd with zero). In other words, we want to preempt as long as we have some resources we want to obtain from the partition and preempting the container makes progress on at least one of the resource dimensions being requested from the partition. > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8292.001.patch > > > > This is an example of the problem: (Same if we have more than 2 resources) > > Let's say we have 3 queues A/B/C. All containers with equal size <2,3> > > ||Queue||Guaranteed||Used ||Pending|| > |A|<20, 10>|<20,30>| | > |B|<20, 10>|0|0| > |C|<20, 10>|0|<20, 30>| > | | | | | > > Under current logic, A's calculated to-preempt (how much resource other queues > can preempt) will be <0, 20>. The preemption will not happen. However, under > the context of DRC, queue A is using more resources than guaranteed, so queue > C will be starved -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
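A sketch of the simplified condition being suggested, over per-dimension resource vectors; plain arrays are used here for clarity, whereas the real code works through Resource/ResourceCalculator:
{code:java}
public class PreemptCheck {
  // Keep preempting as long as this container makes progress on at least one
  // dimension still being requested from the partition.
  static boolean shouldPreempt(long[] toObtain, long[] toObtainAfterPreemption) {
    for (int i = 0; i < toObtain.length; i++) {
      long before = Math.max(toObtain[i], 0);
      long after = Math.max(toObtainAfterPreemption[i], 0);
      if (after < before) {
        return true; // progress on dimension i
      }
    }
    return false; // no requested dimension improves; do not preempt
  }
}
{code}
Note that a rarely-requested dimension that is zero in both vectors never satisfies "after < before", so it no longer distorts the decision.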
[jira] [Commented] (YARN-8071) Add ability to specify nodemanager environment variables individually
[ https://issues.apache.org/jira/browse/YARN-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477936#comment-16477936 ] Jim Brennan commented on YARN-8071: --- [~jlowe], thanks for the review: {quote}The changes to TestContainerLaunch#testPrependDistcache appear to be unnecessary? {quote} They were intentional. When I was testing my new test case, I realized that passing the empty set for the {{nmVars}} argument leads to exceptions in {{addToEnvMap()}}, so I fixed the testPrependDistcache() cases as well - I assume this windows-only test must be failing without this fix. > Add ability to specify nodemanager environment variables individually > - > > Key: YARN-8071 > URL: https://issues.apache.org/jira/browse/YARN-8071 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-8071.001.patch, YARN-8071.002.patch, > YARN-8071.003.patch > > > YARN-6830 describes a problem where environment variables that contain commas > cannot be specified via {{-Dmapreduce.map.env}}. > For example: > {{-Dmapreduce.map.env="MODE=bar,IMAGE_NAME=foo,MOUNTS=/tmp/foo,/tmp/bar"}} > will set {{MOUNTS}} to {{/tmp/foo}} > In that Jira, [~aw] suggested that we change the API to provide a way to > specify environment variables individually, the same way that Spark does. > {quote}Rather than fight with a regex why not redefine the API instead? > > -Dmapreduce.map.env.MODE=bar > -Dmapreduce.map.env.IMAGE_NAME=foo > -Dmapreduce.map.env.MOUNTS=/tmp/foo,/tmp/bar > ... > e.g, mapreduce.map.env.[foo]=bar gets turned into foo=bar > This greatly simplifies the input validation needed and makes it clear what > is actually being defined. > {quote} > The mapreduce properties were dealt with in [MAPREDUCE-7069]. This Jira will > address the YARN properties. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
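A sketch of the per-variable property form, turning "prefix.NAME=value" keys into an environment map; it uses Hadoop's Configuration, which is iterable over its entries, and the method name is illustrative:
{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

public class EnvProps {
  // e.g. prefix = "mapreduce.map.env":
  //   mapreduce.map.env.MOUNTS=/tmp/foo,/tmp/bar -> MOUNTS=/tmp/foo,/tmp/bar
  static Map<String, String> parseEnv(Configuration conf, String prefix) {
    Map<String, String> env = new HashMap<>();
    String dotted = prefix + ".";
    for (Map.Entry<String, String> e : conf) {
      if (e.getKey().startsWith(dotted)) {
        env.put(e.getKey().substring(dotted.length()), e.getValue());
      }
    }
    return env;
  }
}
{code}
Because each variable is its own property, commas in a value survive intact; no regex splitting of a combined "K1=V1,K2=V2" string is needed.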
[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477933#comment-16477933 ] genericqa commented on YARN-4599: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 43s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 7 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 21s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 5m 28s{color} | {color:red} root in trunk failed. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 10s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 33s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 26m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 26m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 43s{color} | {color:green} root: The patch generated 0 new + 235 unchanged - 1 fixed = 235 total (was 236) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 18m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 21s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 55s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 48s{color} | {color:red} root in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}191m 43s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.util.TestDiskChecker | | | hadoop.util.TestReadWriteDiskValidator | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-4599 | | JIRA Patch URL | https:/
[jira] [Commented] (YARN-8123) Skip compiling old hamlet package when the Java version is 10 or upper
[ https://issues.apache.org/jira/browse/YARN-8123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477930#comment-16477930 ] Dinesh Chitlangia commented on YARN-8123: - [~tasanuma0829] - Thank you for reviewing the patch [~ajisakaa] - Thank you for committing the changes. > Skip compiling old hamlet package when the Java version is 10 or upper > -- > > Key: YARN-8123 > URL: https://issues.apache.org/jira/browse/YARN-8123 > Project: Hadoop YARN > Issue Type: Improvement > Components: webapp > Environment: Java 10 or upper >Reporter: Akira Ajisaka >Assignee: Dinesh Chitlangia >Priority: Major > Labels: newbie > Fix For: 3.2.0 > > Attachments: YARN-8123.001.patch > > > HADOOP-11423 skipped compiling old hamlet package when the Java version is 9, > however, it is not skipped with Java 10+. We need to fix it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org