[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969919#comment-16969919 ] zhoukang commented on YARN-9537:
--------------------------------
Nice catch [~yufeigu], thanks. I agree with you that a queue-level setting may be more flexible. I will refactor later.

> Add configuration to disable AM preemption
> ------------------------------------------
>
>                 Key: YARN-9537
>                 URL: https://issues.apache.org/jira/browse/YARN-9537
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>    Affects Versions: 3.2.0, 3.1.2
>            Reporter: zhoukang
>            Assignee: zhoukang
>            Priority: Major
>         Attachments: YARN-9537-002.patch, YARN-9537.001.patch, YARN-9537.003.patch
>
> In this issue, I will add a configuration to disable AM preemption.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
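As a rough illustration of what a queue-level switch could look like, here is a minimal sketch. All class and setting names below are hypothetical (they are not the FairScheduler API, and the actual patch may be structured very differently); the point is just that the preemption policy filters AM containers per queue instead of globally.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of a queue-level switch for AM preemption.
// None of these names come from the real FairScheduler code.
public class AmPreemptionSketch {
    static class Container {
        final String id;
        final boolean isAmContainer;
        Container(String id, boolean isAmContainer) {
            this.id = id;
            this.isAmContainer = isAmContainer;
        }
    }

    static class QueueConfig {
        // Imagined per-queue allocation-file setting, e.g. <allowAMPreemption>.
        final boolean amPreemptionEnabled;
        QueueConfig(boolean amPreemptionEnabled) {
            this.amPreemptionEnabled = amPreemptionEnabled;
        }
    }

    // Keep only the containers the policy is allowed to preempt:
    // AM containers are skipped when the queue disables AM preemption.
    static List<Container> preemptionCandidates(List<Container> running, QueueConfig queue) {
        return running.stream()
                .filter(c -> queue.amPreemptionEnabled || !c.isAmContainer)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Container> running = List.of(
                new Container("c1", true),   // AM container
                new Container("c2", false),
                new Container("c3", false));
        System.out.println(preemptionCandidates(running, new QueueConfig(false)).size()); // 2
        System.out.println(preemptionCandidates(running, new QueueConfig(true)).size());  // 3
    }
}
```

A queue-level flag like this is what makes the setting "more flexible" than a single scheduler-wide toggle: each queue owner decides whether their AMs are fair game.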
[jira] [Comment Edited] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969831#comment-16969831 ] kailiu_dev edited comment on YARN-9940 at 11/8/19 6:15 AM:
-----------------------------------------------------------
[~wilfreds], you are right that Hadoop 2.7.2 has been EOL'ed, but could you please still review my patch and check whether the code solves the issue? My company still needs Hadoop 2.7.2.

In my code: before my change, in Hadoop 2.7.2 the node sort is:

{code:java}
synchronized (this) {
  Collections.sort(nodeIdList, nodeAvailableResourceComparator);
}
{code}

but completedContainer() is:

{code:java}
writeLock.lock();
...
node.releaseContainer(container);
writeLock.unlock();
{code}

After my change, completedContainer() is:

{code:java}
writeLock.lock();
if (continuousSchedulingEnabled) {
  synchronized (this) {
    node.releaseContainer(container);
  }
}
writeLock.unlock();
{code}

So releaseContainer() has to wait for the scheduler lock while the nodes are being sorted. This is a simplified example; the complete code is in my patch.

Above you said "the method {{FairScheduler.completedContainer()}} is already synchronised, adding a synchronised block inside that will not help". Do you mean that it does not help in Hadoop 2.9 because everything there already uses writeLock/readLock, but that it would help in Hadoop 2.7.2?
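For readers following along, the lock nesting described in the comment above can be sketched as a self-contained class. All names here are illustrative, not taken from the patch or from FairScheduler; the sort path holds only the scheduler monitor, and the release path acquires that same monitor inside the write lock, so node state cannot change while a sort is in flight.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch of the lock nesting discussed above (not patch code).
public class LockNestingSketch {
    final ReentrantReadWriteLock.WriteLock writeLock =
            new ReentrantReadWriteLock().writeLock();
    final List<Integer> nodeAvailable = new ArrayList<>();
    final boolean continuousSchedulingEnabled = true;

    // Continuous-scheduling thread: sorts under the scheduler monitor.
    void sortNodes() {
        synchronized (this) {
            Collections.sort(nodeAvailable);
        }
    }

    // Completion path: takes the write lock, then the same monitor,
    // so a mutation must wait for any in-flight sort to finish.
    void completedContainer(int released) {
        writeLock.lock();
        try {
            if (continuousSchedulingEnabled) {
                synchronized (this) {
                    nodeAvailable.add(released);
                }
            } else {
                nodeAvailable.add(released);
            }
        } finally {
            writeLock.unlock();
        }
    }

    public static void main(String[] args) {
        LockNestingSketch s = new LockNestingSketch();
        s.completedContainer(3);
        s.completedContainer(1);
        s.sortNodes();
        System.out.println(s.nodeAvailable); // [1, 3]
    }
}
```

One thing worth checking in any real patch with this shape: the lock order must stay consistent (write lock first, monitor second) on every path. If some other code path ever took the monitor and then tried to acquire the write lock, the two threads could deadlock.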
> avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-9940
>                 URL: https://issues.apache.org/jira/browse/YARN-9940
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.2
>            Reporter: kailiu_dev
>            Assignee: kailiu_dev
>            Priority: Major
>         Attachments: YARN-9940-branch-2.7.2.001.patch
>
> 2019-10-16 09:14:51,215 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception.
> java.lang.IllegalArgumentException: Comparison method violates its general contract!
>         at java.util.TimSort.mergeHi(TimSort.java:868)
>         at java.util.TimSort.mergeAt(TimSort.java:485)
>         at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
>         at java.util.TimSort.sort(TimSort.java:223)
>         at java.util.TimSort.sort(TimSort.java:173)
>         at java.util.Arrays.sort(Arrays.java:659)
>         at java.util.Collections.sort(Collections.java:217)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)

--
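The crash in the trace above happens because TimSort requires a consistent comparator, but the comparator here reads live node resource values that another thread mutates mid-sort. One general way to avoid this, independent of the lock-nesting fix in the attached patch, is to freeze the values under the mutators' lock and sort the frozen copy, so every comparison during one sort sees the same data. A minimal sketch with made-up names (this is not YARN code):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Snapshot-before-sort pattern: the comparator only ever sees immutable
// per-sort copies, so its answers cannot change while TimSort runs.
public class SnapshotSortSketch {
    static final class NodeSnapshot {
        final String nodeId;
        final long availableMb;   // value frozen at snapshot time
        NodeSnapshot(String nodeId, long availableMb) {
            this.nodeId = nodeId;
            this.availableMb = availableMb;
        }
    }

    final Map<String, Long> liveAvailableMb = new HashMap<>();
    final Object schedulerLock = new Object();

    // Mutators update the live view under the lock...
    void releaseContainer(String nodeId, long mb) {
        synchronized (schedulerLock) {
            liveAvailableMb.merge(nodeId, mb, Long::sum);
        }
    }

    // ...and the scheduling loop copies once under the lock, then sorts
    // the copy outside it. No comparison can observe a concurrent update.
    List<NodeSnapshot> sortedNodes() {
        List<NodeSnapshot> snapshot = new ArrayList<>();
        synchronized (schedulerLock) {
            liveAvailableMb.forEach((id, mb) -> snapshot.add(new NodeSnapshot(id, mb)));
        }
        snapshot.sort(Comparator.comparingLong((NodeSnapshot n) -> n.availableMb).reversed());
        return snapshot;
    }

    public static void main(String[] args) {
        SnapshotSortSketch s = new SnapshotSortSketch();
        s.releaseContainer("n1", 4096);
        s.releaseContainer("n2", 8192);
        for (NodeSnapshot n : s.sortedNodes()) {
            System.out.println(n.nodeId + " " + n.availableMb); // n2 8192, then n1 4096
        }
    }
}
```

Compared with holding one big lock across the whole sort, the snapshot may be slightly stale, but mutators are only blocked for the duration of the copy rather than the O(n log n) sort.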
[jira] [Commented] (YARN-9952) Continuous scheduling thread crashes
[ https://issues.apache.org/jira/browse/YARN-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969833#comment-16969833 ] kailiu_dev commented on YARN-9952:
----------------------------------
[~wilfreds], this is the same as YARN-9940; it is not needed and I will close it later. Sorry!

> Continuous scheduling thread crashes
> ------------------------------------
>
>                 Key: YARN-9952
>                 URL: https://issues.apache.org/jira/browse/YARN-9952
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.2
>            Reporter: kailiu_dev
>            Priority: Major
>
> 2019-10-16 09:14:51,215 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception.
> java.lang.IllegalArgumentException: Comparison method violates its general contract!
>         at java.util.TimSort.mergeHi(TimSort.java:868)
>         at java.util.TimSort.mergeAt(TimSort.java:485)
>         at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
>         at java.util.TimSort.sort(TimSort.java:223)
>         at java.util.TimSort.sort(TimSort.java:173)
>         at java.util.Arrays.sort(Arrays.java:659)
>         at java.util.Collections.sort(Collections.java:217)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)

--
[jira] [Commented] (YARN-8373) RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH
[ https://issues.apache.org/jira/browse/YARN-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969823#comment-16969823 ] Hadoop QA commented on YARN-8373:
---------------------------------
(/) +1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 37s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 19m 28s | trunk passed |
| +1 | compile | 0m 42s | trunk passed |
| +1 | checkstyle | 0m 35s | trunk passed |
| +1 | mvnsite | 0m 49s | trunk passed |
| +1 | shadedclient | 14m 1s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 13s | trunk passed |
| +1 | javadoc | 0m 30s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 42s | the patch passed |
| +1 | compile | 0m 37s | the patch passed |
| +1 | javac | 0m 37s | the patch passed |
| -0 | checkstyle | 0m 28s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 3 new + 29 unchanged - 1 fixed = 32 total (was 30) |
| +1 | mvnsite | 0m 40s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 41s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 17s | the patch passed |
| +1 | javadoc | 0m 27s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 85m 34s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 26s | The patch does not generate ASF License warnings. |
| | | 141m 40s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-8373 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985296/YARN-8373.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 285fed4a1d32 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 247584e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/25121/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25121/testReport/ |
| Max. process+thread count | 843 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-res
[jira] [Commented] (YARN-9952) Continuous scheduling thread crashes
[ https://issues.apache.org/jira/browse/YARN-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969803#comment-16969803 ] Wilfred Spiegelenburg commented on YARN-9952:
---------------------------------------------
[~kailiu_dev], please explain how this issue is different from YARN-9940. The fix is the same and the stack trace is the same; this really is a duplicate. Please close this as a dupe of YARN-9940.

> Continuous scheduling thread crashes
> ------------------------------------
>
>                 Key: YARN-9952
>                 URL: https://issues.apache.org/jira/browse/YARN-9952
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.2
>            Reporter: kailiu_dev
>            Priority: Major
>
> 2019-10-16 09:14:51,215 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception.
> java.lang.IllegalArgumentException: Comparison method violates its general contract!
>         at java.util.TimSort.mergeHi(TimSort.java:868)
>         at java.util.TimSort.mergeAt(TimSort.java:485)
>         at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
>         at java.util.TimSort.sort(TimSort.java:223)
>         at java.util.TimSort.sort(TimSort.java:173)
>         at java.util.Arrays.sort(Arrays.java:659)
>         at java.util.Collections.sort(Collections.java:217)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)

--
[jira] [Commented] (YARN-8373) RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH
[ https://issues.apache.org/jira/browse/YARN-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969797#comment-16969797 ] Miklos Szegedi commented on YARN-8373:
--------------------------------------
Thank you [~wilfreds] for the patch. Is this patch against 2.9? I do not see readLock in [https://github.com/apache/hadoop/blob/master/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java]

Just a note for the future: the main issue in all similar cases is that consistency is required on the data, not the code. Programming languages, on the other hand, give primitives that provide consistency through the code. That means it might make sense to have a pointer from the resource objects to the parent object and then lock that one before changing or reassigning. Also, if we already keep track of the parent from the resource objects, we may just need to sink them into a PriorityQueue when changed or added. This makes the sort unnecessary.
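The PriorityQueue idea above could be sketched roughly as follows. All names are hypothetical (this is not YARN code), and note that java.util.PriorityQueue has no decrease-key operation, so an update is remove-plus-offer, which is O(n); a dedicated indexed heap would do better.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Sketch of keeping nodes in a heap instead of re-sorting a list:
// a node's key is only ever changed while the node is out of the heap,
// so the heap never observes a key mutating underneath it. That is the
// consistency guarantee the concurrent sort was missing.
public class NodeHeapSketch {
    static final class Node {
        final String id;
        long availableMb;
        Node(String id, long availableMb) { this.id = id; this.availableMb = availableMb; }
    }

    // Most available resource first.
    private final PriorityQueue<Node> heap =
            new PriorityQueue<>(Comparator.comparingLong((Node n) -> -n.availableMb));

    void add(Node n) { heap.offer(n); }

    // Caller is assumed to hold whatever lock guards the node; the update
    // is remove -> mutate -> re-offer, never an in-place key change.
    void updateAvailable(Node n, long newAvailableMb) {
        heap.remove(n);
        n.availableMb = newAvailableMb;
        heap.offer(n);
    }

    Node best() { return heap.peek(); }

    public static void main(String[] args) {
        NodeHeapSketch h = new NodeHeapSketch();
        Node a = new Node("a", 1024);
        Node b = new Node("b", 2048);
        h.add(a);
        h.add(b);
        System.out.println(h.best().id); // b
        h.updateAvailable(a, 4096);
        System.out.println(h.best().id); // a
    }
}
```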
> RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH > --- > > Key: YARN-8373 > URL: https://issues.apache.org/jira/browse/YARN-8373 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, resourcemanager >Affects Versions: 2.9.0 >Reporter: Girish Bhat >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: newbie > Attachments: YARN-8373.001.patch > > > > > {noformat} > sudo -u yarn /usr/local/hadoop/latest/bin/yarn version Hadoop 2.9.0 > Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r > 756ebc8394e473ac25feac05fa493f6d612e6c50 Compiled by arsuresh on > 2017-11-13T23:15Z Compiled with protoc 2.5.0 From source with checksum > 0a76a9a32a5257331741f8d5932f183 This command was run using > /usr/local/hadoop/hadoop-2.9.0/share/hadoop/common/hadoop-common-2.9.0.jar{noformat} > This is for version 2.9.0 > > {noformat} > 2018-05-25 05:53:12,742 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received > RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, Fai > rSchedulerContinuousScheduling, that exited unexpectedly: > java.lang.IllegalArgumentException: Comparison method violates its general > contract! 
> at java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.sortedNodeList(ClusterNodeTracker.java:340) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:907) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) > 2018-05-25 05:53:12,743 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Shutting down > the resource manager. > 2018-05-25 05:53:12,749 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1: a critical thread, FairSchedulerContinuousScheduling, that exited > unexpectedly: java.lang.IllegalArgumentException: Comparison method violates > its general contract! 
> at java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.sortedNodeList(ClusterNodeTracker.java:340) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:907) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) > 2018-05-25 05:53:12,772 ERROR > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: > ExpiredTokenRemover received java.lang.InterruptedException: sleep > interrupted{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
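Miklos's note above, that consistency is required on the data rather than the code, can be illustrated with a small sketch. The classes and field names below are hypothetical stand-ins, not YARN's actual SchedulerNode or the attached patch: the unsafe variant sorts on a live mutable field, which is exactly what lets TimSort observe inconsistent orderings and throw "Comparison method violates its general contract!", while the safe variant snapshots each node's sort key first so the comparator stays consistent for the whole sort regardless of concurrent updates.

```java
import java.util.*;

// Hypothetical stand-in for a cluster node; names are illustrative, not YARN's.
class Node {
    final String id;
    volatile int availableMb; // mutated by other threads (container release, heartbeat)
    Node(String id, int mb) { this.id = id; this.availableMb = mb; }
}

public class SafeNodeSort {
    // Unsafe: the comparator reads the live, mutable field. If another thread
    // changes availableMb mid-sort, TimSort can see A < B, B < C, C < A and
    // fails with "Comparison method violates its general contract!".
    static void unsafeSort(List<Node> nodes) {
        nodes.sort((a, b) -> Integer.compare(b.availableMb, a.availableMb));
    }

    // Safe: snapshot each node's key once, then sort against the snapshot.
    // The ordering is frozen for the duration of the sort.
    static List<Node> snapshotSort(List<Node> nodes) {
        Map<Node, Integer> snapshot = new IdentityHashMap<>();
        for (Node n : nodes) snapshot.put(n, n.availableMb);
        List<Node> copy = new ArrayList<>(nodes);
        copy.sort((a, b) -> Integer.compare(snapshot.get(b), snapshot.get(a)));
        return copy;
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>(Arrays.asList(
            new Node("n1", 1024), new Node("n2", 4096), new Node("n3", 2048)));
        List<Node> sorted = snapshotSort(nodes);
        System.out.println(sorted.get(0).id); // prints n2, the node with most free memory
    }
}
```

Snapshotting trades one small allocation per scheduling pass for a comparator that cannot violate the contract; the PriorityQueue idea in the comment is an alternative that maintains the order incrementally instead of re-sorting.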
[jira] [Commented] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969793#comment-16969793 ] Wilfred Spiegelenburg commented on YARN-9940: - Hadoop 2.7 has been EOL'ed, no more releases or fixes. Please see here: https://cwiki.apache.org/confluence/display/HADOOP/EOL+%28End-of-life%29+Release+Branches > avoid continuous scheduling thread crashes while sorting nodes get > 'Comparison method violates its general contract' > > > Key: YARN-9940 > URL: https://issues.apache.org/jira/browse/YARN-9940 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.2 >Reporter: kailiu_dev >Assignee: kailiu_dev >Priority: Major > Attachments: YARN-9940-branch-2.7.2.001.patch > > > 2019-10-16 09:14:51,215 ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread > Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception. > java.lang.IllegalArgumentException: Comparison method violates its general > contract! > at java.util.TimSort.mergeHi(TimSort.java:868) > at java.util.TimSort.mergeAt(TimSort.java:485) > at java.util.TimSort.mergeForceCollapse(TimSort.java:426) > at java.util.TimSort.sort(TimSort.java:223) > at java.util.TimSort.sort(TimSort.java:173) > at java.util.Arrays.sort(Arrays.java:659) > at java.util.Collections.sort(Collections.java:217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969781#comment-16969781 ] kailiu_dev edited comment on YARN-9940 at 11/8/19 2:57 AM: --- [~wilfreds], thanks for your reply! In YARN-8373 the change is: - synchronized (this) { + readLock.lock(); + try { nodeIdList = nodeTracker.sortedNodeList(nodeAvailableResourceComparator); + } finally { + readLock.unlock(); } I think your point is that synchronized (this) is not the same lock as the writeLock/readLock, so sorting the nodes can throw 'Comparison method violates its general contract' when some nodes' available resources change during the sort, and both paths should therefore take the same lock. Our fixes may be the same in spirit, but my version is Hadoop 2.7.2 while yours targets Hadoop 2.9 and later. The code in Hadoop 2.7.2 does not have nodeTracker, so my solution is not the same as YARN-8373's. > avoid continuous scheduling thread crashes while sorting nodes get > 'Comparison method violates its general contract' > > > Key: YARN-9940 > URL: https://issues.apache.org/jira/browse/YARN-9940 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.2 >Reporter: kailiu_dev >Assignee: kailiu_dev >Priority: Major > Attachments: YARN-9940-branch-2.7.2.001.patch > > > 2019-10-16 09:14:51,215 ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread > Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception. > java.lang.IllegalArgumentException: Comparison method violates its general > contract! > at java.util.TimSort.mergeHi(TimSort.java:868) > at java.util.TimSort.mergeAt(TimSort.java:485) > at java.util.TimSort.mergeForceCollapse(TimSort.java:426) > at java.util.TimSort.sort(TimSort.java:223) > at java.util.TimSort.sort(TimSort.java:173) > at java.util.Arrays.sort(Arrays.java:659) > at java.util.Collections.sort(Collections.java:217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969781#comment-16969781 ] kailiu_dev commented on YARN-9940: -- [~wilfreds], thanks for your reply! In YARN-8373 the change is: - synchronized (this) { + readLock.lock(); + try { nodeIdList = nodeTracker.sortedNodeList(nodeAvailableResourceComparator); + } finally { + readLock.unlock(); } I think your point is that synchronized (this) is not the same lock as the writeLock/readLock, so sorting the nodes can throw 'Comparison method violates its general contract' when some nodes' available resources change during the sort, and both paths should therefore take the same lock. Our fixes may be the same in spirit, but my version is Hadoop 2.7.2; the code in Hadoop 2.7.2 does not have nodeTracker, so my solution is not the same as YARN-8373's. > avoid continuous scheduling thread crashes while sorting nodes get > 'Comparison method violates its general contract' > > > Key: YARN-9940 > URL: https://issues.apache.org/jira/browse/YARN-9940 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.2 >Reporter: kailiu_dev >Assignee: kailiu_dev >Priority: Major > Attachments: YARN-9940-branch-2.7.2.001.patch > > > 2019-10-16 09:14:51,215 ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread > Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception. > java.lang.IllegalArgumentException: Comparison method violates its general > contract! 
> at java.util.TimSort.mergeHi(TimSort.java:868) > at java.util.TimSort.mergeAt(TimSort.java:485) > at java.util.TimSort.mergeForceCollapse(TimSort.java:426) > at java.util.TimSort.sort(TimSort.java:223) > at java.util.TimSort.sort(TimSort.java:173) > at java.util.Arrays.sort(Arrays.java:659) > at java.util.Collections.sort(Collections.java:217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
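The 2.7.2-era approach described in the comment thread above — taking the scheduler monitor around both the node sort and the container-release mutation — can be sketched as follows. Class and method names are illustrative stand-ins, not the actual Hadoop 2.7.2 FairScheduler code; the point is only that both paths contend on the same lock, so a node's resources cannot change while the sort comparator is running.

```java
import java.util.*;

// Sketch of the synchronized-monitor pattern: one monitor (the scheduler
// object) guards both the sort and every node mutation. Illustrative only.
public class Scheduler272Sketch {
    // Each entry is {nodeId, availableMb}.
    private final List<int[]> nodes = new ArrayList<>();

    public void addNode(int id, int mb) {
        synchronized (this) { nodes.add(new int[]{id, mb}); }
    }

    // Continuous scheduling path: sort under the scheduler monitor.
    public List<int[]> sortNodesByAvailable() {
        synchronized (this) {
            List<int[]> copy = new ArrayList<>(nodes);
            copy.sort((a, b) -> Integer.compare(b[1], a[1]));
            return copy;
        }
    }

    // completedContainer path: the mutation takes the same monitor, so it
    // blocks while a sort is in progress instead of changing keys mid-sort.
    public void releaseContainer(int nodeIdx, int freedMb) {
        synchronized (this) {
            nodes.get(nodeIdx)[1] += freedMb;
        }
    }

    public static void main(String[] args) {
        Scheduler272Sketch s = new Scheduler272Sketch();
        s.addNode(1, 1024);
        s.addNode(2, 4096);
        s.releaseContainer(0, 8192); // node 1 now has 9216 MB free
        System.out.println(s.sortNodesByAvailable().get(0)[0]); // prints 1
    }
}
```

This mirrors the intent of the attached 2.7.2 patch: since that branch's sort already runs inside synchronized (this), wrapping the release in the same monitor makes the two mutually exclusive.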
[jira] [Commented] (YARN-9564) Create docker-to-squash tool for image conversion
[ https://issues.apache.org/jira/browse/YARN-9564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969770#comment-16969770 ] Hadoop QA commented on YARN-9564: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 48s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 22m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 6m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 36s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 11s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 44s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 20m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 20m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 5m 50s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} pylint {color} | {color:orange} 0m 7s{color} | {color:orange} The patch generated 124 new + 0 unchanged - 0 fixed = 124 total (was 0) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 19s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 13s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 17s{color} | {color:green} hadoop-assemblies in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 84m 21s{color} | {color:red} hadoop-yarn in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 40s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}206m 52s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.webproxy.TestWebAppProxyServlet | | | hadoop.yarn.server.webproxy.amfilter.TestAmFilter | | | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageEntities | | | hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage | | | hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRunCompaction | | | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageSchema | | | hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowActivity | | | hadoop.yarn.server.timelineservice.storage.TestTimelineWriterHBaseDown | | | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageDomain | | | hadoop.yarn.server.timelineservice.storage.TestTimelineReaderHBaseDown | | | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageApps | | | hadoop.yarn.server.timeli
[jira] [Updated] (YARN-9952) Continuous scheduling thread crashes
[ https://issues.apache.org/jira/browse/YARN-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kailiu_dev updated YARN-9952: - Attachment: (was: YARN-9940-branch-2.7.2.001.patch) > Continuous scheduling thread crashes > --- > > Key: YARN-9952 > URL: https://issues.apache.org/jira/browse/YARN-9952 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.2 >Reporter: kailiu_dev >Priority: Major > > 2019-10-16 09:14:51,215 ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread > Thread[FairSchedulerContinuousScheduling,5,main] threw > an Exception. > java.lang.IllegalArgumentException: Comparison method > violates its general contract! > at java.util.TimSort.mergeHi(TimSort.java:868) > at java.util.TimSort.mergeAt(TimSort.java:485) > at > java.util.TimSort.mergeForceCollapse(TimSort.java:426) > at java.util.TimSort.sort(TimSort.java:223) > at java.util.TimSort.sort(TimSort.java:173) > at java.util.Arrays.sort(Arrays.java:659) > at > java.util.Collections.sort(Collections.java:217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Reopened] (YARN-9952) Continuous scheduling thread crashes
[ https://issues.apache.org/jira/browse/YARN-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kailiu_dev reopened YARN-9952: -- > Continuous scheduling thread crashes > --- > > Key: YARN-9952 > URL: https://issues.apache.org/jira/browse/YARN-9952 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.2 >Reporter: kailiu_dev >Priority: Major > > 2019-10-16 09:14:51,215 ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread > Thread[FairSchedulerContinuousScheduling,5,main] threw > an Exception. > java.lang.IllegalArgumentException: Comparison method > violates its general contract! > at java.util.TimSort.mergeHi(TimSort.java:868) > at java.util.TimSort.mergeAt(TimSort.java:485) > at > java.util.TimSort.mergeForceCollapse(TimSort.java:426) > at java.util.TimSort.sort(TimSort.java:223) > at java.util.TimSort.sort(TimSort.java:173) > at java.util.Arrays.sort(Arrays.java:659) > at > java.util.Collections.sort(Collections.java:217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8373) RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH
[ https://issues.apache.org/jira/browse/YARN-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969741#comment-16969741 ] Wilfred Spiegelenburg commented on YARN-8373: - YARN-6448 introduced the synchronisation around the ClusterNodeTracker.sortedNodeList() method. That was needed due to a change introduced by YARN-4719. The fix looks like it was built when the methods were synchronised. Going forward, the change for YARN-6448 was checked in upstream in release 2.9 and later. That version also includes YARN-3139, which removed the synchronised blocks and changed all locking to read/write locks. This really means that from the moment the change was added to the code it did not do anything, as it is the only synchronised block in the FS. It only prevents two sorts from happening at the same time, nothing more. My feeling is that the fix for YARN-6448 never worked due to that interaction. The test that was written is not really testing the real issue. This is inside the test code: {code} synchronized (scheduler) { node.deductUnallocatedResource(Resource.newInstance(i * 1024, i)); } {code} The test uses a block that is synchronised on the scheduler, while in the real code this {{deductUnallocatedResource()}} is not locked on the scheduler at all. The test should really be removed as it gives a false sense of the code being tested and correct. The fix should be as simple as replacing the synchronised block with a read lock. That would bring the fix back to the state as it was intended. All the node changes, like releasing containers etc., run through the scheduler under a held write lock in {{attemptScheduling()}}, {{completedContainerInternal()}} or {{nodeUpdate()}}. Fixing the real issue — locking all the nodes while sorting, or creating a deep copy of the nodes list before sorting — is costly. Neither of these will be without performance impact, especially in large clusters. 
Based on the analysis it will also not give us anything extra [~snemeth] [~miklos.szeg...@cloudera.com] can you check please? > RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH > --- > > Key: YARN-8373 > URL: https://issues.apache.org/jira/browse/YARN-8373 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, resourcemanager >Affects Versions: 2.9.0 >Reporter: Girish Bhat >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: newbie > Attachments: YARN-8373.001.patch > > > > > {noformat} > sudo -u yarn /usr/local/hadoop/latest/bin/yarn version Hadoop 2.9.0 > Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r > 756ebc8394e473ac25feac05fa493f6d612e6c50 Compiled by arsuresh on > 2017-11-13T23:15Z Compiled with protoc 2.5.0 From source with checksum > 0a76a9a32a5257331741f8d5932f183 This command was run using > /usr/local/hadoop/hadoop-2.9.0/share/hadoop/common/hadoop-common-2.9.0.jar{noformat} > This is for version 2.9.0 > > {noformat} > 2018-05-25 05:53:12,742 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received > RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, Fai > rSchedulerContinuousScheduling, that exited unexpectedly: > java.lang.IllegalArgumentException: Comparison method violates its general > contract! 
> at java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.sortedNodeList(ClusterNodeTracker.java:340) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:907) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) > 2018-05-25 05:53:12,743 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Shutting down > the resource manager. > 2018-05-25 05:53:12,749 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1: a critical thread, FairSchedulerContinuousScheduling, that exited > unexpectedly: java.lang.IllegalArgumentException: Comparison method violates > its general contract! > at java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:15
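The read-lock replacement Wilfred suggests above can be sketched with a plain ReentrantReadWriteLock. The class below is an illustrative stand-in, not the actual FairScheduler: mutating paths (container release, node update) hold the write lock, so the sort, which takes the read lock, can never observe a node's resources changing between comparisons.

```java
import java.util.*;
import java.util.concurrent.locks.*;

// Sketch of the read/write-lock pattern described above; names are
// stand-ins, not the real FairScheduler or ClusterNodeTracker code.
public class SchedulerLockSketch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<Integer> nodeAvailableMb = new ArrayList<>();

    public void addNode(int mb) {
        lock.writeLock().lock();
        try { nodeAvailableMb.add(mb); } finally { lock.writeLock().unlock(); }
    }

    // Mutation path (e.g. completedContainerInternal) takes the write lock.
    public void releaseContainer(int index, int freedMb) {
        lock.writeLock().lock();
        try {
            nodeAvailableMb.set(index, nodeAvailableMb.get(index) + freedMb);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // The continuous-scheduling sort takes the read lock: concurrent sorts
    // are allowed, but no writer can change the sort keys mid-sort.
    public List<Integer> sortedNodes() {
        lock.readLock().lock();
        try {
            List<Integer> copy = new ArrayList<>(nodeAvailableMb);
            copy.sort(Comparator.reverseOrder());
            return copy;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        SchedulerLockSketch s = new SchedulerLockSketch();
        s.addNode(1024);
        s.addNode(512);
        s.releaseContainer(1, 2048); // second node now has 2560 MB free
        System.out.println(s.sortedNodes()); // prints [2560, 1024]
    }
}
```

A read lock is cheaper than the deep-copy or per-node-lock alternatives the comment mentions, because readers do not block each other; only the (already write-locked) mutation paths are excluded during the sort.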
[jira] [Updated] (YARN-8373) RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH
[ https://issues.apache.org/jira/browse/YARN-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YARN-8373: Attachment: YARN-8373.001.patch > RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH > --- > > Key: YARN-8373 > URL: https://issues.apache.org/jira/browse/YARN-8373 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, resourcemanager >Affects Versions: 2.9.0 >Reporter: Girish Bhat >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: newbie > Attachments: YARN-8373.001.patch > > > > > {noformat} > sudo -u yarn /usr/local/hadoop/latest/bin/yarn version Hadoop 2.9.0 > Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r > 756ebc8394e473ac25feac05fa493f6d612e6c50 Compiled by arsuresh on > 2017-11-13T23:15Z Compiled with protoc 2.5.0 From source with checksum > 0a76a9a32a5257331741f8d5932f183 This command was run using > /usr/local/hadoop/hadoop-2.9.0/share/hadoop/common/hadoop-common-2.9.0.jar{noformat} > This is for version 2.9.0 > > {noformat} > 2018-05-25 05:53:12,742 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received > RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, Fai > rSchedulerContinuousScheduling, that exited unexpectedly: > java.lang.IllegalArgumentException: Comparison method violates its general > contract! 
> at java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.sortedNodeList(ClusterNodeTracker.java:340) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:907) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) > 2018-05-25 05:53:12,743 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Shutting down > the resource manager. > 2018-05-25 05:53:12,749 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1: a critical thread, FairSchedulerContinuousScheduling, that exited > unexpectedly: java.lang.IllegalArgumentException: Comparison method violates > its general contract! 
> at java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.sortedNodeList(ClusterNodeTracker.java:340) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:907) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) > 2018-05-25 05:53:12,772 ERROR > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: > ExpiredTokenRemover received java.lang.InterruptedException: sleep > interrupted{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9561) Add C changes for the new RuncContainerRuntime
[ https://issues.apache.org/jira/browse/YARN-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969732#comment-16969732 ] Hadoop QA commented on YARN-9561: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 42s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 23s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 15m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 66m 57s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 26s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 16m 26s{color} | {color:red} root generated 3 new + 23 unchanged - 3 fixed = 26 total (was 26) {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 15m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 26s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}166m 10s{color} | {color:red} root in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 49s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}300m 10s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestReconstructStripedFile | | | hadoop.hdfs.server.namenode.TestDecommissioningStatus | | | hadoop.yarn.server.webproxy.TestWebAppProxyServlet | | | hadoop.yarn.server.webproxy.amfilter.TestAmFilter | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | YARN-9561 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985269/YARN-9561.010.patch | | Optional Tests | dupname asflicense compile cc mvnsite javac unit | | uname | Linux 085fee6cfba8 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 247584e | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | cc | https://builds.apache.org/job/PreCommit-YARN-Build/25116/artifact/out/diff-compile-cc-root.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/25116/artifact/out/patch-unit-root.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25116/testReport/ | | Max. process+thread count | 2752 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager . U: . | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25116/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Add C changes for the new RuncContainerRuntime > -
[jira] [Commented] (YARN-9964) Queue metrics turn negative when relabeling a node with running containers to default partition
[ https://issues.apache.org/jira/browse/YARN-9964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969729#comment-16969729 ] Jonathan Hung commented on YARN-9964: - Hi [~maniraj...@gmail.com]/[~Naganarasimha], mind taking a look at this issue? Thanks! > Queue metrics turn negative when relabeling a node with running containers to > default partition > > > Key: YARN-9964 > URL: https://issues.apache.org/jira/browse/YARN-9964 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Hung >Priority: Major > > YARN-6467 changed the queue metrics logic to update certain metrics only for > the default partition. But if an app runs a container on a labeled node, the > node is then moved to the default partition, and the container is later > released, the container's resources are never added to the queue's allocated > resources but are subtracted on release, leaving the metrics negative. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9964) Queue metrics turn negative when relabeling a node with running containers to default partition
Jonathan Hung created YARN-9964: --- Summary: Queue metrics turn negative when relabeling a node with running containers to default partition Key: YARN-9964 URL: https://issues.apache.org/jira/browse/YARN-9964 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Hung YARN-6467 changed the queue metrics logic to update certain metrics only for the default partition. But if an app runs a container on a labeled node, the node is then moved to the default partition, and the container is later released, the container's resources are never added to the queue's allocated resources but are subtracted on release, leaving the metrics negative.
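The release-after-relabel sequence described in this report can be sketched as follows. This is a hypothetical Python model of the accounting asymmetry, not the actual CSQueueMetrics Java code; the class and method names are illustrative only.

```python
# Hypothetical sketch of the YARN-6467 asymmetry: allocation metrics are
# updated only when the node is in the default partition, but the same
# partition check on release sees the node's *current* partition, so a
# container allocated on a labeled node and released after the node is
# moved to the default partition is subtracted without ever being added.

DEFAULT_PARTITION = ""  # YARN's default (empty-label) partition

class QueueMetrics:
    def __init__(self):
        self.allocated_mb = 0

    def allocate(self, node_partition, mb):
        # Only count resources for the default partition (YARN-6467 behavior).
        if node_partition == DEFAULT_PARTITION:
            self.allocated_mb += mb

    def release(self, node_partition, mb):
        # Same check, but driven by the node's partition at release time.
        if node_partition == DEFAULT_PARTITION:
            self.allocated_mb -= mb

metrics = QueueMetrics()
metrics.allocate("gpu", 4096)          # container starts on a labeled node: not counted
node_partition = DEFAULT_PARTITION     # node is relabeled to the default partition
metrics.release(node_partition, 4096)  # container exits: decremented anyway
print(metrics.allocated_mb)            # -4096, the negative metric from the report
```

A fix along the lines the report implies would have to remember the partition the container was charged against at allocation time and use that, not the node's current partition, when decrementing.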
[jira] [Commented] (YARN-9564) Create docker-to-squash tool for image conversion
[ https://issues.apache.org/jira/browse/YARN-9564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969708#comment-16969708 ] Hadoop QA commented on YARN-9564: - (x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 25s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 7s | Maven dependency ordering for branch |
| +1 | mvninstall | 16m 56s | trunk passed |
| +1 | compile | 15m 13s | trunk passed |
| +1 | mvnsite | 4m 39s | trunk passed |
| +1 | shadedclient | 11m 36s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 2m 28s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 25s | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 3s | the patch passed |
| +1 | compile | 15m 24s | the patch passed |
| +1 | javac | 15m 24s | the patch passed |
| +1 | mvnsite | 5m 2s | the patch passed |
| -0 | pylint | 0m 5s | The patch generated 121 new + 0 unchanged - 0 fixed = 121 total (was 0) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 11m 44s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 2m 53s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 19s | hadoop-assemblies in the patch passed. |
| -1 | unit | 84m 22s | hadoop-yarn in the patch failed. |
| +1 | asflicense | 0m 49s | The patch does not generate ASF License warnings. |
| | | 179m 7s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.webproxy.TestWebAppProxyServlet |
| | hadoop.yarn.server.webproxy.amfilter.TestAmFilter |
| | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageDomain |
| | hadoop.yarn.server.timelineservice.storage.TestTimelineReaderHBaseDown |
| | hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRunCompaction |
| | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageEntities |
| | hadoop.yarn.server.timelineservice.storage.TestTimelineWriterHBaseDown |
| | hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage |
| | hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRun |
| | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageApps |
| | hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowActivity |
| | hadoop.yarn.server.timelines
[jira] [Commented] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969693#comment-16969693 ] Wilfred Spiegelenburg commented on YARN-9940: - BTW: this also looks more like a duplicate of YARN-8373. > avoid continuous scheduling thread crashes while sorting nodes get > 'Comparison method violates its general contract' > > > Key: YARN-9940 > URL: https://issues.apache.org/jira/browse/YARN-9940 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.2 >Reporter: kailiu_dev >Assignee: kailiu_dev >Priority: Major > Attachments: YARN-9940-branch-2.7.2.001.patch > > > 2019-10-16 09:14:51,215 ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread > Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception. > java.lang.IllegalArgumentException: Comparison method violates its general > contract! > at java.util.TimSort.mergeHi(TimSort.java:868) > at java.util.TimSort.mergeAt(TimSort.java:485) > at java.util.TimSort.mergeForceCollapse(TimSort.java:426) > at java.util.TimSort.sort(TimSort.java:223) > at java.util.TimSort.sort(TimSort.java:173) > at java.util.Arrays.sort(Arrays.java:659) > at java.util.Collections.sort(Collections.java:217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)
[jira] [Assigned] (YARN-8373) RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH
[ https://issues.apache.org/jira/browse/YARN-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg reassigned YARN-8373: --- Assignee: Wilfred Spiegelenburg > RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH > --- > > Key: YARN-8373 > URL: https://issues.apache.org/jira/browse/YARN-8373 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, resourcemanager >Affects Versions: 2.9.0 >Reporter: Girish Bhat >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: newbie > > > > {noformat} > sudo -u yarn /usr/local/hadoop/latest/bin/yarn version Hadoop 2.9.0 > Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r > 756ebc8394e473ac25feac05fa493f6d612e6c50 Compiled by arsuresh on > 2017-11-13T23:15Z Compiled with protoc 2.5.0 From source with checksum > 0a76a9a32a5257331741f8d5932f183 This command was run using > /usr/local/hadoop/hadoop-2.9.0/share/hadoop/common/hadoop-common-2.9.0.jar{noformat} > This is for version 2.9.0 > > {noformat} > 2018-05-25 05:53:12,742 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received > RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, Fai > rSchedulerContinuousScheduling, that exited unexpectedly: > java.lang.IllegalArgumentException: Comparison method violates its general > contract! 
> at java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.sortedNodeList(ClusterNodeTracker.java:340) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:907) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) > 2018-05-25 05:53:12,743 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Shutting down > the resource manager. > 2018-05-25 05:53:12,749 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1: a critical thread, FairSchedulerContinuousScheduling, that exited > unexpectedly: java.lang.IllegalArgumentException: Comparison method violates > its general contract! 
> at java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.sortedNodeList(ClusterNodeTracker.java:340) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:907) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) > 2018-05-25 05:53:12,772 ERROR > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: > ExpiredTokenRemover received java.lang.InterruptedException: sleep > interrupted{noformat}
[jira] [Commented] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969691#comment-16969691 ] Wilfred Spiegelenburg commented on YARN-9940: - The stack trace does not line up with hadoop 2.7.2. The FS call to sort is located at [line 1002|https://github.com/apache/hadoop/blob/branch-2.7.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1002] in that release. Line 1117 is blank. This fix also does not look correct at all. It touches code that I think it should not touch. The {{FairScheduler.completedContainer()}} method is already synchronised; adding a synchronised block inside it will not help. The same goes for {{AbstractYarnScheduler.recoverContainersOnNode()}}, which is also synchronised. > avoid continuous scheduling thread crashes while sorting nodes get > 'Comparison method violates its general contract' > > > Key: YARN-9940 > URL: https://issues.apache.org/jira/browse/YARN-9940 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.2 >Reporter: kailiu_dev >Assignee: kailiu_dev >Priority: Major > Attachments: YARN-9940-branch-2.7.2.001.patch > > > 2019-10-16 09:14:51,215 ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread > Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception. > java.lang.IllegalArgumentException: Comparison method violates its general > contract!
> at java.util.TimSort.mergeHi(TimSort.java:868) > at java.util.TimSort.mergeAt(TimSort.java:485) > at java.util.TimSort.mergeForceCollapse(TimSort.java:426) > at java.util.TimSort.sort(TimSort.java:223) > at java.util.TimSort.sort(TimSort.java:173) > at java.util.Arrays.sort(Arrays.java:659) > at java.util.Collections.sort(Collections.java:217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)
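The "Comparison method violates its general contract" crash in these stack traces happens because the node comparator reads live, mutable resource state while another thread (e.g. a container release) mutates it mid-sort; Java's TimSort detects the resulting inconsistent ordering and throws IllegalArgumentException. The sketch below is a hypothetical Python illustration of that mechanism, not the real FSSchedulerNode code; the class and field names are invented for the example.

```python
# Hypothetical sketch: a comparator over mutable node state can answer
# "a before b" and later "b before a" for the same pair within one sort,
# which is exactly the total-order contract violation TimSort rejects.

class Node:
    def __init__(self, name, available_mb):
        self.name = name
        self.available_mb = available_mb

def by_available(a, b):
    # Reads live, mutable state: unsafe unless frozen for the duration
    # of the sort (descending by available memory).
    return b.available_mb - a.available_mb

n1, n2 = Node("n1", 1024), Node("n2", 2048)

before = by_available(n1, n2)  # n2 has more available memory: positive
n1.available_mb = 4096         # a concurrent container release frees memory on n1
after = by_available(n1, n2)   # the ordering of the same pair has flipped

assert before > 0 and after < 0  # inconsistent answers => contract violation

# A robust fix is to sort over an immutable snapshot taken up front,
# so the ordering cannot change while the sort is running:
snapshot = {n.name: n.available_mb for n in (n1, n2)}
ordered = sorted(snapshot, key=lambda name: -snapshot[name])
print(ordered)  # ['n1', 'n2']
```

This is also why wrapping the sort and the mutation in a shared lock (as the attached 2.7.2 patch attempts) can avoid the crash, at the cost of serializing container releases against the continuous-scheduling sort.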
[jira] [Updated] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YARN-9940: Fix Version/s: (was: 2.7.2) > avoid continuous scheduling thread crashes while sorting nodes get > 'Comparison method violates its general contract' > > > Key: YARN-9940 > URL: https://issues.apache.org/jira/browse/YARN-9940 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.2 >Reporter: kailiu_dev >Assignee: kailiu_dev >Priority: Major > Attachments: YARN-9940-branch-2.7.2.001.patch > > > 2019-10-16 09:14:51,215 ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread > Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception. > java.lang.IllegalArgumentException: Comparison method violates its general > contract! > at java.util.TimSort.mergeHi(TimSort.java:868) > at java.util.TimSort.mergeAt(TimSort.java:485) > at java.util.TimSort.mergeForceCollapse(TimSort.java:426) > at java.util.TimSort.sort(TimSort.java:223) > at java.util.TimSort.sort(TimSort.java:173) > at java.util.Arrays.sort(Arrays.java:659) > at java.util.Collections.sort(Collections.java:217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)
[jira] [Commented] (YARN-9562) Add Java changes for the new RuncContainerRuntime
[ https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969657#comment-16969657 ] Hadoop QA commented on YARN-9562: - (x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 29m 58s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 9 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 10s | Maven dependency ordering for branch |
| +1 | mvninstall | 18m 37s | trunk passed |
| +1 | compile | 15m 12s | trunk passed |
| +1 | checkstyle | 2m 37s | trunk passed |
| +1 | mvnsite | 3m 46s | trunk passed |
| +1 | shadedclient | 19m 17s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 4m 24s | trunk passed |
| +1 | javadoc | 3m 28s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 26s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 16s | the patch passed |
| +1 | compile | 14m 34s | the patch passed |
| +1 | javac | 14m 34s | the patch passed |
| -0 | checkstyle | 2m 43s | root: The patch generated 23 new + 690 unchanged - 1 fixed = 713 total (was 691) |
| +1 | mvnsite | 3m 49s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 3s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 13m 24s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 4m 58s | the patch passed |
| +1 | javadoc | 3m 6s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 29s | hadoop-project in the patch passed. |
| +1 | unit | 1m 0s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 4m 1s | hadoop-yarn-common in the patch passed. |
| -1 | unit | 21m 56s | hadoop-yarn-server-nodemanager in the patch failed. |
| +1 | unit | 0m 27s | hadoop-yarn-site in the patch passed. |
| +1 |
[jira] [Commented] (YARN-9562) Add Java changes for the new RuncContainerRuntime
[ https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969655#comment-16969655 ] Eric Badger commented on YARN-9562: --- bq. My initial thought would be handling these mounts via the default-rw-mounts setting in yarn-site vs hard coding it for every container. That said, I do see the challenge that poses, since the mount is inside of the container work dir in the current patch. For this initial cut, I'm fine with leaving it as is and we can open an issue to revisit. Filed YARN-9959 bq. JIRAs under the runC umbrella maybe? or do we want to try to close that out relatively quickly? I can help open the issues, I wasn't intending to put that burden on you. Filed a bunch of JIRAs under YARN-9014 :) I didn't file anything for {{reapContainer}}, because I'm not sure there is anything that needs to be done there. The runC containers are removed once they exit, so there isn't a container around that we need to cleanup. Let me know if I'm missing something. bq. Adding that check sounds like a reasonable solution to me without being intrusive. No other issues to report on usability. Added patch 006 to YARN-9564 that adds the package checks as well as a better error message for creating the root. I know it isn't the best because it's throwing exceptions instead of nice error messages, but it should be workable. 
> Add Java changes for the new RuncContainerRuntime > - > > Key: YARN-9562 > URL: https://issues.apache.org/jira/browse/YARN-9562 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9562.001.patch, YARN-9562.002.patch, > YARN-9562.003.patch, YARN-9562.004.patch, YARN-9562.005.patch, > YARN-9562.006.patch, YARN-9562.007.patch, YARN-9562.008.patch, > YARN-9562.009.patch, YARN-9562.010.patch, YARN-9562.011.patch, > YARN-9562.012.patch, YARN-9562.013.patch, YARN-9562.014.patch > > > This JIRA will be used to add the Java changes for the new > RuncContainerRuntime. This will work off of YARN-9560 to use much of the > existing DockerLinuxContainerRuntime code once it is moved up into an > abstract class that can be extended.
[jira] [Created] (YARN-9963) Add getIpAndHost to RuncContainerRuntime
Eric Badger created YARN-9963: - Summary: Add getIpAndHost to RuncContainerRuntime Key: YARN-9963 URL: https://issues.apache.org/jira/browse/YARN-9963 Project: Hadoop YARN Issue Type: Sub-task Reporter: Eric Badger {{RuncContainerRuntime}} does not currently implement this logic, but {{DockerLinuxContainerRuntime}} does. See YARN-5430
[jira] [Updated] (YARN-9961) Add execContainer logic to RuncContainerRuntime
[ https://issues.apache.org/jira/browse/YARN-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-9961: -- Description: {{RuncContainerRuntime}} does not currently implement this logic, but {{DockerLinuxContainerRuntime}} does. See YARN-8776 > Add execContainer logic to RuncContainerRuntime > --- > > Key: YARN-9961 > URL: https://issues.apache.org/jira/browse/YARN-9961 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Priority: Major > > {{RuncContainerRuntime}} does not currently implement this logic, but > {{DockerLinuxContainerRuntime}} does. > See YARN-8776
[jira] [Updated] (YARN-9960) Add relaunchContainer logic to RuncContainerRuntime
[ https://issues.apache.org/jira/browse/YARN-9960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-9960: -- Description: {{RuncContainerRuntime}} does not currently implement this logic, but {{DockerLinuxContainerRuntime}} does. See YARN-7973 > Add relaunchContainer logic to RuncContainerRuntime > --- > > Key: YARN-9960 > URL: https://issues.apache.org/jira/browse/YARN-9960 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Priority: Major > > {{RuncContainerRuntime}} does not currently implement this logic, but > {{DockerLinuxContainerRuntime}} does. > See YARN-7973
[jira] [Updated] (YARN-9959) Work around hard-coded tmp and /var/tmp bind-mounts in the container's working directory
[ https://issues.apache.org/jira/browse/YARN-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-9959: -- Description: {noformat} addRuncMountLocation(mounts, containerWorkDir.toString() + "/private_slash_tmp", "/tmp", true, true); addRuncMountLocation(mounts, containerWorkDir.toString() + "/private_var_slash_tmp", "/var/tmp", true, true); {noformat} It would be good to remove the hard-coded tmp mounts from the {{RuncContainerRuntime}} in place of something general or possibly a tmpfs. > Work around hard-coded tmp and /var/tmp bind-mounts in the container's > working directory > > > Key: YARN-9959 > URL: https://issues.apache.org/jira/browse/YARN-9959 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Priority: Major > > {noformat} > addRuncMountLocation(mounts, containerWorkDir.toString() + > "/private_slash_tmp", "/tmp", true, true); > addRuncMountLocation(mounts, containerWorkDir.toString() + > "/private_var_slash_tmp", "/var/tmp", true, true); > {noformat} > It would be good to remove the hard-coded tmp mounts from the > {{RuncContainerRuntime}} in place of something general or possibly a tmpfs.
[jira] [Created] (YARN-9962) Add networking support other than host to RuncContainerRuntime
Eric Badger created YARN-9962: - Summary: Add networking support other than host to RuncContainerRuntime Key: YARN-9962 URL: https://issues.apache.org/jira/browse/YARN-9962 Project: Hadoop YARN Issue Type: Sub-task Reporter: Eric Badger Currently, the {{RuncContainerRuntime}} only supports host networking, while the {{DockerLinuxContainerRuntime}} supports host and bridge networking
[jira] [Created] (YARN-9961) Add execContainer logic to RuncContainerRuntime
Eric Badger created YARN-9961: - Summary: Add execContainer logic to RuncContainerRuntime Key: YARN-9961 URL: https://issues.apache.org/jira/browse/YARN-9961 Project: Hadoop YARN Issue Type: Sub-task Reporter: Eric Badger
[jira] [Created] (YARN-9960) Add relaunchContainer logic to RuncContainerRuntime
Eric Badger created YARN-9960: - Summary: Add relaunchContainer logic to RuncContainerRuntime Key: YARN-9960 URL: https://issues.apache.org/jira/browse/YARN-9960 Project: Hadoop YARN Issue Type: Sub-task Reporter: Eric Badger
[jira] [Created] (YARN-9959) Work around hard-coded tmp and /var/tmp bind-mounts in the container's working directory
Eric Badger created YARN-9959: - Summary: Work around hard-coded tmp and /var/tmp bind-mounts in the container's working directory Key: YARN-9959 URL: https://issues.apache.org/jira/browse/YARN-9959 Project: Hadoop YARN Issue Type: Sub-task Reporter: Eric Badger
[jira] [Commented] (YARN-9561) Add C changes for the new RuncContainerRuntime
[ https://issues.apache.org/jira/browse/YARN-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969642#comment-16969642 ] Eric Badger commented on YARN-9561: --- Patch 010 adds functionality related to the {{nobody}} user in local user mode. Details are in YARN-9562. > Add C changes for the new RuncContainerRuntime > -- > > Key: YARN-9561 > URL: https://issues.apache.org/jira/browse/YARN-9561 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9561.001.patch, YARN-9561.002.patch, > YARN-9561.003.patch, YARN-9561.004.patch, YARN-9561.005.patch, > YARN-9561.006.patch, YARN-9561.007.patch, YARN-9561.008.patch, > YARN-9561.009.patch, YARN-9561.010.patch > > > This JIRA will be used to add the C changes to the container-executor native > binary that are necessary for the new RuncContainerRuntime. There should be > no changes to existing code paths.
[jira] [Updated] (YARN-9564) Create docker-to-squash tool for image conversion
[ https://issues.apache.org/jira/browse/YARN-9564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-9564: -- Attachment: YARN-9564.006.patch > Create docker-to-squash tool for image conversion > - > > Key: YARN-9564 > URL: https://issues.apache.org/jira/browse/YARN-9564 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9564.001.patch, YARN-9564.002.patch, > YARN-9564.003.patch, YARN-9564.004.patch, YARN-9564.005.patch, > YARN-9564.006.patch > > > The new runc runtime uses docker images that are converted into multiple > squashfs images. Each layer of the docker image will get its own squashfs > image. We need a tool to help automate the creation of these squashfs images > when all we have is a docker image
[jira] [Commented] (YARN-9564) Create docker-to-squash tool for image conversion
[ https://issues.apache.org/jira/browse/YARN-9564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969634#comment-16969634 ] Eric Badger commented on YARN-9564: --- Patch 006 adds a clearer error message when the hdfs runc root has not already been created. > Create docker-to-squash tool for image conversion > - > > Key: YARN-9564 > URL: https://issues.apache.org/jira/browse/YARN-9564 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9564.001.patch, YARN-9564.002.patch, > YARN-9564.003.patch, YARN-9564.004.patch, YARN-9564.005.patch, > YARN-9564.006.patch > > > The new runc runtime uses docker images that are converted into multiple > squashfs images. Each layer of the docker image will get its own squashfs > image. We need a tool to help automate the creation of these squashfs images > when all we have is a docker image -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9923) Detect missing Docker binary or not running Docker daemon
[ https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969620#comment-16969620 ] Hadoop QA commented on YARN-9923: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 38s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 26 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 8m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 24m 9s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 30s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 56s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 3m 16s{color} | {color:orange} root: The patch generated 43 new + 603 unchanged - 44 fixed = 646 total (was 647) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 33s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 11 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 32s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 7s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager generated 5 new + 0 unchanged - 0 fixed = 5 total (was 0) {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 2m 7s{color} | {color:red} hadoop-yarn-project_hadoop-yarn generated 3 new + 4196 unchanged - 0 fixed = 4199 total (was 4196) {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 27s{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 5s{c
[jira] [Commented] (YARN-9564) Create docker-to-squash tool for image conversion
[ https://issues.apache.org/jira/browse/YARN-9564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969605#comment-16969605 ] Eric Badger commented on YARN-9564: --- Patch 005 adds dependency checking for {{tar}} and {{setfattr}}. > Create docker-to-squash tool for image conversion > - > > Key: YARN-9564 > URL: https://issues.apache.org/jira/browse/YARN-9564 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9564.001.patch, YARN-9564.002.patch, > YARN-9564.003.patch, YARN-9564.004.patch, YARN-9564.005.patch > > > The new runc runtime uses docker images that are converted into multiple > squashfs images. Each layer of the docker image will get its own squashfs > image. We need a tool to help automate the creation of these squashfs images > when all we have is a docker image -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
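As an illustration of the dependency checking described in the comment above, a conversion tool like docker-to-squash could probe the PATH for each required binary before doing any work. This is a hypothetical sketch only, not code from patch 005; the tool list beyond {{tar}} and {{setfattr}} is an assumption based on the related review discussion (skopeo, squashfs-tools):

```python
import shutil

# Hypothetical sketch of a fail-fast dependency check; the exact tool names
# beyond tar/setfattr are assumptions, not taken from the actual patch.
REQUIRED_TOOLS = ["tar", "setfattr", "skopeo", "mksquashfs"]

def missing_dependencies(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

def check_dependencies(tools=REQUIRED_TOOLS):
    """Raise early with a clear message instead of failing mid-conversion."""
    missing = missing_dependencies(tools)
    if missing:
        raise RuntimeError("Required tools not installed: " + ", ".join(missing))
```

Checking everything up front gives one clear error message rather than a confusing failure partway through an image conversion.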
[jira] [Updated] (YARN-9564) Create docker-to-squash tool for image conversion
[ https://issues.apache.org/jira/browse/YARN-9564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-9564: -- Attachment: YARN-9564.005.patch > Create docker-to-squash tool for image conversion > - > > Key: YARN-9564 > URL: https://issues.apache.org/jira/browse/YARN-9564 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9564.001.patch, YARN-9564.002.patch, > YARN-9564.003.patch, YARN-9564.004.patch, YARN-9564.005.patch > > > The new runc runtime uses docker images that are converted into multiple > squashfs images. Each layer of the docker image will get its own squashfs > image. We need a tool to help automate the creation of these squashfs images > when all we have is a docker image -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9362) Code cleanup in TestNMLeveldbStateStoreService
[ https://issues.apache.org/jira/browse/YARN-9362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969595#comment-16969595 ] Hadoop QA commented on YARN-9362: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 35s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 52s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 7 unchanged - 6 fixed = 7 total (was 13) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 28s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 28s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 74m 55s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | YARN-9362 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985271/YARN-9362.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 344dc18535d5 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 247584e | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25118/testReport/ | | Max. process+thread count | 412 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25118/console | | Powered by | Apache Ye
[jira] [Commented] (YARN-9923) Detect missing Docker binary or not running Docker daemon
[ https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969587#comment-16969587 ] Eric Badger commented on YARN-9923: --- I think this is better served in the pluggable nm health check script, but as long as the docker health check service is disabled by default and isn't added to the list on init if it isn't enabled, then I'm ok with it. However, I have some reservations about the method of checking whether the docker daemon is running or not. Trying to find the pid file seems like it could easily end up being a needle-in-a-haystack problem. I would prefer a more reliable way of determining whether the daemon is up or not. In some small testing it looks like you can run {{docker ps}} without privilege to actually access the daemon socket. If the daemon isn't running, it will say that the daemon isn't running. If it is running and you don't have privilege, it will log that you don't have permission to access it. Additionally, if the daemon is managed by systemd, you can run {{systemctl show --property ActiveState docker}} to get the state of the daemon. This might not be the best approach, since docker might not be managed by systemd. Anyway, just spitballing here. I'm almost wondering whether we need to handle all cases at all, or if we should just try for /var/run/docker.pid and call it a day. That gets 99% of installations, and if they run with a custom configuration then they can add docker pid checking into their own pluggable health check script. 
> Detect missing Docker binary or not running Docker daemon > - > > Key: YARN-9923 > URL: https://issues.apache.org/jira/browse/YARN-9923 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, yarn >Affects Versions: 3.2.1 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9923.001.patch > > > Currently if a NodeManager is enabled to allocate Docker containers, but the > specified binary (docker.binary in the container-executor.cfg) is missing the > container allocation fails with the following error message: > {noformat} > Container launch fails > Exit code: 29 > Exception message: Launch container failed > Shell error output: sh: : No > such file or directory > Could not inspect docker network to get type /usr/bin/docker network inspect > host --format='{{.Driver}}'. > Error constructing docker command, docker error code=-1, error > message='Unknown error' > {noformat} > I suggest to add a property say "yarn.nodemanager.runtime.linux.docker.check" > to have the following options: > - STARTUP: setting this option the NodeManager would not start if Docker > binaries are missing or the Docker daemon is not running (the exception is > considered FATAL during startup) > - RUNTIME: would give a more detailed/user-friendly exception in > NodeManager's side (NM logs) if Docker binaries are missing or the daemon is > not working. This would also prevent further Docker container allocation as > long as the binaries do not exist and the docker daemon is not running. > - NONE (default): preserving the current behaviour, throwing exception during > container allocation, carrying on using the default retry procedure. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
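The probe order discussed in the comment above (pid file first, then {{docker ps}}, with systemd as another option) could be combined into a single best-effort check. This is a hypothetical sketch, not code from the attached patch; the pid-file path and the interpretation of the {{docker ps}} exit code are assumptions:

```python
import os
import shutil
import subprocess

def docker_daemon_state(pid_file="/var/run/docker.pid"):
    """Best-effort probe of the Docker daemon (hypothetical sketch).

    Returns one of: "binary-missing", "running",
    "down-or-no-permission", "unknown".
    """
    # No binary on the PATH means the daemon question is moot.
    if shutil.which("docker") is None:
        return "binary-missing"
    # Cheap check first: the default pid file covers most installations.
    if os.path.exists(pid_file):
        return "running"
    # Fall back to asking the daemon directly; a non-zero exit code can mean
    # either that the daemon is down or that we lack socket permissions.
    try:
        result = subprocess.run(["docker", "ps"],
                                capture_output=True, timeout=10)
        return "running" if result.returncode == 0 else "down-or-no-permission"
    except (OSError, subprocess.TimeoutExpired):
        return "unknown"
```

A custom pid-file path could be passed in by deployments that do not use the default location, matching the suggestion that non-standard installations handle this in their own pluggable health check script.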
[jira] [Commented] (YARN-9562) Add Java changes for the new RuncContainerRuntime
[ https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969572#comment-16969572 ] Shane Kumpf commented on YARN-9562: --- Thanks again, Eric! I'll give the latest patches a try. {quote} These variables are used in create_local_dirs. I'm not super familiar with the feature, but I was under the impression that they were not tied to any specific runtime. {quote} You are correct. I guess I've overlooked this in the past. {quote}What do you suggest as an alternative? Add both /tmp and /var/tmp as tmpfs in the runC config?{quote} My initial thought would be handling these mounts via the default-rw-mounts setting in yarn-site vs hard coding it for every container. That said, I do see the challenge that poses, since the mount is inside of the container work dir in the current patch. For this initial cut, I'm fine with leaving it as is and we can open an issue to revisit. {quote}Agreed. Should I file JIRAs for the features or add comments into the code or add documentation or what?{quote} JIRAs under the runC umbrella maybe? or do we want to try to close that out relatively quickly? I can help open the issues, I wasn't intending to put that burden on you. :) {quote} I don't want to attempt to install the packages for them. {quote} Adding that check sounds like a reasonable solution to me without being intrusive. No other issues to report on usability. 
> Add Java changes for the new RuncContainerRuntime > - > > Key: YARN-9562 > URL: https://issues.apache.org/jira/browse/YARN-9562 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9562.001.patch, YARN-9562.002.patch, > YARN-9562.003.patch, YARN-9562.004.patch, YARN-9562.005.patch, > YARN-9562.006.patch, YARN-9562.007.patch, YARN-9562.008.patch, > YARN-9562.009.patch, YARN-9562.010.patch, YARN-9562.011.patch, > YARN-9562.012.patch, YARN-9562.013.patch, YARN-9562.014.patch > > > This JIRA will be used to add the Java changes for the new > RuncContainerRuntime. This will work off of YARN-9560 to use much of the > existing DockerLinuxContainerRuntime code once it is moved up into an > abstract class that can be extended. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9362) Code cleanup in TestNMLeveldbStateStoreService
[ https://issues.apache.org/jira/browse/YARN-9362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969560#comment-16969560 ] Denes Gerencser edited comment on YARN-9362 at 11/7/19 9:11 PM: Hi [~pbacsko], Thank you for reviewing. Yes, I refactored longer test cases with many asserts into smaller ones. 1) So I left it as is. 2) Good idea, thanks. I changed it to use a fixed value (but kept it as a local variable instead of moving it to a class-level _constant_ as it is used only in this method). Uploaded as YARN-9362.002.patch. was (Author: denes.gerencser): Hi [~pbacsko], Thank you for reviewing. Yes, I refactored longer test cases with many asserts into smaller ones. 1) So I left it as is. 2) Good idea, thanks. I changed it to use a fixed value (but kept it as a local variable instead of moving it to a class-level constant as it is used only in this method). Uploaded as YARN-9362.002.patch. > Code cleanup in TestNMLeveldbStateStoreService > -- > > Key: YARN-9362 > URL: https://issues.apache.org/jira/browse/YARN-9362 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Denes Gerencser >Priority: Minor > Attachments: YARN-9362.001.patch, YARN-9362.002.patch > > > There are many ways to improve TestNMLeveldbStateStoreService: > 1. RecoveredContainerState fields are asserted many times repeatedly. Some > simple method extractions would definitely make this more readable. > 2. The tests are very long and hard to read in general: Again, finding how > methods could be extracted to avoid code repetition could help. > 3. You name it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9362) Code cleanup in TestNMLeveldbStateStoreService
[ https://issues.apache.org/jira/browse/YARN-9362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969560#comment-16969560 ] Denes Gerencser commented on YARN-9362: --- Hi [~pbacsko], Thank you for reviewing. Yes, I refactored longer test cases with many asserts into smaller ones. 1) So I left it as is. 2) Good idea, thanks. I changed it to use a fixed value (but kept it as a local variable instead of moving it to a class-level constant as it is used only in this method). Uploaded as YARN-9362.002.patch. > Code cleanup in TestNMLeveldbStateStoreService > -- > > Key: YARN-9362 > URL: https://issues.apache.org/jira/browse/YARN-9362 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Denes Gerencser >Priority: Minor > Attachments: YARN-9362.001.patch, YARN-9362.002.patch > > > There are many ways to improve TestNMLeveldbStateStoreService: > 1. RecoveredContainerState fields are asserted many times repeatedly. Some > simple method extractions would definitely make this more readable. > 2. The tests are very long and hard to read in general: Again, finding how > methods could be extracted to avoid code repetition could help. > 3. You name it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9362) Code cleanup in TestNMLeveldbStateStoreService
[ https://issues.apache.org/jira/browse/YARN-9362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denes Gerencser updated YARN-9362: -- Attachment: YARN-9362.002.patch > Code cleanup in TestNMLeveldbStateStoreService > -- > > Key: YARN-9362 > URL: https://issues.apache.org/jira/browse/YARN-9362 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Denes Gerencser >Priority: Minor > Attachments: YARN-9362.001.patch, YARN-9362.002.patch > > > There are many ways to improve TestNMLeveldbStateStoreService: > 1. RecoveredContainerState fields are asserted many times repeatedly. Some > simple method extractions would definitely make this more readable. > 2. The tests are very long and hard to read in general: Again, finding how > methods could be extracted to avoid code repetition could help. > 3. You name it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9561) Add C changes for the new RuncContainerRuntime
[ https://issues.apache.org/jira/browse/YARN-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-9561: -- Attachment: YARN-9561.010.patch > Add C changes for the new RuncContainerRuntime > -- > > Key: YARN-9561 > URL: https://issues.apache.org/jira/browse/YARN-9561 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9561.001.patch, YARN-9561.002.patch, > YARN-9561.003.patch, YARN-9561.004.patch, YARN-9561.005.patch, > YARN-9561.006.patch, YARN-9561.007.patch, YARN-9561.008.patch, > YARN-9561.009.patch, YARN-9561.010.patch > > > This JIRA will be used to add the C changes to the container-executor native > binary that are necessary for the new RuncContainerRuntime. There should be > no changes to existing code paths. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9562) Add Java changes for the new RuncContainerRuntime
[ https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969547#comment-16969547 ] Eric Badger commented on YARN-9562: --- Hey [~shaneku...@gmail.com], YARN-9562 Patch 014 combined with YARN-9561 patch 010 _should_ fix the {{nobody}} user issue. I also fixed all other issues that you mentioned (except the ones I talk about below). bq. 1) Why is the keystore and truststore needed within RuncContainerExecutorConfig? These variables are used in create_local_dirs. I'm not super familiar with the feature, but I was under the impression that they were not tied to any specific runtime. So I added them in RuncContainerExecutorConfig so that they would be passed to the container-executor. All variables are passed via that JSON file for the container-executor in the case of {{--run-runc-container}} bq. 2) I'm not a big fan of hard coded mounts like this. This would also be problematic for systemd based containers where systemd expects /tmp to be a tmpfs. What do you suggest as an alternative? Add both /tmp and /var/tmp as tmpfs in the runC config? bq. 3) It would be great to track these disabled features for future implementation. Agreed. Should I file JIRAs for the features or add comments into the code or add documentation or what? bq. I'm fine with leaving reference to the patch to docker_to_squash.py for now until we have a better story, but I did need to do a few steps to get that tool working. 1) Create the hdfs runc-root as root 2) install skopeo, squashfs-tools, and attr. I should be able to fix 1). For 2), I don't want to attempt to install the packages for them. I have checks for the first two to error out early if they aren't installed. I'll add a check for attr as well. Is there anything more to do to make this more user-friendly? 
> Add Java changes for the new RuncContainerRuntime > - > > Key: YARN-9562 > URL: https://issues.apache.org/jira/browse/YARN-9562 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9562.001.patch, YARN-9562.002.patch, > YARN-9562.003.patch, YARN-9562.004.patch, YARN-9562.005.patch, > YARN-9562.006.patch, YARN-9562.007.patch, YARN-9562.008.patch, > YARN-9562.009.patch, YARN-9562.010.patch, YARN-9562.011.patch, > YARN-9562.012.patch, YARN-9562.013.patch, YARN-9562.014.patch > > > This JIRA will be used to add the Java changes for the new > RuncContainerRuntime. This will work off of YARN-9560 to use much of the > existing DockerLinuxContainerRuntime code once it is moved up into an > abstract class that can be extended. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9562) Add Java changes for the new RuncContainerRuntime
[ https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-9562: -- Attachment: YARN-9562.014.patch > Add Java changes for the new RuncContainerRuntime > - > > Key: YARN-9562 > URL: https://issues.apache.org/jira/browse/YARN-9562 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9562.001.patch, YARN-9562.002.patch, > YARN-9562.003.patch, YARN-9562.004.patch, YARN-9562.005.patch, > YARN-9562.006.patch, YARN-9562.007.patch, YARN-9562.008.patch, > YARN-9562.009.patch, YARN-9562.010.patch, YARN-9562.011.patch, > YARN-9562.012.patch, YARN-9562.013.patch, YARN-9562.014.patch > > > This JIRA will be used to add the Java changes for the new > RuncContainerRuntime. This will work off of YARN-9560 to use much of the > existing DockerLinuxContainerRuntime code once it is moved up into an > abstract class that can be extended. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9923) Detect missing Docker binary or not running Docker daemon
[ https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969482#comment-16969482 ] Adam Antal commented on YARN-9923: -- Uploaded [^YARN-9923.001.patch] with the main idea and for a jenkins run. Will refine it a bit more and also add a test. > Detect missing Docker binary or not running Docker daemon > - > > Key: YARN-9923 > URL: https://issues.apache.org/jira/browse/YARN-9923 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, yarn >Affects Versions: 3.2.1 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9923.001.patch > > > Currently if a NodeManager is enabled to allocate Docker containers, but the > specified binary (docker.binary in the container-executor.cfg) is missing the > container allocation fails with the following error message: > {noformat} > Container launch fails > Exit code: 29 > Exception message: Launch container failed > Shell error output: sh: : No > such file or directory > Could not inspect docker network to get type /usr/bin/docker network inspect > host --format='{{.Driver}}'. > Error constructing docker command, docker error code=-1, error > message='Unknown error' > {noformat} > I suggest to add a property say "yarn.nodemanager.runtime.linux.docker.check" > to have the following options: > - STARTUP: setting this option the NodeManager would not start if Docker > binaries are missing or the Docker daemon is not running (the exception is > considered FATAL during startup) > - RUNTIME: would give a more detailed/user-friendly exception in > NodeManager's side (NM logs) if Docker binaries are missing or the daemon is > not working. This would also prevent further Docker container allocation as > long as the binaries do not exist and the docker daemon is not running. 
> - NONE (default): preserving the current behaviour, throwing exception during > container allocation, carrying on using the default retry procedure. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9923) Detect missing Docker binary or not running Docker daemon
[ https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Antal updated YARN-9923: - Attachment: YARN-9923.001.patch > Detect missing Docker binary or not running Docker daemon > - > > Key: YARN-9923 > URL: https://issues.apache.org/jira/browse/YARN-9923 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, yarn >Affects Versions: 3.2.1 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9923.001.patch > > > Currently if a NodeManager is enabled to allocate Docker containers, but the > specified binary (docker.binary in the container-executor.cfg) is missing the > container allocation fails with the following error message: > {noformat} > Container launch fails > Exit code: 29 > Exception message: Launch container failed > Shell error output: sh: : No > such file or directory > Could not inspect docker network to get type /usr/bin/docker network inspect > host --format='{{.Driver}}'. > Error constructing docker command, docker error code=-1, error > message='Unknown error' > {noformat} > I suggest to add a property say "yarn.nodemanager.runtime.linux.docker.check" > to have the following options: > - STARTUP: setting this option the NodeManager would not start if Docker > binaries are missing or the Docker daemon is not running (the exception is > considered FATAL during startup) > - RUNTIME: would give a more detailed/user-friendly exception in > NodeManager's side (NM logs) if Docker binaries are missing or the daemon is > not working. This would also prevent further Docker container allocation as > long as the binaries do not exist and the docker daemon is not running. > - NONE (default): preserving the current behaviour, throwing exception during > container allocation, carrying on using the default retry procedure. 
[jira] [Commented] (YARN-9011) Race condition during decommissioning
[ https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969481#comment-16969481 ] Adam Antal commented on YARN-9011: -- Great! +1 (non-binding), hope to see this soon committed. > Race condition during decommissioning > - > > Key: YARN-9011 > URL: https://issues.apache.org/jira/browse/YARN-9011 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.1.1 >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9011-001.patch, YARN-9011-002.patch, > YARN-9011-003.patch, YARN-9011-004.patch, YARN-9011-005.patch, > YARN-9011-006.patch, YARN-9011-007.patch, YARN-9011-008.patch, > YARN-9011-009.patch > > > During internal testing, we found a nasty race condition which occurs during > decommissioning. > Node manager, incorrect behaviour: > {noformat} > 2018-06-18 21:00:17,634 WARN > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received > SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting > down. > 2018-06-18 21:00:17,634 WARN > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from > ResourceManager: Disallowed NodeManager nodeId: node-6.hostname.com:8041 > hostname:node-6.hostname.com > {noformat} > Node manager, expected behaviour: > {noformat} > 2018-06-18 21:07:37,377 WARN > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received > SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting > down. > 2018-06-18 21:07:37,377 WARN > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from > ResourceManager: DECOMMISSIONING node-6.hostname.com:8041 is ready to be > decommissioned > {noformat} > Note the two different messages from the RM ("Disallowed NodeManager" vs > "DECOMMISSIONING"). 
The problem is that {{ResourceTrackerService}} can see an > inconsistent state of nodes while they're being updated: > {noformat} > 2018-06-18 21:00:17,575 INFO > org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: hostsReader > include:{172.26.12.198,node-7.hostname.com,node-2.hostname.com,node-5.hostname.com,172.26.8.205,node-8.hostname.com,172.26.23.76,172.26.22.223,node-6.hostname.com,172.26.9.218,node-4.hostname.com,node-3.hostname.com,172.26.13.167,node-9.hostname.com,172.26.21.221,172.26.10.219} > exclude:{node-6.hostname.com} > 2018-06-18 21:00:17,575 INFO > org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: Gracefully > decommission node node-6.hostname.com:8041 with state RUNNING > 2018-06-18 21:00:17,575 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > Disallowed NodeManager nodeId: node-6.hostname.com:8041 node: > node-6.hostname.com > 2018-06-18 21:00:17,576 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Put Node > node-6.hostname.com:8041 in DECOMMISSIONING. > 2018-06-18 21:00:17,575 INFO > org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn > IP=172.26.22.115OPERATION=refreshNodes TARGET=AdminService > RESULT=SUCCESS > 2018-06-18 21:00:17,577 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Preserve > original total capability: > 2018-06-18 21:00:17,577 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: > node-6.hostname.com:8041 Node Transitioned from RUNNING to DECOMMISSIONING > {noformat} > When the decommissioning succeeds, there is no output logged from > {{ResourceTrackerService}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9886) Queue mapping based on userid passed through application tag
[ https://issues.apache.org/jira/browse/YARN-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969474#comment-16969474 ] Hadoop QA commented on YARN-9886: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 39s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 44s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 57s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 2s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 10s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 20s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 6 new + 313 unchanged - 0 fixed = 319 total (was 313) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} xml {color} | {color:red} 0m 1s{color} | {color:red} The patch has 1 ill-formed XML file(s). {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 13m 10s{color} | {color:red} patch has errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 55s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 53s{color} | {color:red} hadoop-yarn-api in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 55s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 50s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 39s{color} | {color:red} The patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}100m 33s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | XML | Parsing Error(s): | | | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml | | Failed junit tests | hadoop.yarn.conf.TestYarnConfigurationFields | | | hadoop.yarn.api.TestResourcePBImpl | | | hadoop.yarn.TestContainerLaunchRPC | | | hadoop.yarn.util.resource.TestResourceCalculator | | | hadoop.yarn.webapp.util.TestWebAppUtils | | | hadoop.yarn.client.api.impl.TestTimelineClientV2Impl | | | hadoop.yarn.client.TestClientRMProxy | | |
[jira] [Comment Edited] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969450#comment-16969450 ] Yufei Gu edited comment on YARN-9537 at 11/7/19 5:41 PM: - Hi [~cane], sorry to come late. Patch 003 looks good to me overall. Just think aloud, why this property is cluster level instead of queue level? There are minor issues. # {{protected static final String AM_PREEMPTION = CONF_PREFIX + "am.preemption";}} There are two spaces between "String" and "AM_PREEMPTION" # Do we need this comment? Probably not. {code:java} // For test this.enableAMPreemption = scheduler.getConf().getAMPreemptionEnabled(); {code} # {{ public void testDisableAMPreemption() throws Exception }} No need to throw. was (Author: yufeigu): Hi [~cane], sorry to come late. Patch 003 looks good to me overall. Just think aloud, why this property is cluster level instead of queue level? There are style issues. # {{protected static final String AM_PREEMPTION = CONF_PREFIX + "am.preemption";}} There are two spaces between "String" and "AM_PREEMPTION" # Do we need this comment? Probably not. {code:java} // For test this.enableAMPreemption = scheduler.getConf().getAMPreemptionEnabled(); {code} > Add configuration to disable AM preemption > -- > > Key: YARN-9537 > URL: https://issues.apache.org/jira/browse/YARN-9537 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.2.0, 3.1.2 >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: YARN-9537-002.patch, YARN-9537.001.patch, > YARN-9537.003.patch > > > In this issue, i will add a configuration to support disable AM preemption. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969450#comment-16969450 ] Yufei Gu commented on YARN-9537: Hi [~cane], sorry to come late. Patch 003 looks good to me overall. Just think aloud, why this property is cluster level instead of queue level? There are style issues. # {{protected static final String AM_PREEMPTION = CONF_PREFIX + "am.preemption";}} There are two spaces between "String" and "AM_PREEMPTION" # Do we need this comment? Probably not. {code:java} // For test this.enableAMPreemption = scheduler.getConf().getAMPreemptionEnabled(); {code} > Add configuration to disable AM preemption > -- > > Key: YARN-9537 > URL: https://issues.apache.org/jira/browse/YARN-9537 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.2.0, 3.1.2 >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: YARN-9537-002.patch, YARN-9537.001.patch, > YARN-9537.003.patch > > > In this issue, i will add a configuration to support disable AM preemption. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
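The switch under review boils down to a single boolean read in the scheduler. A rough sketch, assuming a plain `Properties`-backed lookup: the `AM_PREEMPTION` constant mirrors the snippet quoted in the comment, while the prefix value, the default of `true` (AM preemption stays enabled unless explicitly disabled), and the class shape are assumptions, not the patch itself:

```java
import java.util.Properties;

// Sketch of the cluster-level AM-preemption switch discussed above. Only the
// AM_PREEMPTION constant comes from the quoted snippet; the rest is assumed.
public class AMPreemptionConfig {

    static final String CONF_PREFIX = "yarn.scheduler.fair.";
    static final String AM_PREEMPTION = CONF_PREFIX + "am.preemption";

    private final Properties conf;

    AMPreemptionConfig(Properties conf) {
        this.conf = conf;
    }

    // Default "true" preserves current behaviour: AM containers are preemptable.
    boolean getAMPreemptionEnabled() {
        return Boolean.parseBoolean(conf.getProperty(AM_PREEMPTION, "true"));
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty(AM_PREEMPTION, "false");
        System.out.println(new AMPreemptionConfig(p).getAMPreemptionEnabled());
    }
}
```

A queue-level variant, as suggested in the review, would move this flag into the per-queue configuration rather than a single cluster-wide key.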
[jira] [Updated] (YARN-9912) Support u:user2:%secondary_group queue mapping
[ https://issues.apache.org/jira/browse/YARN-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9912: --- Attachment: YARN-9912.002.patch > Support u:user2:%secondary_group queue mapping > -- > > Key: YARN-9912 > URL: https://issues.apache.org/jira/browse/YARN-9912 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9912.001.patch, YARN-9912.002.patch > > > Similar to u:user2:%primary_group mapping, add support for > u:user2:%secondary_group queue mapping as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9912) Support u:user2:%secondary_group queue mapping
[ https://issues.apache.org/jira/browse/YARN-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969447#comment-16969447 ] Manikandan R commented on YARN-9912: Attached .002.patch with this specific Jira doc change and also in general. > Support u:user2:%secondary_group queue mapping > -- > > Key: YARN-9912 > URL: https://issues.apache.org/jira/browse/YARN-9912 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9912.001.patch > > > Similar to u:user2:%primary_group mapping, add support for > u:user2:%secondary_group queue mapping as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
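For illustration, the placeholder resolution behind u:user2:%secondary_group can be sketched as follows. This is a hypothetical, simplified stand-in for the real CapacityScheduler mapping code (which resolves groups through Hadoop's group mapping service); only the %primary_group/%secondary_group placeholder names come from the issue:

```java
import java.util.Arrays;
import java.util.List;

// Simplified sketch of queue-mapping placeholder resolution; method and class
// names are hypothetical, not the CapacityScheduler API.
public class QueueMappingSketch {

    // Resolve a mapping target against a user's ordered group list, where the
    // first entry is the primary group. Returns null when the user has no
    // secondary group to map to.
    static String resolve(String target, List<String> groups) {
        if ("%primary_group".equals(target)) {
            return groups.get(0);
        }
        if ("%secondary_group".equals(target)) {
            return groups.size() > 1 ? groups.get(1) : null;
        }
        return target; // a literal queue name maps to itself
    }

    public static void main(String[] args) {
        List<String> groups = Arrays.asList("users", "analysts");
        System.out.println(resolve("%secondary_group", groups)); // prints "analysts"
    }
}
```

The real implementation must additionally verify that the resolved queue actually exists, which is exactly the kind of validation tracked in the related %primary_group issue below.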
[jira] [Commented] (YARN-9951) Unify Error Messages in container-executor
[ https://issues.apache.org/jira/browse/YARN-9951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969446#comment-16969446 ] David Mollitor commented on YARN-9951: -- [~szegedim] Can you please take a peek at this one too? :) > Unify Error Messages in container-executor > -- > > Key: YARN-9951 > URL: https://issues.apache.org/jira/browse/YARN-9951 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9951.1.patch > > > [https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c] > > Has several different ways of reporting errors: > > # Couldn't > # Can't > # Could not > # Failed to > # Unable to > # Other > > I think "Failed to" is the best verbiage. Contractions are hard for > non-native English speaking folks. "Failed" is to the point, and I am more > likely to grep logs for 'fail' than for 'unable' or 'could not'. 
[jira] [Updated] (YARN-9886) Queue mapping based on userid passed through application tag
[ https://issues.apache.org/jira/browse/YARN-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kinga Marton updated YARN-9886: --- Attachment: YARN-9886.002.patch > Queue mapping based on userid passed through application tag > > > Key: YARN-9886 > URL: https://issues.apache.org/jira/browse/YARN-9886 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Kinga Marton >Assignee: Kinga Marton >Priority: Major > Attachments: YARN-9886-WIP.patch, YARN-9886.001.patch, > YARN-9886.002.patch > > > There are situations when the real submitting user differs from the user what > arrives to YARN. For example in case of a Hive application when Hive > impersonation is turned off, the hive queries will run as Hive user and the > mapping is done based on this username. Unfortunately in this case YARN > doesn't have any information about the real user and there are cases when the > customer may want to map these applications to the real submitting user's > queue instead of the Hive queue. > For these cases, if they would pass the username in the application tag we > may read it and use it during the queue mapping, if that user has rights to > run on the real user's queue. > [~sunilg] please correct me if I missed something. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9868) Validate %primary_group queue in CS queue manager
[ https://issues.apache.org/jira/browse/YARN-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969367#comment-16969367 ] Manikandan R commented on YARN-9868: [~pbacsko] Even after rebasing, it fails. Can you check at your end after applying the dependent patches? > Validate %primary_group queue in CS queue manager > - > > Key: YARN-9868 > URL: https://issues.apache.org/jira/browse/YARN-9868 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9868.001.patch, YARN-9868.002.patch > > > As part of %secondary_group mapping, we ensure that the output of > %secondary_group computed while processing the queue mapping is available > using CSQueueManager. Similarly, we will need to do the same for > %primary_group. 
[jira] [Commented] (YARN-9865) Capacity scheduler: add support for combined %user + %secondary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969366#comment-16969366 ] Manikandan R commented on YARN-9865: [~snemeth] Can you check v5 patch and commit this? > Capacity scheduler: add support for combined %user + %secondary_group mapping > - > > Key: YARN-9865 > URL: https://issues.apache.org/jira/browse/YARN-9865 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9865-005.patch, YARN-9865.001.patch, > YARN-9865.002.patch, YARN-9865.003.patch, YARN-9865.004.patch > > > Similiar to YARN-9841, but for secondary group. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9866) u:user2:%primary_group is not working as expected
[ https://issues.apache.org/jira/browse/YARN-9866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969365#comment-16969365 ] Manikandan R commented on YARN-9866: [~pbacsko] [~snemeth] Can you please take a look? > u:user2:%primary_group is not working as expected > - > > Key: YARN-9866 > URL: https://issues.apache.org/jira/browse/YARN-9866 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9866.001.patch > > > Please refer to #1 in > https://issues.apache.org/jira/browse/YARN-9841?focusedCommentId=16937024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937024 > for more details 
[jira] [Commented] (YARN-9890) [UI2] Add Application tag to the app table and app detail page.
[ https://issues.apache.org/jira/browse/YARN-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969354#comment-16969354 ] Kinga Marton commented on YARN-9890: Thank you [~snemeth] for checking this patch! The new column is not displayed in the app attempts table. > [UI2] Add Application tag to the app table and app detail page. > --- > > Key: YARN-9890 > URL: https://issues.apache.org/jira/browse/YARN-9890 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Kinga Marton >Assignee: Kinga Marton >Priority: Major > Attachments: UI2_ApplicationTag.png, YARN-9890.001.patch > > > Right now, AFAIK, there is no possibility to filter the applications based on > the application tag in the UI. Adding this new column to the app table will > make this filtering possible as well. > In UI2 this information is missing from the application detail page as > well. 
[jira] [Commented] (YARN-9011) Race condition during decommissioning
[ https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969332#comment-16969332 ] Hadoop QA commented on YARN-9011: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 51s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 14s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 21s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 58s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 54s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 2s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 5s{color} | {color:green} hadoop-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 85m 4s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 42s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}201m 52s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | YARN-9011 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985199/YARN-9011-009.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux fc77a6870914 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / dd90025 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25113/testReport/ | | Max. process+thread count | 1348 (vs. ulimit of 5
[jira] [Commented] (YARN-9877) Intermittent TIME_OUT of LogAggregationReport
[ https://issues.apache.org/jira/browse/YARN-9877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969209#comment-16969209 ] Peter Bacsko commented on YARN-9877: [~adam.antal] to understand the issue better, I have a question: "suppose the AM is requesting more containers, but as soon as they're allocated - the AM realizes it doesn't need them". When does this "realization" occur? Under which circumstances? I can think of two: 1. speculative execution 2. reducer preemption, because we have to re-run failed mappers and there's no free resource in the cluster. Anything else? How can a sleep job trigger this problem? > Intermittent TIME_OUT of LogAggregationReport > - > > Key: YARN-9877 > URL: https://issues.apache.org/jira/browse/YARN-9877 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, resourcemanager, yarn >Affects Versions: 3.0.3, 3.3.0, 3.2.1, 3.1.3 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9877.001.patch > > > I noticed some intermittent TIME_OUT in some downstream log-aggregation based > tests. > Steps to reproduce: > - Let's run a MR job > {code} > hadoop jar hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar sleep > -Dmapreduce.job.queuename=root.default -m 10 -r 10 -mt 5000 -rt 5000 > {code} > - Suppose the AM is requesting more containers, but as soon as they're > allocated - the AM realizes it doesn't need them. The container's state > changes are: ALLOCATED -> ACQUIRED -> RELEASED. > Let's suppose these extra containers are allocated in a different node from > the other 21 (AM + 10 mapper + 10 reducer) containers' node. > - All the containers finish successfully and the app is finished successfully > as well. Log aggregation status for the whole app seemingly stucks in RUNNING > state. > - After a while the final log aggregation status for the app changes to > TIME_OUT. 
> Root cause: > - As unused containers are getting through the state transition in the RM's > internal representation, {{RMAppImpl$AppRunningOnNodeTransition}}'s > transition function is called. This calls the > {{RMAppLogAggregation$addReportIfNecessary}} which forcefully adds the > "NOT_START" LogAggregationStatus associated with this NodeId for the app, > even though it does not have any running container on it. > - The node's LogAggregationStatus is never updated to "SUCCEEDED" by the > NodeManager because it does not have any running container on it (Note that > the AM immediately released them after acquisition). The LogAggregationStatus > remains NOT_START until time out is reached. After that point the RM > aggregates the LogAggregationReports for all the nodes, and though all the > containers have SUCCEEDED state, one particular node has NOT_START, so the > final log aggregation will be TIME_OUT. > (I crawled the RM UI for the log aggregation statuses, and it was always > NOT_START for this particular node). > This situation is highly unlikely, but has an estimated ~0.8% of failure rate > based on a year's 1500 run on an unstressed cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9877) Intermittent TIME_OUT of LogAggregationReport
[ https://issues.apache.org/jira/browse/YARN-9877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969209#comment-16969209 ] Peter Bacsko edited comment on YARN-9877 at 11/7/19 12:14 PM: -- [~adam.antal] to understand the issue better, I have a question: _"suppose the AM is requesting more containers, but as soon as they're allocated - the AM realizes it doesn't need them"_. When does this "realization" occur? Under which circumstances? I can think of two: 1. speculative execution 2. reducer preemption, because we have to re-run failed mappers and there's no free resource in the cluster. Anything else? How can a sleep job trigger this problem? was (Author: pbacsko): [~adam.antal] to understand the issue better, I have a question: "suppose the AM is requesting more containers, but as soon as they're allocated - the AM realizes it doesn't need them". When does this "realization" occur? Under which circumstances? I can think of two: 1. speculative execution 2. reducer preemption, because we have to re-run failed mappers and there's no free resource in the cluster. Anything else? How can a sleep job trigger this problem? > Intermittent TIME_OUT of LogAggregationReport > - > > Key: YARN-9877 > URL: https://issues.apache.org/jira/browse/YARN-9877 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, resourcemanager, yarn >Affects Versions: 3.0.3, 3.3.0, 3.2.1, 3.1.3 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9877.001.patch > > > I noticed some intermittent TIME_OUT in some downstream log-aggregation based > tests. > Steps to reproduce: > - Let's run a MR job > {code} > hadoop jar hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar sleep > -Dmapreduce.job.queuename=root.default -m 10 -r 10 -mt 5000 -rt 5000 > {code} > - Suppose the AM is requesting more containers, but as soon as they're > allocated - the AM realizes it doesn't need them. 
The container's state > changes are: ALLOCATED -> ACQUIRED -> RELEASED. > Let's suppose these extra containers are allocated in a different node from > the other 21 (AM + 10 mapper + 10 reducer) containers' node. > - All the containers finish successfully and the app is finished successfully > as well. Log aggregation status for the whole app seemingly stucks in RUNNING > state. > - After a while the final log aggregation status for the app changes to > TIME_OUT. > Root cause: > - As unused containers are getting through the state transition in the RM's > internal representation, {{RMAppImpl$AppRunningOnNodeTransition}}'s > transition function is called. This calls the > {{RMAppLogAggregation$addReportIfNecessary}} which forcefully adds the > "NOT_START" LogAggregationStatus associated with this NodeId for the app, > even though it does not have any running container on it. > - The node's LogAggregationStatus is never updated to "SUCCEEDED" by the > NodeManager because it does not have any running container on it (Note that > the AM immediately released them after acquisition). The LogAggregationStatus > remains NOT_START until time out is reached. After that point the RM > aggregates the LogAggregationReports for all the nodes, and though all the > containers have SUCCEEDED state, one particular node has NOT_START, so the > final log aggregation will be TIME_OUT. > (I crawled the RM UI for the log aggregation statuses, and it was always > NOT_START for this particular node). > This situation is highly unlikely, but has an estimated ~0.8% of failure rate > based on a year's 1500 run on an unstressed cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9011) Race condition during decommissioning
[ https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969193#comment-16969193 ] Peter Bacsko commented on YARN-9011: [~adam.antal] thanks for the comments. _"[...] renaming HostsFileReader$refresh(String,String,boolean) to refreshInternal. Could you please do that to make that class more clear?"_ Done _"but I am assured that the internal structure will not get damaged by this."_ There's no danger there. Doing multiple lazy refresh isn't an issue. _"why did you move the following line inside ResourceTrackerService$nodeHeartbeat."_ Ah, that's a remnant from previous patches. Moved back. Also did the null-check improvement. > Race condition during decommissioning > - > > Key: YARN-9011 > URL: https://issues.apache.org/jira/browse/YARN-9011 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.1.1 >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9011-001.patch, YARN-9011-002.patch, > YARN-9011-003.patch, YARN-9011-004.patch, YARN-9011-005.patch, > YARN-9011-006.patch, YARN-9011-007.patch, YARN-9011-008.patch, > YARN-9011-009.patch > > > During internal testing, we found a nasty race condition which occurs during > decommissioning. > Node manager, incorrect behaviour: > {noformat} > 2018-06-18 21:00:17,634 WARN > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received > SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting > down. 
> 2018-06-18 21:00:17,634 WARN > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from > ResourceManager: Disallowed NodeManager nodeId: node-6.hostname.com:8041 > hostname:node-6.hostname.com > {noformat} > Node manager, expected behaviour: > {noformat} > 2018-06-18 21:07:37,377 WARN > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received > SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting > down. > 2018-06-18 21:07:37,377 WARN > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from > ResourceManager: DECOMMISSIONING node-6.hostname.com:8041 is ready to be > decommissioned > {noformat} > Note the two different messages from the RM ("Disallowed NodeManager" vs > "DECOMMISSIONING"). The problem is that {{ResourceTrackerService}} can see an > inconsistent state of nodes while they're being updated: > {noformat} > 2018-06-18 21:00:17,575 INFO > org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: hostsReader > include:{172.26.12.198,node-7.hostname.com,node-2.hostname.com,node-5.hostname.com,172.26.8.205,node-8.hostname.com,172.26.23.76,172.26.22.223,node-6.hostname.com,172.26.9.218,node-4.hostname.com,node-3.hostname.com,172.26.13.167,node-9.hostname.com,172.26.21.221,172.26.10.219} > exclude:{node-6.hostname.com} > 2018-06-18 21:00:17,575 INFO > org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: Gracefully > decommission node node-6.hostname.com:8041 with state RUNNING > 2018-06-18 21:00:17,575 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > Disallowed NodeManager nodeId: node-6.hostname.com:8041 node: > node-6.hostname.com > 2018-06-18 21:00:17,576 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Put Node > node-6.hostname.com:8041 in DECOMMISSIONING. 
> 2018-06-18 21:00:17,575 INFO > org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn > IP=172.26.22.115OPERATION=refreshNodes TARGET=AdminService > RESULT=SUCCESS > 2018-06-18 21:00:17,577 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Preserve > original total capability: > 2018-06-18 21:00:17,577 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: > node-6.hostname.com:8041 Node Transitioned from RUNNING to DECOMMISSIONING > {noformat} > When the decommissioning succeeds, there is no output logged from > {{ResourceTrackerService}}.
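The check-then-act race in the logs above can be reduced to a single-threaded sketch. The class and method names here are hypothetical, not the real {{ResourceTrackerService}} API: the exclude list is updated in one step and the node's state transitions in another, so a heartbeat landing between the two sees "excluded but still RUNNING" and takes the hard "Disallowed NodeManager" path instead of the graceful one.

```java
import java.util.HashSet;
import java.util.Set;

// Minimal model of the decommissioning race (hypothetical names).
public class DecommissionRace {
    enum NodeState { RUNNING, DECOMMISSIONING }

    private final Set<String> excluded = new HashSet<>();
    private NodeState state = NodeState.RUNNING;

    // Step 1 of refreshNodes: the exclude list is updated first.
    public void addToExcludeList(String node) { excluded.add(node); }

    // Step 2 of refreshNodes: the node state transitions later,
    // leaving a window where the two disagree.
    public void transitionToDecommissioning() { state = NodeState.DECOMMISSIONING; }

    // Heartbeat handler: a check-then-act over both pieces of state.
    public String heartbeat(String node) {
        if (excluded.contains(node) && state != NodeState.DECOMMISSIONING) {
            // Heartbeat arrived inside the window: hard shutdown (the bug).
            return "Disallowed NodeManager nodeId: " + node;
        }
        if (state == NodeState.DECOMMISSIONING) {
            return "DECOMMISSIONING " + node + " is ready to be decommissioned";
        }
        return "OK";
    }
}
```

In the real RM both updates happen under separate locks, so the interleaving modelled here by calling {{heartbeat}} between the two steps can occur with a concurrent heartbeat.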
[jira] [Updated] (YARN-9011) Race condition during decommissioning
[ https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9011: --- Attachment: YARN-9011-009.patch > Race condition during decommissioning > - > > Key: YARN-9011 > URL: https://issues.apache.org/jira/browse/YARN-9011 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.1.1 >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9011-001.patch, YARN-9011-002.patch, > YARN-9011-003.patch, YARN-9011-004.patch, YARN-9011-005.patch, > YARN-9011-006.patch, YARN-9011-007.patch, YARN-9011-008.patch, > YARN-9011-009.patch
[jira] [Created] (YARN-9958) Remove the invalid lock in ContainerExecutor
Wanqiang Ji created YARN-9958: - Summary: Remove the invalid lock in ContainerExecutor Key: YARN-9958 URL: https://issues.apache.org/jira/browse/YARN-9958 Project: Hadoop YARN Issue Type: Improvement Reporter: Wanqiang Ji Assignee: Wanqiang Ji ContainerExecutor has a ReadLock and a WriteLock. These are only used to guard get/put calls on a ConcurrentMap. Since ConcurrentMap already provides thread-safety and atomicity guarantees for these operations, the locks can be removed.
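The ConcurrentMap argument can be illustrated with a minimal sketch (a hypothetical class, not the actual ContainerExecutor code): single get/put calls on a ConcurrentHashMap are atomic on their own, so an external ReadLock/WriteLock adds only contention, while a compound check-then-act should still use an atomic map method such as putIfAbsent rather than a bare get+put pair.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration, not the real ContainerExecutor.
public class PidFiles {
    private final ConcurrentMap<String, String> pidFiles = new ConcurrentHashMap<>();

    // No external lock needed: ConcurrentHashMap.put is atomic on its own.
    public void recordPidFile(String containerId, String pidFilePath) {
        pidFiles.put(containerId, pidFilePath);
    }

    // No external lock needed: get returns a consistent mapping.
    public String getPidFile(String containerId) {
        return pidFiles.get(containerId);
    }

    // Caveat: a compound check-then-act must use an atomic map method.
    // Returns the previous value, or null if the key was absent.
    public String recordIfAbsent(String containerId, String pidFilePath) {
        return pidFiles.putIfAbsent(containerId, pidFilePath);
    }
}
```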
[jira] [Commented] (YARN-9930) Support max running app logic for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969028#comment-16969028 ] Peter Bacsko commented on YARN-9930: [~epayne] [~cane] I'm a bit lost here. This setting currently doesn't exist in CS, so how could this cause confusion? We can implement it the way we want - it could work the same way as it does in FS, but it could also be different. If we want FS-style behavior (and I vote for this) then just go for it. Am I missing something? > Support max running app logic for CapacityScheduler > --- > > Key: YARN-9930 > URL: https://issues.apache.org/jira/browse/YARN-9930 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, capacityscheduler >Affects Versions: 3.1.0, 3.1.1 >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > > In FairScheduler, there is a max running apps limit which leaves excess > applications pending. > But CapacityScheduler has no such feature: it only has a max applications limit, and > jobs beyond that are rejected directly on the client side. > In this jira I want to implement the same semantics for CapacityScheduler.
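For reference, the FS-style semantics under discussion can be sketched as follows. This is a toy model with hypothetical names, not scheduler code: submissions past the max-running limit are queued as pending rather than rejected, and a pending app is promoted whenever a running one finishes.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of FairScheduler-style maxRunningApps semantics
// (hypothetical names, not the actual scheduler implementation).
public class MaxRunningGate {
    private final int maxRunningApps;
    private int running = 0;
    private final Queue<String> pending = new ArrayDeque<>();

    public MaxRunningGate(int maxRunningApps) {
        this.maxRunningApps = maxRunningApps;
    }

    // Submission never rejects: the app either runs or waits as pending.
    public String submit(String appId) {
        if (running < maxRunningApps) {
            running++;
            return "RUNNING";
        }
        pending.add(appId);
        return "PENDING";
    }

    // When a running app finishes, promote the next pending one (if any).
    // Returns the promoted app id, or null if nothing was pending.
    public String finishOne() {
        running--;
        String next = pending.poll();
        if (next != null) {
            running++;
        }
        return next;
    }
}
```

By contrast, CS's existing max-applications limit would throw at submission time; the proposal here is to add the accept-but-hold behavior shown above.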