[jira] [Comment Edited] (YARN-10185) Container executor fields should be volatile
[ https://issues.apache.org/jira/browse/YARN-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055383#comment-17055383 ] Denes Gerencser edited comment on YARN-10185 at 3/9/20, 9:21 PM: - Thanks [~pbacsko] for looking into this! The whole call hierarchy is so diverging, hard to track... By memory model only those threads are guaranteed to see the "flushed" values of one thread who also synchronize via the same monitor. So in case of {{ContainerExecutor.setConf()}} (which sets the value of {{exitCodeFileTimeout}}) eventually called by {{AbstractService.init()}}, all caller threads of {{ContainerExecutor.reacquireContainer()}} (it reads the value of {{exitCodeFileTimeout}}) should also use the same monitor, {{AbstractService.stateChangeLock}} in this case. Do we think that this condition is always met? (It was suspicious for me that {{ContainerExecutor.pidFiles}} in this class is a concurrent collection.) was (Author: denes.gerencser): Thanks [~pbacsko] for looking into this! The whole call hierarchy is so diverging, hard to track... By memory model only those threads are guaranteed to see the "flushed" values of one thread who also synchronize via the same monitor. So in case of {{ContainerExecutor.setConf()}} (which sets the value of {{exitCodeFileTimeout}}) eventually called by {{AbstractService.init()}}, all caller threads of {{ContainerExecutor.reacquireContainer()}} (it reads the value of {{exitCodeFileTimeout}}) should also use the same monitor, {{AbstractService.stateChangeLock}} in this case. Do we think that this condition is always met? (It was suspicious for me that {{pidFiles}} in this class is a concurrent collection.) > Container executor fields should be volatile > > > Key: YARN-10185 > URL: https://issues.apache.org/jira/browse/YARN-10185 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Denes Gerencser >Priority: Major > > In YARN-7226 and YARN-10173 there two fields have been added to the > {{ContainerExecutor}} class. These fields are set through {{#setConf()}} only > once, but in a multithreaded environment the volatile keyword should be added > to ensure that the updated fields are used. > > Related piece of code: > {code:java} > private String[] whitelistVars; > private int exitCodeFileTimeout = > YarnConfiguration.DEFAULT_NM_CONTAINER_EXECUTOR_EXIT_FILE_TIMEOUT; > {code} > This can be hardly unit tested, but the bug could cause the UT added by > YARN-10173 to fail in a very small percentage. > Thanks for [~denes.gerencser] for finding this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10185) Container executor fields should be volatile
[ https://issues.apache.org/jira/browse/YARN-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055383#comment-17055383 ] Denes Gerencser edited comment on YARN-10185 at 3/9/20, 9:17 PM: - Thanks [~pbacsko] for looking into this! The whole call hierarchy is so diverging, hard to track... By memory model only those threads are guaranteed to see the "flushed" values of one thread who also synchronize via the same monitor. So in case of {{ContainerExecutor.setConf()}} (which sets the value of {{exitCodeFileTimeout}}) eventually called by {{AbstractService.init()}}, all caller threads of {{ContainerExecutor.reacquireContainer()}} (it reads the value of {{exitCodeFileTimeout}}) should also use the same monitor, {{AbstractService .stateChangeLock}} in this case. Do we think that this condition is always met? (It was suspicious for me that {{pidFiles}} in this class is a concurrent collection.) was (Author: denes.gerencser): Thanks [~pbacsko] for looking into this! The whole call hierarchy is so diverging, hard to track... By memory model only those threads are guaranteed to see the "flushed" values who also synchronize via the same monitor. So in case of {{ContainerExecutor.setConf()}} (which sets the value of {{exitCodeFileTimeout}}) eventually called by {{AbstractService.init()}}, all caller threads of {{ContainerExecutor.reacquireContainer()}} (it reads the value of {{exitCodeFileTimeout}}) should also use the same monitor, {{AbstractService .stateChangeLock}} in this case. Do we think that this condition is always met? (It was suspicious for me that {{pidFiles}} in this class is a concurrent collection.) > Container executor fields should be volatile > > > Key: YARN-10185 > URL: https://issues.apache.org/jira/browse/YARN-10185 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Denes Gerencser >Priority: Major > > In YARN-7226 and YARN-10173 there two fields have been added to the > {{ContainerExecutor}} class. These fields are set through {{#setConf()}} only > once, but in a multithreaded environment the volatile keyword should be added > to ensure that the updated fields are used. > > Related piece of code: > {code:java} > private String[] whitelistVars; > private int exitCodeFileTimeout = > YarnConfiguration.DEFAULT_NM_CONTAINER_EXECUTOR_EXIT_FILE_TIMEOUT; > {code} > This can be hardly unit tested, but the bug could cause the UT added by > YARN-10173 to fail in a very small percentage. > Thanks for [~denes.gerencser] for finding this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10185) Container executor fields should be volatile
[ https://issues.apache.org/jira/browse/YARN-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055383#comment-17055383 ] Denes Gerencser edited comment on YARN-10185 at 3/9/20, 9:17 PM: - Thanks [~pbacsko] for looking into this! The whole call hierarchy is so diverging, hard to track... By memory model only those threads are guaranteed to see the "flushed" values of one thread who also synchronize via the same monitor. So in case of {{ContainerExecutor.setConf()}} (which sets the value of {{exitCodeFileTimeout}}) eventually called by {{AbstractService.init()}}, all caller threads of {{ContainerExecutor.reacquireContainer()}} (it reads the value of {{exitCodeFileTimeout}}) should also use the same monitor, {{AbstractService.stateChangeLock}} in this case. Do we think that this condition is always met? (It was suspicious for me that {{pidFiles}} in this class is a concurrent collection.) was (Author: denes.gerencser): Thanks [~pbacsko] for looking into this! The whole call hierarchy is so diverging, hard to track... By memory model only those threads are guaranteed to see the "flushed" values of one thread who also synchronize via the same monitor. So in case of {{ContainerExecutor.setConf()}} (which sets the value of {{exitCodeFileTimeout}}) eventually called by {{AbstractService.init()}}, all caller threads of {{ContainerExecutor.reacquireContainer()}} (it reads the value of {{exitCodeFileTimeout}}) should also use the same monitor, {{AbstractService .stateChangeLock}} in this case. Do we think that this condition is always met? (It was suspicious for me that {{pidFiles}} in this class is a concurrent collection.) > Container executor fields should be volatile > > > Key: YARN-10185 > URL: https://issues.apache.org/jira/browse/YARN-10185 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Denes Gerencser >Priority: Major > > In YARN-7226 and YARN-10173 there two fields have been added to the > {{ContainerExecutor}} class. These fields are set through {{#setConf()}} only > once, but in a multithreaded environment the volatile keyword should be added > to ensure that the updated fields are used. > > Related piece of code: > {code:java} > private String[] whitelistVars; > private int exitCodeFileTimeout = > YarnConfiguration.DEFAULT_NM_CONTAINER_EXECUTOR_EXIT_FILE_TIMEOUT; > {code} > This can be hardly unit tested, but the bug could cause the UT added by > YARN-10173 to fail in a very small percentage. > Thanks for [~denes.gerencser] for finding this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10185) Container executor fields should be volatile
[ https://issues.apache.org/jira/browse/YARN-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055089#comment-17055089 ] Peter Bacsko edited comment on YARN-10185 at 3/9/20, 3:31 PM: -- I don't think that {{volatile}} is needed at all. The initialization occurs from the following code path: {noformat} ContainerExecutor.setConf(Configuration) ReflectionUtils.setConf(Object, Configuration) ReflectionUtils.newInstance(Class, Configuration) NodeManager.createContainerExecutor(Configuration) NodeManager.serviceInit(Configuration) AbstractService.init(Configuration) ... {noformat} The initialization in {{AbstractService.init()}} occurs inside a {{synchronized}} block, which happens to act as a memory fence as soon as you exit the block. That is, all pending writes and reads are flushed: https://www.infoq.com/articles/memory_barriers_jvm_concurrency/ Also, there's {{LinuxContainerExecutor}} which calls {{super.setConf()}}, essentially all the variables there should be handled too. But it's not necessary, given the JVM memory model. was (Author: pbacsko): I don't think that {{volatile}} is needed at all. The initialization occurs from the following code path: {noformat} ContainerExecutor.setConf(Configuration) ReflectionUtils.setConf(Object, Configuration) ReflectionUtils.newInstance(Class, Configuration) NodeManager.createContainerExecutor(Configuration) NodeManager.serviceInit(Configuration) AbstractService.init(Configuration) ... {noformat} The initialization in {{AbstractService.init()}} occurs inside a {{synchronized}} block, which happens to act as a memory fence as soon as you exit the block. That is, all pending writes and reads are flushed: https://www.infoq.com/articles/memory_barriers_jvm_concurrency/ Also, there's {{LinuxContainerExecutor}} which calls {{super.setConf()}}, essentially all the variables there should be handled too. But it's not necessary, given the JVM memory model. > Container executor fields should be volatile > > > Key: YARN-10185 > URL: https://issues.apache.org/jira/browse/YARN-10185 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Denes Gerencser >Priority: Major > > In YARN-7226 and YARN-10173 there two fields have been added to the > {{ContainerExecutor}} class. These fields are set through {{#setConf()}} only > once, but in a multithreaded environment the volatile keyword should be added > to ensure that the updated fields are used. > > Related piece of code: > {code:java} > private String[] whitelistVars; > private int exitCodeFileTimeout = > YarnConfiguration.DEFAULT_NM_CONTAINER_EXECUTOR_EXIT_FILE_TIMEOUT; > {code} > This can be hardly unit tested, but the bug could cause the UT added by > YARN-10173 to fail in a very small percentage. > Thanks for [~denes.gerencser] for finding this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org