[jira] [Comment Edited] (YARN-10185) Container executor fields should be volatile

2020-03-09 Thread Denes Gerencser (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055383#comment-17055383
 ] 

Denes Gerencser edited comment on YARN-10185 at 3/9/20, 9:21 PM:
-

Thanks [~pbacsko] for looking into this! The whole call hierarchy is so 
diverging, hard to track...
By memory model only those threads are guaranteed to see the "flushed" values 
of one thread who also synchronize via the same monitor. So in case of 
{{ContainerExecutor.setConf()}} (which sets the value of 
{{exitCodeFileTimeout}})  eventually called by {{AbstractService.init()}}, all 
caller threads of {{ContainerExecutor.reacquireContainer()}} (it reads the 
value of {{exitCodeFileTimeout}}) should also use the same monitor, 
{{AbstractService.stateChangeLock}} in this case. Do we think that this 
condition is always met? (It was suspicious for me that 
{{ContainerExecutor.pidFiles}} in this class is a concurrent collection.)



was (Author: denes.gerencser):
Thanks [~pbacsko] for looking into this! The whole call hierarchy is so 
diverging, hard to track...
By memory model only those threads are guaranteed to see the "flushed" values 
of one thread who also synchronize via the same monitor. So in case of 
{{ContainerExecutor.setConf()}} (which sets the value of 
{{exitCodeFileTimeout}})  eventually called by {{AbstractService.init()}}, all 
caller threads of {{ContainerExecutor.reacquireContainer()}} (it reads the 
value of {{exitCodeFileTimeout}}) should also use the same monitor, 
{{AbstractService.stateChangeLock}} in this case. Do we think that this 
condition is always met? (It was suspicious for me that {{pidFiles}} in this 
class is a concurrent collection.)


> Container executor fields should be volatile
> 
>
> Key: YARN-10185
> URL: https://issues.apache.org/jira/browse/YARN-10185
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Denes Gerencser
>Priority: Major
>
> In YARN-7226 and YARN-10173 there two fields have been added to the 
> {{ContainerExecutor}} class. These fields are set through {{#setConf()}} only 
> once, but in a multithreaded environment the volatile keyword should be added 
> to ensure that the updated fields are used.
>  
> Related piece of code:
> {code:java}
> private String[] whitelistVars;
>   private int exitCodeFileTimeout =
>   YarnConfiguration.DEFAULT_NM_CONTAINER_EXECUTOR_EXIT_FILE_TIMEOUT;
> {code}
> This can be hardly unit tested, but the bug could cause the UT added by 
> YARN-10173 to fail in a very small percentage.
> Thanks for [~denes.gerencser] for finding this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10185) Container executor fields should be volatile

2020-03-09 Thread Denes Gerencser (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055383#comment-17055383
 ] 

Denes Gerencser edited comment on YARN-10185 at 3/9/20, 9:17 PM:
-

Thanks [~pbacsko] for looking into this! The whole call hierarchy is so 
diverging, hard to track...
By memory model only those threads are guaranteed to see the "flushed" values 
of one thread who also synchronize via the same monitor. So in case of 
{{ContainerExecutor.setConf()}} (which sets the value of 
{{exitCodeFileTimeout}})  eventually called by {{AbstractService.init()}}, all 
caller threads of {{ContainerExecutor.reacquireContainer()}} (it reads the 
value of {{exitCodeFileTimeout}}) should also use the same monitor, 
{{AbstractService .stateChangeLock}} in this case. Do we think that this 
condition is always met? (It was suspicious for me that {{pidFiles}} in this 
class is a concurrent collection.)



was (Author: denes.gerencser):
Thanks [~pbacsko] for looking into this! The whole call hierarchy is so 
diverging, hard to track...
By memory model only those threads are guaranteed to see the "flushed" values 
who also synchronize via the same monitor. So in case of 
{{ContainerExecutor.setConf()}} (which sets the value of 
{{exitCodeFileTimeout}})  eventually called by {{AbstractService.init()}}, all 
caller threads of {{ContainerExecutor.reacquireContainer()}} (it reads the 
value of {{exitCodeFileTimeout}}) should also use the same monitor, 
{{AbstractService .stateChangeLock}} in this case. Do we think that this 
condition is always met? (It was suspicious for me that {{pidFiles}} in this 
class is a concurrent collection.)


> Container executor fields should be volatile
> 
>
> Key: YARN-10185
> URL: https://issues.apache.org/jira/browse/YARN-10185
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Denes Gerencser
>Priority: Major
>
> In YARN-7226 and YARN-10173 there two fields have been added to the 
> {{ContainerExecutor}} class. These fields are set through {{#setConf()}} only 
> once, but in a multithreaded environment the volatile keyword should be added 
> to ensure that the updated fields are used.
>  
> Related piece of code:
> {code:java}
> private String[] whitelistVars;
>   private int exitCodeFileTimeout =
>   YarnConfiguration.DEFAULT_NM_CONTAINER_EXECUTOR_EXIT_FILE_TIMEOUT;
> {code}
> This can be hardly unit tested, but the bug could cause the UT added by 
> YARN-10173 to fail in a very small percentage.
> Thanks for [~denes.gerencser] for finding this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10185) Container executor fields should be volatile

2020-03-09 Thread Denes Gerencser (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055383#comment-17055383
 ] 

Denes Gerencser edited comment on YARN-10185 at 3/9/20, 9:17 PM:
-

Thanks [~pbacsko] for looking into this! The whole call hierarchy is so 
diverging, hard to track...
By memory model only those threads are guaranteed to see the "flushed" values 
of one thread who also synchronize via the same monitor. So in case of 
{{ContainerExecutor.setConf()}} (which sets the value of 
{{exitCodeFileTimeout}})  eventually called by {{AbstractService.init()}}, all 
caller threads of {{ContainerExecutor.reacquireContainer()}} (it reads the 
value of {{exitCodeFileTimeout}}) should also use the same monitor, 
{{AbstractService.stateChangeLock}} in this case. Do we think that this 
condition is always met? (It was suspicious for me that {{pidFiles}} in this 
class is a concurrent collection.)



was (Author: denes.gerencser):
Thanks [~pbacsko] for looking into this! The whole call hierarchy is so 
diverging, hard to track...
By memory model only those threads are guaranteed to see the "flushed" values 
of one thread who also synchronize via the same monitor. So in case of 
{{ContainerExecutor.setConf()}} (which sets the value of 
{{exitCodeFileTimeout}})  eventually called by {{AbstractService.init()}}, all 
caller threads of {{ContainerExecutor.reacquireContainer()}} (it reads the 
value of {{exitCodeFileTimeout}}) should also use the same monitor, 
{{AbstractService .stateChangeLock}} in this case. Do we think that this 
condition is always met? (It was suspicious for me that {{pidFiles}} in this 
class is a concurrent collection.)


> Container executor fields should be volatile
> 
>
> Key: YARN-10185
> URL: https://issues.apache.org/jira/browse/YARN-10185
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Denes Gerencser
>Priority: Major
>
> In YARN-7226 and YARN-10173 there two fields have been added to the 
> {{ContainerExecutor}} class. These fields are set through {{#setConf()}} only 
> once, but in a multithreaded environment the volatile keyword should be added 
> to ensure that the updated fields are used.
>  
> Related piece of code:
> {code:java}
> private String[] whitelistVars;
>   private int exitCodeFileTimeout =
>   YarnConfiguration.DEFAULT_NM_CONTAINER_EXECUTOR_EXIT_FILE_TIMEOUT;
> {code}
> This can be hardly unit tested, but the bug could cause the UT added by 
> YARN-10173 to fail in a very small percentage.
> Thanks for [~denes.gerencser] for finding this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10185) Container executor fields should be volatile

2020-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055089#comment-17055089
 ] 

Peter Bacsko edited comment on YARN-10185 at 3/9/20, 3:31 PM:
--

I don't think that {{volatile}} is needed at all. The initialization occurs 
from the following code path:

{noformat}
ContainerExecutor.setConf(Configuration)
ReflectionUtils.setConf(Object, Configuration)
ReflectionUtils.newInstance(Class, Configuration)
NodeManager.createContainerExecutor(Configuration)
NodeManager.serviceInit(Configuration)
AbstractService.init(Configuration)
...
{noformat}

The initialization in {{AbstractService.init()}} occurs inside a 
{{synchronized}} block, which happens to act as a memory fence as soon as you 
exit the block. That is, all pending writes and reads are flushed: 
https://www.infoq.com/articles/memory_barriers_jvm_concurrency/

Also, there's {{LinuxContainerExecutor}} which calls {{super.setConf()}}, 
essentially all the variables there should be handled too. But it's not 
necessary, given the JVM memory model.


was (Author: pbacsko):
I don't think that {{volatile}} is needed at all. The initialization occurs 
from the following code path:

{noformat}
ContainerExecutor.setConf(Configuration)
ReflectionUtils.setConf(Object, Configuration)
ReflectionUtils.newInstance(Class, Configuration)
NodeManager.createContainerExecutor(Configuration)
NodeManager.serviceInit(Configuration)
AbstractService.init(Configuration)
...
{noformat}

The initialization in {{AbstractService.init()}} occurs inside a 
{{synchronized}} block, which happens to act as a memory fence as soon as you 
exit the block. That is, all pending writes and reads are flushed: 
https://www.infoq.com/articles/memory_barriers_jvm_concurrency/

Also, there's {{LinuxContainerExecutor}} which calls {{super.setConf()}}, 
essentially all the variables there should be handled too. But it's not 
necessary, given the JVM memory model.

 

 

> Container executor fields should be volatile
> 
>
> Key: YARN-10185
> URL: https://issues.apache.org/jira/browse/YARN-10185
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Denes Gerencser
>Priority: Major
>
> In YARN-7226 and YARN-10173 there two fields have been added to the 
> {{ContainerExecutor}} class. These fields are set through {{#setConf()}} only 
> once, but in a multithreaded environment the volatile keyword should be added 
> to ensure that the updated fields are used.
>  
> Related piece of code:
> {code:java}
> private String[] whitelistVars;
>   private int exitCodeFileTimeout =
>   YarnConfiguration.DEFAULT_NM_CONTAINER_EXECUTOR_EXIT_FILE_TIMEOUT;
> {code}
> This can be hardly unit tested, but the bug could cause the UT added by 
> YARN-10173 to fail in a very small percentage.
> Thanks for [~denes.gerencser] for finding this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org