[jira] [Updated] (YARN-6728) Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.

2017-08-06 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-6728:
--
Fix Version/s: (was: 2.7.4)
   (was: 2.9.0)

> Job will run slow when the performance of defaultFs degrades and the 
> log-aggregation is enable. 
> 
>
> Key: YARN-6728
> URL: https://issues.apache.org/jira/browse/YARN-6728
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Affects Versions: 2.7.1
> Environment: CentOS 7.1 hadoop-2.7.1
>Reporter: zhengchenyu
> Attachments: YARN-6728.patch.00_branch-2.7
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> In our cluster, I found that many map tasks stayed in the "NEW" state for
> several minutes. Here is the container log:
> {code}
> [2017-06-13T18:21:23.068+08:00] [INFO] 
> containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
> 304) [AsyncDispatcher event handler] : Adding 
> container_1495632926847_2459604_01_11 to application 
> application_1495632926847_2459604
> [2017-06-13T18:23:08.715+08:00] [INFO] 
> containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
> [AsyncDispatcher event handler] : Container 
> container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING
> {code}
> I then searched the log between 18:21:23.068 and 18:23:08.715 and found that
> some AsyncDispatcher events were handled slowly because they access the
> defaultFs. Our cluster has grown to 4k nodes, which has increased the load on
> the defaultFs. (Note: log aggregation is enabled.)
> When a container runs in the NodeManager, initApp() is invoked, which in turn
> invokes verifyAndCreateRemoteLogDir and creates the remote log directory
> (mkdir); these operations access the defaultFs. So the container gets stuck
> here, and the application runs slowly.
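
The stall described above is a head-of-line blocking effect. The following is a minimal sketch, assuming (as in Hadoop 2.7) that AsyncDispatcher delivers queued events to their handlers on a single thread. A single-threaded ExecutorService stands in for that thread, and Thread.sleep stands in for a slow verifyAndCreateRemoteLogDir/mkdir call against an overloaded defaultFs. The class and method names are hypothetical, and this is not the code in the attached patch. It measures how long a cheap event (standing in for the container's NEW to LOCALIZING transition) waits in the queue behind the slow one.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DispatcherStall {
    // Returns how long (ms) a cheap event waits in the queue when it is
    // enqueued right behind an event whose handler blocks for slowFsCallMillis.
    public static long measureQueueDelayMillis(long slowFsCallMillis) {
        // One handler thread, standing in for AsyncDispatcher's event thread (assumption).
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        try {
            // Hypothetical stand-in for initApp() blocking on a remote mkdir.
            Runnable slowInitApp = () -> {
                try {
                    Thread.sleep(slowFsCallMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            };
            dispatcher.submit(slowInitApp);
            long enqueuedAt = System.nanoTime();
            // Hypothetical stand-in for the NEW -> LOCALIZING transition event:
            // it only reports how long it sat in the queue before running.
            Callable<Long> containerTransition =
                () -> TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - enqueuedAt);
            return dispatcher.submit(containerTransition).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            dispatcher.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("behind a 0 ms call:   " + measureQueueDelayMillis(0) + " ms");
        System.out.println("behind a 500 ms call: " + measureQueueDelayMillis(500) + " ms");
    }
}
```

With a 500 ms simulated call the cheap event waits roughly 500 ms; with 0 ms it runs almost immediately. On a real NodeManager, every slow defaultFs call made on the dispatcher thread adds to the delay seen by all events queued behind it, whichever application they belong to.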



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6728) Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.

2017-06-23 Thread zhengchenyu (JIRA)

zhengchenyu updated YARN-6728:
--
Attachment: YARN-6728.patch.00_branch-2.7

> Job will run slow when the performance of defaultFs degrades and the 
> log-aggregation is enable. 
> 
>
> Key: YARN-6728
> URL: https://issues.apache.org/jira/browse/YARN-6728
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Affects Versions: 2.7.1
> Environment: CentOS 7.1 hadoop-2.7.1
>Reporter: zhengchenyu
> Fix For: 2.9.0, 2.7.4
>
> Attachments: YARN-6728.patch.00_branch-2.7
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>



[jira] [Updated] (YARN-6728) Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.

2017-06-22 Thread zhengchenyu (JIRA)

zhengchenyu updated YARN-6728:
--
Description: 
In our cluster, I found that many map tasks stayed in the "NEW" state for
several minutes. Here is the container log:
{code}
[2017-06-13T18:21:23.068+08:00] [INFO] 
containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
304) [AsyncDispatcher event handler] : Adding 
container_1495632926847_2459604_01_11 to application 
application_1495632926847_2459604
[2017-06-13T18:23:08.715+08:00] [INFO] 
containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
[AsyncDispatcher event handler] : Container 
container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING
{code}

I then searched the log between 18:21:23.068 and 18:23:08.715 and found that
some AsyncDispatcher events were handled slowly because they access the
defaultFs. Our cluster has grown to 4k nodes, which has increased the load on
the defaultFs. (Note: log aggregation is enabled.)
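
The two timestamps in the log above are 1 min 45.647 s apart, which is how long the container sat in NEW after being added to the application. A quick check with java.time (the class and method names are hypothetical):

```java
import java.time.Duration;
import java.time.OffsetDateTime;

public class LogGap {
    // Elapsed time between two ISO-8601 timestamps with UTC offsets,
    // in the format used by the NodeManager log lines quoted above.
    public static Duration gap(String start, String end) {
        return Duration.between(OffsetDateTime.parse(start), OffsetDateTime.parse(end));
    }

    public static void main(String[] args) {
        // "Adding container ..." vs. "transitioned from NEW to LOCALIZING"
        Duration d = gap("2017-06-13T18:21:23.068+08:00", "2017-06-13T18:23:08.715+08:00");
        System.out.println(d + " = " + d.toMillis() + " ms"); // PT1M45.647S = 105647 ms
    }
}
```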

When a container runs in the NodeManager, initApp() is invoked, which in turn
invokes verifyAndCreateRemoteLogDir and creates the remote log directory
(mkdir); these operations access the defaultFs. So the container gets stuck
here, and the application runs slowly.

  was:
In our cluster, I found that many map tasks stayed in the "NEW" state for
several minutes. Here is the container log:
{code}
[2017-06-13T18:21:23.068+08:00] [INFO] 
containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
304) [AsyncDispatcher event handler] : Adding 
container_1495632926847_2459604_01_11 to application 
application_1495632926847_2459604
[2017-06-13T18:23:08.715+08:00] [INFO] 
containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
[AsyncDispatcher event handler] : Container 
container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING
{code}

I then searched the log between 18:21:23.068 and 18:23:08.715 and found that
some AsyncDispatcher events were handled slowly because they access the
defaultFs. Our cluster has grown to 4k nodes, which has increased the load on
the defaultFs. (Note: log aggregation is enabled.)

When a container runs in the NodeManager, initApp() is invoked, which in turn
invokes verifyAndCreateRemoteLogDir and creates the remote log directory
(mkdir). So the container gets stuck here, and the application runs slowly.


> Job will run slow when the performance of defaultFs degrades and the 
> log-aggregation is enable. 
> 
>
> Key: YARN-6728
> URL: https://issues.apache.org/jira/browse/YARN-6728
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Affects Versions: 2.7.1
> Environment: CentOS 7.1 hadoop-2.7.1
>Reporter: zhengchenyu
> Fix For: 2.9.0, 2.7.4
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>



[jira] [Updated] (YARN-6728) Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.

2017-06-22 Thread zhengchenyu (JIRA)

zhengchenyu updated YARN-6728:
--
Description: 
In our cluster, I found that many map tasks stayed in the "NEW" state for
several minutes. Here is the container log:
{code}
[2017-06-13T18:21:23.068+08:00] [INFO] 
containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
304) [AsyncDispatcher event handler] : Adding 
container_1495632926847_2459604_01_11 to application 
application_1495632926847_2459604
[2017-06-13T18:23:08.715+08:00] [INFO] 
containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
[AsyncDispatcher event handler] : Container 
container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING
{code}

I then searched the log between 18:21:23.068 and 18:23:08.715 and found that
some AsyncDispatcher events were handled slowly because they access the
defaultFs. Our cluster has grown to 4k nodes, which has increased the load on
the defaultFs. (Note: log aggregation is enabled.)

When a container runs in the NodeManager, initApp() is invoked, which in turn
invokes verifyAndCreateRemoteLogDir and creates the remote log directory
(mkdir). So the container gets stuck here, and the application runs slowly.

  was:
In our cluster, I found that many map tasks stayed in the "NEW" state for
several minutes. Here is the container log:
{code}
[2017-06-13T18:21:23.068+08:00] [INFO] 
containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
304) [AsyncDispatcher event handler] : Adding 
container_1495632926847_2459604_01_11 to application 
application_1495632926847_2459604
[2017-06-13T18:23:08.715+08:00] [INFO] 
containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
[AsyncDispatcher event handler] : Container 
container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING
{code}

I then searched the log between 18:21:23.068 and 18:23:08.715 and found that
some AsyncDispatcher events were handled slowly because they access the
defaultFs. Our cluster has grown to 4k nodes, which has increased the load on
the defaultFs. (Note: log aggregation is enabled.)




[jira] [Updated] (YARN-6728) Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.

2017-06-22 Thread zhengchenyu (JIRA)

zhengchenyu updated YARN-6728:
--
Description: 
In our cluster, I found that many map tasks stayed in the "NEW" state for
several minutes. Here is the container log:
{code}
[2017-06-13T18:21:23.068+08:00] [INFO] 
containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
304) [AsyncDispatcher event handler] : Adding 
container_1495632926847_2459604_01_11 to application 
application_1495632926847_2459604
[2017-06-13T18:23:08.715+08:00] [INFO] 
containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
[AsyncDispatcher event handler] : Container 
container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING
{code}

I then searched the log between 18:21:23.068 and 18:23:08.715 and found that
some AsyncDispatcher events were handled slowly because they access the
defaultFs. Our cluster has grown to 4k nodes, which has increased the load on
the defaultFs. (Note: log aggregation is enabled.)



  was:
In our cluster, I found that many map tasks stayed in the "NEW" state for
several minutes. Here is the container log:
{code}
[2017-06-13T18:21:23.068+08:00] [INFO] 
containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
304) [AsyncDispatcher event handler] : Adding 
container_1495632926847_2459604_01_11 to application 
application_1495632926847_2459604
[2017-06-13T18:23:08.715+08:00] [INFO] 
containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
[AsyncDispatcher event handler] : Container 
container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING
{code}

I then searched the log between 18:21:23.068 and 18:23:08.715 and found that
some AsyncDispatcher events were handled slowly because they access the
defaultFs. Our cluster has grown to 4k nodes, which has increased the load on
the defaultFs. (Note: we )




[jira] [Updated] (YARN-6728) Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.

2017-06-22 Thread zhengchenyu (JIRA)

zhengchenyu updated YARN-6728:
--
Description: 
In our cluster, I found that many map tasks stayed in the "NEW" state for
several minutes. Here is the container log:
{code}
[2017-06-13T18:21:23.068+08:00] [INFO] 
containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
304) [AsyncDispatcher event handler] : Adding 
container_1495632926847_2459604_01_11 to application 
application_1495632926847_2459604
[2017-06-13T18:23:08.715+08:00] [INFO] 
containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
[AsyncDispatcher event handler] : Container 
container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING
{code}

I then searched the log between 18:21:23.068 and 18:23:08.715 and found that
some AsyncDispatcher events were handled slowly because they access the
defaultFs. Our cluster has grown to 4k nodes, which has increased the load on
the defaultFs. (Note: we )



  was:Job will run slow when the performance of defaultFs degrades and the 
log-aggregation is enable. 

