[ 
https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488263#comment-16488263
 ] 

Hsin-Liang Huang commented on YARN-8326:
----------------------------------------

Here is more detailed information from the NodeManager logs, comparing Hadoop 
3.0 and 2.6.  Both run on a 4-node cluster with 3 data nodes, on machines with 
the same power/CPU/memory, running the same type of job.  I picked only one 
node to compare the container life cycle.
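
A minimal sketch of how such a per-container lifecycle trace can be filtered 
out of a NodeManager log (the log file name here is an assumption; a plain 
grep works just as well):

{code:python}
import sys

def lifecycle(log_path, container_id):
    """Print every NodeManager log line that mentions the given container."""
    with open(log_path) as f:
        for line in f:
            if container_id in line:
                print(line.rstrip())

# Example (assumed file name):
#   python lifecycle.py yarn-nodemanager.log container_e04_1527109836290_0004_01_000002
if __name__ == "__main__":
    lifecycle(sys.argv[1], sys.argv[2])
{code}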

*1. On 3.0.*  When I requested 8 containers to run on the 3 data nodes, I 
picked the second node and examined its log:

This job used 2 containers on this node:

 

container *container_e04_1527109836290_0004_01_000002* on application 
application_1527109836290_0004 (from "container succeeded" to "Stopping 
container", i.e. from the blue line to the red line, took about *4 seconds*)

 

152231 2018-05-23 15:04:45,541 INFO  containermanager.ContainerManagerImpl 
(ContainerManagerImpl.java:startContainerInternal(1059)) - Start request for 
container_e04_1527109836290_0004_01_000002 by user hlhuang

152232 2018-05-23 15:04:45,657 INFO  containermanager.ContainerManagerImpl 
(ContainerManagerImpl.java:startContainerInternal(1127)) - Creating a new 
application reference for app application_1527109836290_0004

152233 2018-05-23 15:04:45,658 INFO  application.ApplicationImpl 
(ApplicationImpl.java:handle(632)) - Application application_1527109836290_0004 
transitioned from NEW to INITING

152234 2018-05-23 15:04:45,658 INFO  application.ApplicationImpl 
(ApplicationImpl.java:transition(446)) - Adding 
container_e04_1527109836290_0004_01_000002 to application 
application_1527109836290_0004

152235 2018-05-23 15:04:45,658 INFO  application.ApplicationImpl 
(ApplicationImpl.java:handle(632)) - Application application_1527109836290_0004 
transitioned from INITING to RUNNING

152236 2018-05-23 15:04:45,659 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(2108)) - Container 
container_e04_1527109836290_0004_01_000002 transitioned from NEW to SCHEDULED

152237 2018-05-23 15:04:45,659 INFO  containermanager.AuxServices 
(AuxServices.java:handle(220)) - Got event CONTAINER_INIT for appId 
application_1527109836290_0004

152238 2018-05-23 15:04:45,659 INFO  yarn.YarnShuffleService 
(YarnShuffleService.java:initializeContainer(289)) - Initializing container 
container_e04_1527109836290_0004_01_000002

152239 2018-05-23 15:04:45,660 INFO  scheduler.ContainerScheduler 
(ContainerScheduler.java:startContainer(503)) - Starting container 
[container_e04_1527109836290_0004_01_000002]

152246 2018-05-23 15:04:45,965 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(2108)) - Container 
container_e04_1527109836290_0004_01_000002 transitioned from SCHEDULED to 
RUNNING

152247 2018-05-23 15:04:45,965 INFO  monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:onStartMonitoringContainer(941)) - Starting 
resource-monitoring for container_e04_1527109836290_0004_01_000002

{color:#205081}152250 2018-05-23 15:04:46,002 INFO  launcher.ContainerLaunch 
(ContainerLaunch.java:handleContainerExitCode(512)) - Container 
container_e04_1527109836290_0004_01_000002 succeeded{color}

 

152251 2018-05-23 15:04:46,003 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(2108)) - Container 
container_e04_1527109836290_0004_01_000002 transitioned from RUNNING to 
EXITED_WITH_SUCCESS

152252 2018-05-23 15:04:46,003 INFO  launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainer(668)) - Cleaning up container 
container_e04_1527109836290_0004_01_000002

152254 2018-05-23 15:04:48,132 INFO  nodemanager.LinuxContainerExecutor 
(LinuxContainerExecutor.java:deleteAsUser(794)) - Deleting absolute path : 
/hadoop/yarn/local/usercache/hlhuang/appcache/application_1527109836290_0004/container_e04_1527109836290_0004_01_000002

152256 2018-05-23 15:04:48,133 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(2108)) - Container 
container_e04_1527109836290_0004_01_000002 transitioned from 
EXITED_WITH_SUCCESS to DONE

152258 2018-05-23 15:04:49,171 INFO  nodemanager.NodeStatusUpdaterImpl 
(NodeStatusUpdaterImpl.java:removeOrTrackCompletedContainersFromContext(682)) - 
Removed completed containers from NM context: 
[container_e04_1527109836290_0004_01_000002]

152260 2018-05-23 15:04:50,289 INFO  application.ApplicationImpl 
(ApplicationImpl.java:transition(489)) - Removing 
container_e04_1527109836290_0004_01_000002 from application 
application_1527109836290_0004

{color:#d04437}152261 2018-05-23 15:04:50,290 INFO  
monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:onStopMonitoringContainer(932)) - Stopping 
resource-monitoring for container_e04_1527109836290_0004_01_000002{color}

152262 2018-05-23 15:04:50,290 INFO  containermanager.AuxServices 
(AuxServices.java:handle(220)) - Got event CONTAINER_STOP for appId 
application_1527109836290_0004

152263 2018-05-23 15:04:50,290 INFO  yarn.YarnShuffleService 
(YarnShuffleService.java:stopContainer(295)) - Stopping container 
container_e04_1527109836290_0004_01_000002

 

 

container *container_e04_1527109836290_0004_01_000006* on application 
application_1527109836290_0004 (from "container succeeded" to "Stopping 
container", i.e. from the blue line to the red line, took about *4 seconds*)

 

152240 2018-05-23 15:04:45,876 INFO  containermanager.ContainerManagerImpl 
(ContainerManagerImpl.java:startContainerInternal(1059)) - Start request for 
container_e04_1527109836290_0004_01_000006 by user hlhuang

152241 2018-05-23 15:04:45,879 INFO  application.ApplicationImpl 
(ApplicationImpl.java:transition(446)) - Adding 
container_e04_1527109836290_0004_01_000006 to application 
application_1527109836290_0004

152242 2018-05-23 15:04:45,879 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(2108)) - Container 
container_e04_1527109836290_0004_01_000006 transitioned from NEW to SCHEDULED

152243 2018-05-23 15:04:45,882 INFO  containermanager.AuxServices 
(AuxServices.java:handle(220)) - Got event CONTAINER_INIT for appId 
application_1527109836290_0004

152244 2018-05-23 15:04:45,883 INFO  yarn.YarnShuffleService 
(YarnShuffleService.java:initializeContainer(289)) - Initializing container 
container_e04_1527109836290_0004_01_000006

152245 2018-05-23 15:04:45,883 INFO  scheduler.ContainerScheduler 
(ContainerScheduler.java:startContainer(503)) - Starting container 
[container_e04_1527109836290_0004_01_000006]

152248 2018-05-23 15:04:45,985 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(2108)) - Container 
container_e04_1527109836290_0004_01_000006 transitioned from SCHEDULED to 
RUNNING

152249 2018-05-23 15:04:45,985 INFO  monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:onStartMonitoringContainer(941)) - Starting 
resource-monitoring for container_e04_1527109836290_0004_01_000006

{color:#205081}152253 2018-05-23 15:04:46,013 INFO  launcher.ContainerLaunch 
(ContainerLaunch.java:handleContainerExitCode(512)) - Container 
container_e04_1527109836290_0004_01_000006 succeeded{color}

 

152255 2018-05-23 15:04:48,133 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(2108)) - Container 
container_e04_1527109836290_0004_01_000006 transitioned from RUNNING to 
EXITED_WITH_SUCCESS

152257 2018-05-23 15:04:48,133 INFO  launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainer(668)) - Cleaning up container 
container_e04_1527109836290_0004_01_000006

152259 2018-05-23 15:04:50,289 INFO  nodemanager.LinuxContainerExecutor 
(LinuxContainerExecutor.java:deleteAsUser(794)) - Deleting absolute path : 
/hadoop/yarn/local/usercache/hlhuang/appcache/application_1527109836290_0004/container_e04_1527109836290_0004_01_000006

152264 2018-05-23 15:04:50,290 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(2108)) - Container 
container_e04_1527109836290_0004_01_000006 transitioned from 
EXITED_WITH_SUCCESS to DONE

152265 2018-05-23 15:04:50,291 INFO  application.ApplicationImpl 
(ApplicationImpl.java:transition(489)) - Removing 
container_e04_1527109836290_0004_01_000006 from application 
application_1527109836290_0004

152266 2018-05-23 15:04:50,291 INFO  monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:onStopMonitoringContainer(932)) - Stopping 
resource-monitoring for container_e04_1527109836290_0004_01_000006

152267 2018-05-23 15:04:50,291 INFO  containermanager.AuxServices 
(AuxServices.java:handle(220)) - Got event CONTAINER_STOP for appId 
application_1527109836290_0004

{color:#d04437}152268 2018-05-23 15:04:50,291 INFO  yarn.YarnShuffleService 
(YarnShuffleService.java:stopContainer(295)) - Stopping container 
container_e04_1527109836290_0004_01_000006{color}

 

{color:#14892c}Below is something that probably needs a look too.  The 
previous line is 152269 and the very next line is 152270, yet there is a 
5-second gap between them!!!  That is, from when the containers were removed 
until the application transitioned from RUNNING to 
APPLICATION_RESOURCES_CLEANINGUP took 5 seconds!{color}

 

*152269 2018-05-23 15:04:51,301* INFO  nodemanager.NodeStatusUpdaterImpl 
(NodeStatusUpdaterImpl.java:removeOrTrackCompletedContainersFromContext(682)) - 
Removed completed containers from NM context: 
[container_e04_1527109836290_0004_01_000006]

*152270 2018-05-23 15:04:56,315* INFO  application.ApplicationImpl 
(ApplicationImpl.java:handle(632)) - Application application_1527109836290_0004 
transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP

152271 2018-05-23 15:04:56,316 INFO  nodemanager.LinuxContainerExecutor 
(LinuxContainerExecutor.java:deleteAsUser(794)) - Deleting absolute path : 
/hadoop/yarn/local/usercache/hlhuang/appcache/application_1527109836290_0004

152272 2018-05-23 15:04:56,316 INFO  containermanager.AuxServices 
(AuxServices.java:handle(220)) - Got event APPLICATION_STOP for appId 
application_1527109836290_0004

152273 2018-05-23 15:04:56,316 INFO  shuffle.ExternalShuffleBlockResolver 
(ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application 
application_1527109836290_0004 removed, cleanupLocalDirs = false

152274 2018-05-23 15:04:56,320 INFO  impl.TimelineV2ClientImpl 
(TimelineV2ClientImpl.java:stop(542)) - Stopping TimelineClient.

 

 

*2. On 2.6.*  When I requested 8 containers to run on the 3 data nodes, I 
picked the second node and examined its log:

This job used 3 containers on this node:

 

Application application_1526961340393_0012, container 
*container_e11_1526961340393_0012_01_000003* (from "container succeeded" to 
"Stopping container", i.e. from the blue line to the red line, took less than 
*1 second*)

 

991154 2018-05-23 15:41:26,904 INFO  containermanager.ContainerManagerImpl 
(ContainerManagerImpl.java:startContainerInternal(810)) - Start request for 
container_e11_1526961340393_0012_01_000003 by user hlhuang

991155 2018-05-23 15:41:26,908 INFO  containermanager.ContainerManagerImpl 
(ContainerManagerImpl.java:startContainerInternal(850)) - Creating a new 
application reference for app application_1526961340393_0012

991156 2018-05-23 15:41:26,908 INFO  application.ApplicationImpl 
(ApplicationImpl.java:handle(464)) - Application application_1526961340393_0012 
transitioned from NEW to INITING

991157 2018-05-23 15:41:26,909 INFO  application.ApplicationImpl 
(ApplicationImpl.java:transition(304)) - Adding 
container_e11_1526961340393_0012_01_000003 to application 
application_1526961340393_0012

 

991158 2018-05-23 15:41:26,914 INFO  application.ApplicationImpl 
(ApplicationImpl.java:handle(464)) - Application application_1526961340393_0012 
transitioned from INITING to RUNNING

991159 2018-05-23 15:41:26,915 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(1163)) - Container 
container_e11_1526961340393_0012_01_000003 transitioned from NEW to LOCALIZED

991160 2018-05-23 15:41:26,915 INFO  containermanager.AuxServices 
(AuxServices.java:handle(215)) - Got event CONTAINER_INIT for appId 
application_1526961340393_0012

991161 2018-05-23 15:41:26,915 INFO  yarn.YarnShuffleService 
(YarnShuffleService.java:initializeContainer(192)) - Initializing container 
container_e11_1526961340393_0012_01_000003

991162 2018-05-23 15:41:26,915 INFO  yarn.YarnShuffleService 
(YarnShuffleService.java:initializeContainer(289)) - Initializing container 
container_e11_1526961340393_0012_01_000003

991163 2018-05-23 15:41:26,968 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(1163)) - Container 
container_e11_1526961340393_0012_01_000003 transitioned from LOCALIZED to 
RUNNING

991164 2018-05-23 15:41:26,968 INFO  runtime.DelegatingLinuxContainerRuntime 
(DelegatingLinuxContainerRuntime.java:pickContainerRuntime(67)) - Using 
container runtime: DefaultLinuxContainerRuntime

991165 2018-05-23 15:41:26,968 INFO  monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:isEnabled(222)) - Neither virutal-memory nor 
physical-memory monitoring is needed. Not running the monitor-thread

{color:#205081}991166 2018-05-23 15:41:26,993 INFO  launcher.ContainerLaunch 
(ContainerLaunch.java:call(384)) - Container 
container_e11_1526961340393_0012_01_000003 succeeded{color}

 

991167 2018-05-23 15:41:26,994 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(1163)) - Container 
container_e11_1526961340393_0012_01_000003 transitioned from RUNNING to 
EXITED_WITH_SUCCESS

991168 2018-05-23 15:41:26,994 INFO  launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainer(541)) - Cleaning up container 
container_e11_1526961340393_0012_01_000003

991169 2018-05-23 15:41:26,995 INFO  runtime.DelegatingLinuxContainerRuntime 
(DelegatingLinuxContainerRuntime.java:pickContainerRuntime(67)) - Using 
container runtime: DefaultLinuxContainerRuntime

991170 2018-05-23 15:41:27,021 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(1163)) - Container 
container_e11_1526961340393_0012_01_000003 transitioned from 
EXITED_WITH_SUCCESS to DONE

991171 2018-05-23 15:41:27,021 INFO  application.ApplicationImpl 
(ApplicationImpl.java:transition(347)) - Removing 
container_e11_1526961340393_0012_01_000003 from application 
application_1526961340393_0012

991172 2018-05-23 15:41:27,021 INFO  monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:isEnabled(222)) - Neither virutal-memory nor 
physical-memory monitoring is needed. Not running the monitor-thread

991173 2018-05-23 15:41:27,035 INFO  containermanager.AuxServices 
(AuxServices.java:handle(215)) - Got event CONTAINER_STOP for appId 
application_1526961340393_0012

991174 2018-05-23 15:41:27,035 INFO  yarn.YarnShuffleService 
(YarnShuffleService.java:stopContainer(198)) - Stopping container 
container_e11_1526961340393_0012_01_000003

{color:#d04437}991175 2018-05-23 15:41:27,036 INFO  yarn.YarnShuffleService 
(YarnShuffleService.java:stopContainer(295)) - Stopping container 
container_e11_1526961340393_0012_01_000003{color}

 

Application application_1526961340393_0012, container 
*container_e11_1526961340393_0012_01_000005* (from "container succeeded" to 
"Stopping container", i.e. from the blue line to the red line, took less than 
*1 second*)

 

991176 2018-05-23 15:41:27,152 INFO  containermanager.ContainerManagerImpl 
(ContainerManagerImpl.java:startContainerInternal(810)) - Start request for 
container_e11_1526961340393_0012_01_000005 by user hlhuang

991177 2018-05-23 15:41:27,154 INFO  application.ApplicationImpl 
(ApplicationImpl.java:transition(304)) - Adding 
container_e11_1526961340393_0012_01_000005 to application 
application_1526961340393_0012

 

991178 2018-05-23 15:41:27,154 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(1163)) - Container 
container_e11_1526961340393_0012_01_000005 transitioned from NEW to LOCALIZED

991179 2018-05-23 15:41:27,155 INFO  containermanager.AuxServices 
(AuxServices.java:handle(215)) - Got event CONTAINER_INIT for appId 
application_1526961340393_0012

991180 2018-05-23 15:41:27,155 INFO  yarn.YarnShuffleService 
(YarnShuffleService.java:initializeContainer(192)) - Initializing container 
container_e11_1526961340393_0012_01_000005

991181 2018-05-23 15:41:27,155 INFO  yarn.YarnShuffleService 
(YarnShuffleService.java:initializeContainer(289)) - Initializing container 
container_e11_1526961340393_0012_01_000005

991182 2018-05-23 15:41:27,205 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(1163)) - Container 
container_e11_1526961340393_0012_01_000005 transitioned from LOCALIZED to 
RUNNING

991183 2018-05-23 15:41:27,205 INFO  monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:isEnabled(222)) - Neither virutal-memory nor 
physical-memory monitoring is needed. Not running the monitor-thread

991184 2018-05-23 15:41:27,205 INFO  runtime.DelegatingLinuxContainerRuntime 
(DelegatingLinuxContainerRuntime.java:pickContainerRuntime(67)) - Using 
container runtime: DefaultLinuxContainerRuntime

{color:#205081}991185 2018-05-23 15:41:27,227 INFO  launcher.ContainerLaunch 
(ContainerLaunch.java:call(384)) - Container 
container_e11_1526961340393_0012_01_000005 succeeded{color}

 

991186 2018-05-23 15:41:27,228 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(1163)) - Container 
container_e11_1526961340393_0012_01_000005 transitioned from RUNNING to 
EXITED_WITH_SUCCESS

991187 2018-05-23 15:41:27,228 INFO  launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainer(541)) - Cleaning up container 
container_e11_1526961340393_0012_01_000005

991188 2018-05-23 15:41:27,228 INFO  runtime.DelegatingLinuxContainerRuntime 
(DelegatingLinuxContainerRuntime.java:pickContainerRuntime(67)) - Using 
container runtime: DefaultLinuxContainerRuntime

991189 2018-05-23 15:41:27,258 INFO  runtime.DelegatingLinuxContainerRuntime 
(DelegatingLinuxContainerRuntime.java:pickContainerRuntime(67)) - Using 
container runtime: DefaultLinuxContainerRuntime

991190 2018-05-23 15:41:27,268 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(1163)) - Container 
container_e11_1526961340393_0012_01_000005 transitioned from 
EXITED_WITH_SUCCESS to DONE

991191 2018-05-23 15:41:27,268 INFO  application.ApplicationImpl 
(ApplicationImpl.java:transition(347)) - Removing 
container_e11_1526961340393_0012_01_000005 from application 
application_1526961340393_0012

991192 2018-05-23 15:41:27,268 INFO  monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:isEnabled(222)) - Neither virutal-memory nor 
physical-memory monitoring is needed. Not running the monitor-thread

991193 2018-05-23 15:41:27,268 INFO  containermanager.AuxServices 
(AuxServices.java:handle(215)) - Got event CONTAINER_STOP for appId 
application_1526961340393_0012

991194 2018-05-23 15:41:27,268 INFO  yarn.YarnShuffleService 
(YarnShuffleService.java:stopContainer(198)) - Stopping container 
container_e11_1526961340393_0012_01_000005

{color:#d04437}991195 2018-05-23 15:41:27,268 INFO  yarn.YarnShuffleService 
(YarnShuffleService.java:stopContainer(295)) - Stopping container 
container_e11_1526961340393_0012_01_000005{color}

 

Application application_1526961340393_0012, container 
*container_e11_1526961340393_0012_01_000008* (from "container succeeded" to 
"Stopping container", i.e. from the blue line to the red line, took less than 
*1 second*)

991196 2018-05-23 15:41:27,340 INFO  containermanager.ContainerManagerImpl 
(ContainerManagerImpl.java:startContainerInternal(810)) - Start request for 
container_e11_1526961340393_0012_01_000008 by user hlhuang

991197 2018-05-23 15:41:27,350 INFO  application.ApplicationImpl 
(ApplicationImpl.java:transition(304)) - Adding 
container_e11_1526961340393_0012_01_000008 to application 
application_1526961340393_0012

991198 2018-05-23 15:41:27,351 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(1163)) - Container 
container_e11_1526961340393_0012_01_000008 transitioned from NEW to LOCALIZED

991199 2018-05-23 15:41:27,351 INFO  containermanager.AuxServices 
(AuxServices.java:handle(215)) - Got event CONTAINER_INIT for appId 
application_1526961340393_0012

991200 2018-05-23 15:41:27,351 INFO  yarn.YarnShuffleService 
(YarnShuffleService.java:initializeContainer(192)) - Initializing container 
container_e11_1526961340393_0012_01_000008

991201 2018-05-23 15:41:27,352 INFO  yarn.YarnShuffleService 
(YarnShuffleService.java:initializeContainer(289)) - Initializing container 
container_e11_1526961340393_0012_01_000008

991202 2018-05-23 15:41:27,397 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(1163)) - Container 
container_e11_1526961340393_0012_01_000008 transitioned from LOCALIZED to 
RUNNING

991203 2018-05-23 15:41:27,397 INFO  monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:isEnabled(222)) - Neither virutal-memory nor 
physical-memory monitoring is needed. Not running the monitor-thread

991204 2018-05-23 15:41:27,397 INFO  runtime.DelegatingLinuxContainerRuntime 
(DelegatingLinuxContainerRuntime.java:pickContainerRuntime(67)) - Using 
container runtime: DefaultLinuxContainerRuntime

{color:#205081}991205 2018-05-23 15:41:27,440 INFO  launcher.ContainerLaunch 
(ContainerLaunch.java:call(384)) - Container 
container_e11_1526961340393_0012_01_000008 succeeded{color}

 

991206 2018-05-23 15:41:27,440 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(1163)) - Container 
container_e11_1526961340393_0012_01_000008 transitioned from RUNNING to 
EXITED_WITH_SUCCESS

991207 2018-05-23 15:41:27,440 INFO  launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainer(541)) - Cleaning up container 
container_e11_1526961340393_0012_01_000008

991208 2018-05-23 15:41:27,441 INFO  runtime.DelegatingLinuxContainerRuntime 
(DelegatingLinuxContainerRuntime.java:pickContainerRuntime(67)) - Using 
container runtime: DefaultLinuxContainerRuntime

991209 2018-05-23 15:41:27,471 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(1163)) - Container 
container_e11_1526961340393_0012_01_000008 transitioned from 
EXITED_WITH_SUCCESS to DONE

991210 2018-05-23 15:41:27,471 INFO  application.ApplicationImpl 
(ApplicationImpl.java:transition(347)) - Removing 
container_e11_1526961340393_0012_01_000008 from application 
application_1526961340393_0012

991211 2018-05-23 15:41:27,471 INFO  monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:isEnabled(222)) - Neither virutal-memory nor 
physical-memory monitoring is needed. Not running the monitor-thread

991212 2018-05-23 15:41:27,471 INFO  containermanager.AuxServices 
(AuxServices.java:handle(215)) - Got event CONTAINER_STOP for appId 
application_1526961340393_0012

991213 2018-05-23 15:41:27,471 INFO  yarn.YarnShuffleService 
(YarnShuffleService.java:stopContainer(198)) - Stopping container 
container_e11_1526961340393_0012_01_000008

{color:#d04437}991214 2018-05-23 15:41:27,471 INFO  yarn.YarnShuffleService 
(YarnShuffleService.java:stopContainer(295)) - Stopping container 
container_e11_1526961340393_0012_01_000008{color}


 

{color:#14892c}Now compare with 3.0 above: from "Removed completed containers" 
(line 991215) to "transitioned from RUNNING to 
APPLICATION_RESOURCES_CLEANINGUP" (line 991218), it took only 1 second on 
2.6.{color}


 

*991215 2018-05-23 15:41:27,472* INFO  nodemanager.NodeStatusUpdaterImpl 
(NodeStatusUpdaterImpl.java:removeOrTrackCompletedContainersFromContext(553)) - 
Removed completed containers from NM context: 
[container_e11_1526961340393_0012_01_000005, 
container_e11_1526961340393_0012_01_000003]

991216 2018-05-23 15:41:27,485 INFO  runtime.DelegatingLinuxContainerRuntime 
(DelegatingLinuxContainerRuntime.java:pickContainerRuntime(67)) - Using 
container runtime: DefaultLinuxContainerRuntime

991217 2018-05-23 15:41:27,699 INFO  runtime.DelegatingLinuxContainerRuntime 
(DelegatingLinuxContainerRuntime.java:pickContainerRuntime(67)) - Using 
container runtime: DefaultLinuxContainerRuntime

*991218 2018-05-23 15:41:28,473* INFO  application.ApplicationImpl 
(ApplicationImpl.java:handle(464)) - Application application_1526961340393_0012 
transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP

991219 2018-05-23 15:41:28,474 INFO  containermanager.AuxServices 
(AuxServices.java:handle(215)) - Got event APPLICATION_STOP for appId 
application_1526961340393_0012

991220 2018-05-23 15:41:28,474 INFO  yarn.YarnShuffleService 
(YarnShuffleService.java:stopApplication(179)) - Stopping application 
application_1526961340393_0012

991221 2018-05-23 15:41:28,475 INFO  shuffle.ExternalShuffleBlockResolver 
(ExternalShuffleBlockResolver.java:applicationRemoved(206)) - Application 
application_1526961340393_0012 removed, cleanupLocalDirs = false

991222 2018-05-23 15:41:28,475 INFO  shuffle.ExternalShuffleBlockResolver 
(ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application 
application_1526961340393_0012 removed, cleanupLocalDirs = false

991223 2018-05-23 15:41:28,475 INFO  application.ApplicationImpl 
(ApplicationImpl.java:handle(464)) - Application application_1526961340393_0012 
transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED

991224 2018-05-23 15:41:28,475 INFO  loghandler.NonAggregatingLogHandler 
(NonAggregatingLogHandler.java:handle(167)) - Scheduling Log Deletion for 
application: application_1526961340393_0012, with delay of 604800 seconds

  

The logs above were taken after changing only 
yarn.nodemanager.container-monitor.interval-ms to 300; I removed the property 
that had set monitoring to false.
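
In other words, the only monitoring-related override in yarn-site.xml for the 
runs above was (shown in the same format as the configuration quoted in the 
Environment below):

{code:xml}
<property>
  <name>yarn.nodemanager.container-monitor.interval-ms</name>
  <value>300</value>
</property>
{code}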

 

To summarize, I believe there are two places that need to be investigated (a 
rough measurement sketch follows this list):

1. For each container, from "container succeeded" to "Stopping container" 
takes about 4 seconds on 3.0 but only about 1 second on 2.6.

2. From "Removed completed containers" to the application transitioning from 
RUNNING to APPLICATION_RESOURCES_CLEANINGUP, 3.0 takes a whopping 5 seconds 
versus only 1 second on 2.6.
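
Below is a minimal sketch for measuring both gaps mechanically from a 
NodeManager log, so the comparison can be repeated on other nodes and runs. 
The file name and the message substrings are assumptions taken from the 
excerpts above; a real NM log covering many applications would also need 
filtering by application id.

{code:python}
import re
from datetime import datetime

TS = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")
FMT = "%Y-%m-%d %H:%M:%S,%f"

def stamps(path, *needles):
    """Timestamps of every log line containing all of the given substrings."""
    out = []
    with open(path) as f:
        for line in f:
            if all(n in line for n in needles):
                m = TS.search(line)
                if m:
                    out.append(datetime.strptime(m.group(0), FMT))
    return out

log = "yarn-nodemanager.log"  # assumed file name
app = "application_1527109836290_0004"
cid = "container_e04_1527109836290_0004_01_000002"

# Gap 1: per container, "... succeeded" -> "Stopping container ..."
gap1 = stamps(log, "Stopping container", cid)[0] - stamps(log, cid, "succeeded")[0]

# Gap 2: last "Removed completed containers" -> application transition
# from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
gap2 = (stamps(log, app, "RUNNING to APPLICATION_RESOURCES_CLEANINGUP")[0]
        - stamps(log, "Removed completed containers")[-1])

print("succeeded -> Stopping container:", gap1)
print("removed   -> RESOURCES_CLEANINGUP:", gap2)
{code}

On the 3.0 excerpts above this reports roughly 4 and 5 seconds; on the 2.6 
excerpts, about 1 second each.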

 

 

> Yarn 3.0 seems runs slower than Yarn 2.6
> ----------------------------------------
>
>                 Key: YARN-8326
>                 URL: https://issues.apache.org/jira/browse/YARN-8326
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.0.0
>         Environment: This is the yarn-site.xml for 3.0. 
>  
> <configuration>
> <property>
>  <name>hadoop.registry.dns.bind-port</name>
>  <value>5353</value>
>  </property>
> <property>
>  <name>hadoop.registry.dns.domain-name</name>
>  <value>hwx.site</value>
>  </property>
> <property>
>  <name>hadoop.registry.dns.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>hadoop.registry.dns.zone-mask</name>
>  <value>255.255.255.0</value>
>  </property>
> <property>
>  <name>hadoop.registry.dns.zone-subnet</name>
>  <value>172.17.0.0</value>
>  </property>
> <property>
>  <name>manage.include.files</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.acl.enable</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.admin.acl</name>
>  <value>yarn</value>
>  </property>
> <property>
>  <name>yarn.client.nodemanager-connect.max-wait-ms</name>
>  <value>60000</value>
>  </property>
> <property>
>  <name>yarn.client.nodemanager-connect.retry-interval-ms</name>
>  <value>10000</value>
>  </property>
> <property>
>  <name>yarn.http.policy</name>
>  <value>HTTP_ONLY</value>
>  </property>
> <property>
>  <name>yarn.log-aggregation-enable</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.log-aggregation.retain-seconds</name>
>  <value>2592000</value>
>  </property>
> <property>
>  <name>yarn.log.server.url</name>
>  
> <value>http://xxxxxx:19888/jobhistory/logs</value>
>  </property>
> <property>
>  <name>yarn.log.server.web-service.url</name>
>  
> <value>http://xxxxxx:8188/ws/v1/applicationhistory</value>
>  </property>
> <property>
>  <name>yarn.node-labels.enabled</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.node-labels.fs-store.retry-policy-spec</name>
>  <value>2000, 500</value>
>  </property>
> <property>
>  <name>yarn.node-labels.fs-store.root-dir</name>
>  <value>/system/yarn/node-labels</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.address</name>
>  <value>0.0.0.0:45454</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.admin-env</name>
>  <value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.aux-services</name>
>  <value>mapreduce_shuffle,spark2_shuffle,timeline_collector</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
>  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.aux-services.spark2_shuffle.class</name>
>  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.aux-services.spark2_shuffle.classpath</name>
>  <value>/usr/spark2/aux/*</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
>  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.aux-services.timeline_collector.class</name>
>  
> <value>org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.bind-host</name>
>  <value>0.0.0.0</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.container-executor.class</name>
>  
> <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.container-metrics.unregister-delay-ms</name>
>  <value>60000</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.container-monitor.interval-ms</name>
>  <value>3000</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.delete.debug-delay-sec</name>
>  <value>0</value>
>  </property>
> <property>
>  
> <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
>  <value>90</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
>  <value>1000</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
>  <value>0.25</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.health-checker.interval-ms</name>
>  <value>135000</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.health-checker.script.timeout-ms</name>
>  <value>60000</value>
>  </property>
> <property>
>  
> <name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.linux-container-executor.group</name>
>  <value>hadoop</value>
>  </property>
> <property>
>  
> <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.local-dirs</name>
>  <value>/hadoop/yarn/local</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.log-aggregation.compression-type</name>
>  <value>gz</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.log-aggregation.debug-enabled</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.log-aggregation.num-log-files-per-app</name>
>  <value>30</value>
>  </property>
> <property>
>  
> <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
>  <value>3600</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.log-dirs</name>
>  <value>/hadoop/yarn/log</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.log.retain-seconds</name>
>  <value>604800</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.pmem-check-enabled</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.recovery.dir</name>
>  <value>/var/log/hadoop-yarn/nodemanager/recovery-state</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.recovery.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.recovery.supervised</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.remote-app-log-dir</name>
>  <value>/app-logs</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
>  <value>logs</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.resource-plugins</name>
>  <value></value>
>  </property>
> <property>
>  <name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name>
>  <value>auto</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.resource-plugins.gpu.docker-plugin</name>
>  <value>nvidia-docker-v1</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.resource-plugins.gpu.docker-plugin.nvidia-docker-v1.endpoint</name>
>  <value>http://localhost:3476/v1.0/docker/cli</value>
>  </property>
> <property>
>  
> <name>yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables</name>
>  <value></value>
>  </property>
> <property>
>  <name>yarn.nodemanager.resource.cpu-vcores</name>
>  <value>6</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.resource.memory-mb</name>
>  <value>12288</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name>
>  <value>80</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>  <value>default,docker</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
>  <value>host,none,bridge</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.runtime.linux.docker.capabilities</name>
>  <value>
>  CHOWN,DAC_OVERRIDE,FSETID,FOWNER,MKNOD,NET_RAW,SETGID,SETUID,SETFCAP,
>  SETPCAP,NET_BIND_SERVICE,SYS_CHROOT,KILL,AUDIT_WRITE</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.runtime.linux.docker.default-container-network</name>
>  <value>host</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.runtime.linux.docker.privileged-containers.acl</name>
>  <value></value>
>  </property>
> <property>
>  
> <name>yarn.nodemanager.runtime.linux.docker.privileged-containers.allowed</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.vmem-check-enabled</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.vmem-pmem-ratio</name>
>  <value>2.1</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.webapp.cross-origin.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.address</name>
>  <value>xxxxxxx:8050</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.admin.address</name>
>  <value>xxxxxx:8141</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.am.max-attempts</name>
>  <value>2</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.bind-host</name>
>  <value>0.0.0.0</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.connect.max-wait.ms</name>
>  <value>900000</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.connect.retry-interval.ms</name>
>  <value>30000</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.fs.state-store.retry-policy-spec</name>
>  <value>2000, 500</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.fs.state-store.uri</name>
>  <value> </value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.ha.enabled</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.hostname</name>
>  <value>xxxxxxxx</value>
>  </property>
> <property>
>  
> <name>yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval</name>
>  <value>15000</value>
>  </property>
> <property>
>  
> <name>yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor</name>
>  <value>1</value>
>  </property>
> <property>
>  
> <name>yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round</name>
>  <value>0.25</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.nodes.exclude-path</name>
>  <value>/etc/hadoop/conf/yarn.exclude</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.recovery.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.resource-tracker.address</name>
>  <value>xxxxxxx:8025</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.scheduler.address</name>
>  <value>xxxxxxxx:8030</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.scheduler.class</name>
>  
> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.state-store.max-completed-applications</name>
>  <value>${yarn.resourcemanager.max-completed-applications}</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.store.class</name>
>  
> <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
>  </property>
> <property>
>  
> <name>yarn.resourcemanager.system-metrics-publisher.dispatcher.pool-size</name>
>  <value>10</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.webapp.address</name>
>  <value>xxxxxx:8088</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.webapp.cross-origin.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.webapp.https.address</name>
>  <value>wxxxxxx:8090</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms</name>
>  <value>10000</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.zk-acl</name>
>  <value>world:anyone:rwcda</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.zk-address</name>
>  <value>xxxxxx:2181,xxxxxx:2181,xxxxxx:2181</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.zk-num-retries</name>
>  <value>1000</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.zk-retry-interval-ms</name>
>  <value>1000</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.zk-state-store.parent-path</name>
>  <value>/rmstore</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.zk-timeout-ms</name>
>  <value>10000</value>
>  </property>
> <property>
>  <name>yarn.rm.system-metricspublisher.emit-container-events</name>
>  <value>true</value>
>  </property>
> <property>
>  
> <name>yarn.scheduler.capacity.ordering-policy.priority-utilization.underutilized-preemption.enabled</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.scheduler.maximum-allocation-mb</name>
>  <value>12288</value>
>  </property>
> <property>
>  <name>yarn.scheduler.maximum-allocation-vcores</name>
>  <value>6</value>
>  </property>
> <property>
>  <name>yarn.scheduler.minimum-allocation-mb</name>
>  <value>64</value>
>  </property>
> <property>
>  <name>yarn.scheduler.minimum-allocation-vcores</name>
>  <value>1</value>
>  </property>
> <property>
>  <name>yarn.service.framework.path</name>
>  <value>/yarn/service-dep.tar.gz</value>
>  </property>
> <property>
>  <name>yarn.system-metricspublisher.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.address</name>
>  <value>xxxxxx:10200</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.bind-host</name>
>  <value>0.0.0.0</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.client.max-retries</name>
>  <value>30</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.client.retry-interval-ms</name>
>  <value>1000</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.entity-group-fs-store.active-dir</name>
>  <value>/ats/active/</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.entity-group-fs-store.app-cache-size</name>
>  <value>10</value>
>  </property>
> <property>
>  
> <name>yarn.timeline-service.entity-group-fs-store.cleaner-interval-seconds</name>
>  <value>3600</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.entity-group-fs-store.done-dir</name>
>  <value>/ats/done/</value>
>  </property>
> <property>
>  
> <name>yarn.timeline-service.entity-group-fs-store.group-id-plugin-classes</name>
>  <value></value>
>  </property>
> <property>
>  
> <name>yarn.timeline-service.entity-group-fs-store.group-id-plugin-classpath</name>
>  <value></value>
>  </property>
> <property>
>  <name>yarn.timeline-service.entity-group-fs-store.retain-seconds</name>
>  <value>604800</value>
>  </property>
> <property>
>  
> <name>yarn.timeline-service.entity-group-fs-store.scan-interval-seconds</name>
>  <value>60</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.entity-group-fs-store.summary-store</name>
>  
> <value>org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.generic-application-history.store-class</name>
>  
> <value>org.apache.hadoop.yarn.server.applicationhistoryservice.NullApplicationHistoryStore</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.hbase-schema.prefix</name>
>  <value>prod.</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.hbase.configuration.file</name>
>  <value>file:///etc/yarn-hbase/conf/hbase-site.xml</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.hbase.coprocessor.jar.hdfs.location</name>
>  
> <value>file:///hadoop-yarn-client/timelineservice/hadoop-yarn-server-timelineservice-hbase-coprocessor.jar</value>
>  </property>
> <property>
>  
> <name>yarn.timeline-service.http-authentication.simple.anonymous.allowed</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.http-authentication.type</name>
>  <value>simple</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.http-cross-origin.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.leveldb-state-store.path</name>
>  <value>/hadoop/yarn/timeline</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.leveldb-timeline-store.path</name>
>  <value>/hadoop/yarn/timeline</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.leveldb-timeline-store.read-cache-size</name>
>  <value>104857600</value>
>  </property>
> <property>
>  
> <name>yarn.timeline-service.leveldb-timeline-store.start-time-read-cache-size</name>
>  <value>10000</value>
>  </property>
> <property>
>  
> <name>yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size</name>
>  <value>10000</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms</name>
>  <value>300000</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.reader.webapp.address</name>
>  <value>xxxxxx:8198</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.reader.webapp.https.address</name>
>  <value>xxxxxx:8199</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.recovery.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.state-store-class</name>
>  
> <value>org.apache.hadoop.yarn.server.timeline.recovery.LeveldbTimelineStateStore</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.store-class</name>
>  
> <value>org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.ttl-enable</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.ttl-ms</name>
>  <value>2678400000</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.version</name>
>  <value>2.0</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.versions</name>
>  <value>1.5f,2.0f</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.webapp.address</name>
>  <value>xxxxxx:8188</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.webapp.https.address</name>
>  <value>xxxxxx:8190</value>
>  </property>
> <property>
>  <name>yarn.webapp.api-service.enable</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.webapp.ui2.enable</name>
>  <value>true</value>
>  </property>
> </configuration>
>            Reporter: Hsin-Liang Huang
>            Priority: Major
>         Attachments: image-2018-05-18-15-20-33-839.png, 
> image-2018-05-18-15-22-30-948.png
>
>
> Hi, I am running test cases on Yarn 2.6 and Yarn 3.0 and found that 
> performance is about twice as slow on Yarn 3.0; it gets even slower as we 
> acquire more containers. I looked at the node manager logs on 2.6 vs 3.0. 
> Here is what I found.
> On 2.6, this is the life cycle of a specific container; from beginning to 
> end it takes about{color:#14892c} 8 seconds{color} (9:53:50 to 9:53:58).
> !image-2018-05-18-15-20-33-839.png!
> On 3.0, the life cycle of a specific container looks like this; it 
> takes{color:#d04437} 20 seconds{color} to finish the same job (9:51:44 to 
> 9:52:04).
> !image-2018-05-18-15-22-30-948.png!
> It seems that on 3.0 an extra 5 seconds is spent in 
> monitor.ContainersMonitorImpl (marked in {color:#d04437}red{color}), which 
> does not happen in 2.6. Also, after the job is done and the container is 
> exiting, 3.0 takes 5 seconds for that (9:51:59 to 9:52:04), while 2.6 takes 
> less than half that time (9:53:56 to 9:53:58).
> Since we run the same unit test cases and usually acquire more than 4 
> containers, all these extra seconds add up to a huge performance issue: the 
> unit tests run 7 hours on 2.6, while the same unit tests run 11 hours on 
> 3.0. I was told this delay might be caused by Hadoop's new monitoring 
> system, Timeline Service v2. Could someone take a look at this? Thanks for 
> any help on this!!


