[
https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488263#comment-16488263
]
Hsin-Liang Huang commented on YARN-8326:
----------------------------------------
Here is more detailed information from the NodeManager logs, comparing Hadoop
3.0 and 2.6. Both run on a 4-node cluster with 3 data nodes, with the same
machine power/CPU/memory and the same type of job. I picked a single node to
compare the container lifecycle; the small filter sketch below is how one
container's lines can be pulled out of a log.
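A quick note on method: this is a minimal sketch, assuming a single plain-text
NodeManager log file, of how one container's lifecycle lines can be filtered
out. The script is my own helper, not anything shipped with Hadoop; the log
path and container ID are command-line arguments.
{code:python}
import sys

def container_lines(log_path, container_id):
    """Yield every NodeManager log line that mentions the given container ID."""
    with open(log_path, errors="replace") as log:
        for line in log:
            if container_id in line:
                yield line.rstrip()

if __name__ == "__main__":
    # e.g. python filter_container.py nodemanager.log container_e04_1527109836290_0004_01_000002
    for line in container_lines(sys.argv[1], sys.argv[2]):
        print(line)
{code}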
*1. On 3.0.* I requested 8 containers to run on the 3 data nodes and picked
the second node to examine the log. This job used 2 containers on this node.
Container *container_e04_1527109836290_0004_01_000002* of application
application_1527109836290_0004 (from "Container succeeded" to "Stopping
container", i.e. from the blue line to the red line, took about *4 seconds*):
152231 2018-05-23 15:04:45,541 INFO containermanager.ContainerManagerImpl
(ContainerManagerImpl.java:startContainerInternal(1059)) - Start request for
container_e04_1527109836290_0004_01_000002 by user hlhuang
152232 2018-05-23 15:04:45,657 INFO containermanager.ContainerManagerImpl
(ContainerManagerImpl.java:startContainerInternal(1127)) - Creating a new
application reference for app application_1527109836290_0004
152233 2018-05-23 15:04:45,658 INFO application.ApplicationImpl
(ApplicationImpl.java:handle(632)) - Application application_1527109836290_0004
transitioned from NEW to INITING
152234 2018-05-23 15:04:45,658 INFO application.ApplicationImpl
(ApplicationImpl.java:transition(446)) - Adding
container_e04_1527109836290_0004_01_000002 to application
application_1527109836290_0004
152235 2018-05-23 15:04:45,658 INFO application.ApplicationImpl
(ApplicationImpl.java:handle(632)) - Application application_1527109836290_0004
transitioned from INITING to RUNNING
152236 2018-05-23 15:04:45,659 INFO container.ContainerImpl
(ContainerImpl.java:handle(2108)) - Container
container_e04_1527109836290_0004_01_000002 transitioned from NEW to SCHEDULED
152237 2018-05-23 15:04:45,659 INFO containermanager.AuxServices
(AuxServices.java:handle(220)) - Got event CONTAINER_INIT for appId
application_1527109836290_0004
152238 2018-05-23 15:04:45,659 INFO yarn.YarnShuffleService
(YarnShuffleService.java:initializeContainer(289)) - Initializing container
container_e04_1527109836290_0004_01_000002
152239 2018-05-23 15:04:45,660 INFO scheduler.ContainerScheduler
(ContainerScheduler.java:startContainer(503)) - Starting container
[container_e04_1527109836290_0004_01_000002]
152246 2018-05-23 15:04:45,965 INFO container.ContainerImpl
(ContainerImpl.java:handle(2108)) - Container
container_e04_1527109836290_0004_01_000002 transitioned from SCHEDULED to
RUNNING
152247 2018-05-23 15:04:45,965 INFO monitor.ContainersMonitorImpl
(ContainersMonitorImpl.java:onStartMonitoringContainer(941)) - Starting
resource-monitoring for container_e04_1527109836290_0004_01_000002
{color:#205081}152250 2018-05-23 15:04:46,002 INFO launcher.ContainerLaunch
(ContainerLaunch.java:handleContainerExitCode(512)) - Container
container_e04_1527109836290_0004_01_000002 succeeded{color}
152251 2018-05-23 15:04:46,003 INFO container.ContainerImpl
(ContainerImpl.java:handle(2108)) - Container
container_e04_1527109836290_0004_01_000002 transitioned from RUNNING to
EXITED_WITH_SUCCESS
152252 2018-05-23 15:04:46,003 INFO launcher.ContainerLaunch
(ContainerLaunch.java:cleanupContainer(668)) - Cleaning up container
container_e04_1527109836290_0004_01_000002
152254 2018-05-23 15:04:48,132 INFO nodemanager.LinuxContainerExecutor
(LinuxContainerExecutor.java:deleteAsUser(794)) - Deleting absolute path :
/hadoop/yarn/local/usercache/hlhuang/appcache/application_1527109836290_0004/container_e04_1527109836290_0004_01_000002
152256 2018-05-23 15:04:48,133 INFO container.ContainerImpl
(ContainerImpl.java:handle(2108)) - Container
container_e04_1527109836290_0004_01_000002 transitioned from
EXITED_WITH_SUCCESS to DONE
152258 2018-05-23 15:04:49,171 INFO nodemanager.NodeStatusUpdaterImpl
(NodeStatusUpdaterImpl.java:removeOrTrackCompletedContainersFromContext(682)) -
Removed completed containers from NM context:
[container_e04_1527109836290_0004_01_000002]
152260 2018-05-23 15:04:50,289 INFO application.ApplicationImpl
(ApplicationImpl.java:transition(489)) - Removing
container_e04_1527109836290_0004_01_000002 from application
application_1527109836290_0004
{color:#d04437}152261 2018-05-23 15:04:50,290 INFO
monitor.ContainersMonitorImpl
(ContainersMonitorImpl.java:onStopMonitoringContainer(932)) - Stopping
resource-monitoring for container_e04_1527109836290_0004_01_000002{color}
152263 2018-05-23 15:04:50,290 INFO yarn.YarnShuffleService
(YarnShuffleService.java:stopContainer(295)) - Stopping container
container_e04_1527109836290_0004_01_000002
152262 2018-05-23 15:04:50,290 INFO containermanager.AuxServices
(AuxServices.java:handle(220)) - Got event CONTAINER_STOP for appId
application_1527109836290_0004
Container *container_e04_1527109836290_0004_01_000006* of application
application_1527109836290_0004 (from "Container succeeded" to "Stopping
container", blue line to red line, took about *4 seconds*):
152240 2018-05-23 15:04:45,876 INFO containermanager.ContainerManagerImpl
(ContainerManagerImpl.java:startContainerInternal(1059)) - Start request for
container_e04_1527109836290_0004_01_000006 by user hlhuang
152241 2018-05-23 15:04:45,879 INFO application.ApplicationImpl
(ApplicationImpl.java:transition(446)) - Adding
container_e04_1527109836290_0004_01_000006 to application
application_1527109836290_0004
152242 2018-05-23 15:04:45,879 INFO container.ContainerImpl
(ContainerImpl.java:handle(2108)) - Container
container_e04_1527109836290_0004_01_000006 transitioned from NEW to SCHEDULED
152243 2018-05-23 15:04:45,882 INFO containermanager.AuxServices
(AuxServices.java:handle(220)) - Got event CONTAINER_INIT for appId
application_1527109836290_0004
152244 2018-05-23 15:04:45,883 INFO yarn.YarnShuffleService
(YarnShuffleService.java:initializeContainer(289)) - Initializing container
container_e04_1527109836290_0004_01_000006
152245 2018-05-23 15:04:45,883 INFO scheduler.ContainerScheduler
(ContainerScheduler.java:startContainer(503)) - Starting container
[container_e04_1527109836290_0004_01_000006]
152248 2018-05-23 15:04:45,985 INFO container.ContainerImpl
(ContainerImpl.java:handle(2108)) - Container
container_e04_1527109836290_0004_01_000006 transitioned from SCHEDULED to
RUNNING
152249 2018-05-23 15:04:45,985 INFO monitor.ContainersMonitorImpl
(ContainersMonitorImpl.java:onStartMonitoringContainer(941)) - Starting
resource-monitoring for container_e04_1527109836290_0004_01_000006
{color:#205081}152253 2018-05-23 15:04:46,013 INFO launcher.ContainerLaunch
(ContainerLaunch.java:handleContainerExitCode(512)) - Container
container_e04_1527109836290_0004_01_000006 succeeded{color}
152255 2018-05-23 15:04:48,133 INFO container.ContainerImpl
(ContainerImpl.java:handle(2108)) - Container
container_e04_1527109836290_0004_01_000006 transitioned from RUNNING to
EXITED_WITH_SUCCESS
152257 2018-05-23 15:04:48,133 INFO launcher.ContainerLaunch
(ContainerLaunch.java:cleanupContainer(668)) - Cleaning up container
container_e04_1527109836290_0004_01_000006
152259 2018-05-23 15:04:50,289 INFO nodemanager.LinuxContainerExecutor
(LinuxContainerExecutor.java:deleteAsUser(794)) - Deleting absolute path :
/hadoop/yarn/local/usercache/hlhuang/appcache/application_1527109836290_0004/container_e04_1527109836290_0004_01_000006
152264 2018-05-23 15:04:50,290 INFO container.ContainerImpl
(ContainerImpl.java:handle(2108)) - Container
container_e04_1527109836290_0004_01_000006 transitioned from
EXITED_WITH_SUCCESS to DONE
152265 2018-05-23 15:04:50,291 INFO application.ApplicationImpl
(ApplicationImpl.java:transition(489)) - Removing
container_e04_1527109836290_0004_01_000006 from application
application_1527109836290_0004
152266 2018-05-23 15:04:50,291 INFO monitor.ContainersMonitorImpl
(ContainersMonitorImpl.java:onStopMonitoringContainer(932)) - Stopping
resource-monitoring for container_e04_1527109836290_0004_01_000006
152267 2018-05-23 15:04:50,291 INFO containermanager.AuxServices
(AuxServices.java:handle(220)) - Got event CONTAINER_STOP for appId
application_1527109836290_0004
{color:#d04437}152268 2018-05-23 15:04:50,291 INFO yarn.YarnShuffleService
(YarnShuffleService.java:stopContainer(295)) - Stopping container
container_e04_1527109836290_0004_01_000006{color}
{color:#14892c}Below is something that probably needs a look as well. The
previous line is 152269 and the next line is 152270, yet there is a 5-second
gap between them: from the time the completed containers were removed until
the application transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
took 5 seconds!{color}
*152269 2018-05-23 15:04:51,301* INFO nodemanager.NodeStatusUpdaterImpl
(NodeStatusUpdaterImpl.java:removeOrTrackCompletedContainersFromContext(682)) -
Removed completed containers from NM context:
[container_e04_1527109836290_0004_01_000006]
*152270 2018-05-23 15:04:56,315* INFO application.ApplicationImpl
(ApplicationImpl.java:handle(632)) - Application application_1527109836290_0004
transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
152271 2018-05-23 15:04:56,316 INFO nodemanager.LinuxContainerExecutor
(LinuxContainerExecutor.java:deleteAsUser(794)) - Deleting absolute path :
/hadoop/yarn/local/usercache/hlhuang/appcache/application_1527109836290_0004
152272 2018-05-23 15:04:56,316 INFO containermanager.AuxServices
(AuxServices.java:handle(220)) - Got event APPLICATION_STOP for appId
application_1527109836290_0004
152273 2018-05-23 15:04:56,316 INFO shuffle.ExternalShuffleBlockResolver
(ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application
application_1527109836290_0004 removed, cleanupLocalDirs = false
152274 2018-05-23 15:04:56,320 INFO impl.TimelineV2ClientImpl
(TimelineV2ClientImpl.java:stop(542)) - Stopping TimelineClient.
*2. On 2.6.* I requested 8 containers to run on the 3 data nodes and picked
the second node to examine the log. This job used 3 containers on this node.
Application application_1526961340393_0012, container
*container_e11_1526961340393_0012_01_000003* (from "Container succeeded" to
"Stopping container", blue line to red line, took less than *1 second*):
991154 2018-05-23 15:41:26,904 INFO containermanager.ContainerManagerImpl
(ContainerManagerImpl.java:startContainerInternal(810)) - Start request for
container_e11_1526961340393_0012_01_000003 by user hlhuang
991155 2018-05-23 15:41:26,908 INFO containermanager.ContainerManagerImpl
(ContainerManagerImpl.java:startContainerInternal(850)) - Creating a new
application reference for app application_1526961340393_0012
991156 2018-05-23 15:41:26,908 INFO application.ApplicationImpl
(ApplicationImpl.java:handle(464)) - Application application_1526961340393_0012
transitioned from NEW to INITING
991157 2018-05-23 15:41:26,909 INFO application.ApplicationImpl
(ApplicationImpl.java:transition(304)) - Adding
container_e11_1526961340393_0012_01_000003 to application
application_1526961340393_0012
991158 2018-05-23 15:41:26,914 INFO application.ApplicationImpl
(ApplicationImpl.java:handle(464)) - Application application_1526961340393_0012
transitioned from INITING to RUNNING
991159 2018-05-23 15:41:26,915 INFO container.ContainerImpl
(ContainerImpl.java:handle(1163)) - Container
container_e11_1526961340393_0012_01_000003 transitioned from NEW to LOCALIZED
991160 2018-05-23 15:41:26,915 INFO containermanager.AuxServices
(AuxServices.java:handle(215)) - Got event CONTAINER_INIT for appId
application_1526961340393_0012
991161 2018-05-23 15:41:26,915 INFO yarn.YarnShuffleService
(YarnShuffleService.java:initializeContainer(192)) - Initializing container
container_e11_1526961340393_0012_01_000003
991162 2018-05-23 15:41:26,915 INFO yarn.YarnShuffleService
(YarnShuffleService.java:initializeContainer(289)) - Initializing container
container_e11_1526961340393_0012_01_000003
991163 2018-05-23 15:41:26,968 INFO container.ContainerImpl
(ContainerImpl.java:handle(1163)) - Container
container_e11_1526961340393_0012_01_000003 transitioned from LOCALIZED to
RUNNING
991164 2018-05-23 15:41:26,968 INFO runtime.DelegatingLinuxContainerRuntime
(DelegatingLinuxContainerRuntime.java:pickContainerRuntime(67)) - Using
container runtime: DefaultLinuxContainerRuntime
991165 2018-05-23 15:41:26,968 INFO monitor.ContainersMonitorImpl
(ContainersMonitorImpl.java:isEnabled(222)) - Neither virutal-memory nor
physical-memory monitoring is needed. Not running the monitor-thread
{color:#205081}991166 2018-05-23 15:41:26,993 INFO launcher.ContainerLaunch
(ContainerLaunch.java:call(384)) - Container
container_e11_1526961340393_0012_01_000003 succeeded{color}
991167 2018-05-23 15:41:26,994 INFO container.ContainerImpl
(ContainerImpl.java:handle(1163)) - Container
container_e11_1526961340393_0012_01_000003 transitioned from RUNNING to
EXITED_WITH_SUCCESS
991168 2018-05-23 15:41:26,994 INFO launcher.ContainerLaunch
(ContainerLaunch.java:cleanupContainer(541)) - Cleaning up container
container_e11_1526961340393_0012_01_000003
991169 2018-05-23 15:41:26,995 INFO runtime.DelegatingLinuxContainerRuntime
(DelegatingLinuxContainerRuntime.java:pickContainerRuntime(67)) - Using
container runtime: DefaultLinuxContainerRuntime
991170 2018-05-23 15:41:27,021 INFO container.ContainerImpl
(ContainerImpl.java:handle(1163)) - Container
container_e11_1526961340393_0012_01_000003 transitioned from
EXITED_WITH_SUCCESS to DONE
991171 2018-05-23 15:41:27,021 INFO application.ApplicationImpl
(ApplicationImpl.java:transition(347)) - Removing
container_e11_1526961340393_0012_01_000003 from application
application_1526961340393_0012
991172 2018-05-23 15:41:27,021 INFO monitor.ContainersMonitorImpl
(ContainersMonitorImpl.java:isEnabled(222)) - Neither virutal-memory nor
physical-memory monitoring is needed. Not running the monitor-thread
991173 2018-05-23 15:41:27,035 INFO containermanager.AuxServices
(AuxServices.java:handle(215)) - Got event CONTAINER_STOP for appId
application_1526961340393_0012
991174 2018-05-23 15:41:27,035 INFO yarn.YarnShuffleService
(YarnShuffleService.java:stopContainer(198)) - Stopping container
container_e11_1526961340393_0012_01_000003
{color:#d04437}991175 2018-05-23 15:41:27,036 INFO yarn.YarnShuffleService
(YarnShuffleService.java:stopContainer(295)) - Stopping container
container_e11_1526961340393_0012_01_000003{color}
Application application_1526961340393_0012, container
*container_e11_1526961340393_0012_01_000005* (from "Container succeeded" to
"Stopping container", blue line to red line, took less than *1 second*):
991176 2018-05-23 15:41:27,152 INFO containermanager.ContainerManagerImpl
(ContainerManagerImpl.java:startContainerInternal(810)) - Start request for
container_e11_1526961340393_0012_01_000005 by user hlhuang
991177 2018-05-23 15:41:27,154 INFO application.ApplicationImpl
(ApplicationImpl.java:transition(304)) - Adding
container_e11_1526961340393_0012_01_000005 to application
application_1526961340393_0012
991178 2018-05-23 15:41:27,154 INFO container.ContainerImpl
(ContainerImpl.java:handle(1163)) - Container
container_e11_1526961340393_0012_01_000005 transitioned from NEW to LOCALIZED
991179 2018-05-23 15:41:27,155 INFO containermanager.AuxServices
(AuxServices.java:handle(215)) - Got event CONTAINER_INIT for appId
application_1526961340393_0012
991180 2018-05-23 15:41:27,155 INFO yarn.YarnShuffleService
(YarnShuffleService.java:initializeContainer(192)) - Initializing container
container_e11_1526961340393_0012_01_000005
991181 2018-05-23 15:41:27,155 INFO yarn.YarnShuffleService
(YarnShuffleService.java:initializeContainer(289)) - Initializing container
container_e11_1526961340393_0012_01_000005
991182 2018-05-23 15:41:27,205 INFO container.ContainerImpl
(ContainerImpl.java:handle(1163)) - Container
container_e11_1526961340393_0012_01_000005 transitioned from LOCALIZED to
RUNNING
991183 2018-05-23 15:41:27,205 INFO monitor.ContainersMonitorImpl
(ContainersMonitorImpl.java:isEnabled(222)) - Neither virutal-memory nor
physical-memory monitoring is needed. Not running the monitor-thread
991184 2018-05-23 15:41:27,205 INFO runtime.DelegatingLinuxContainerRuntime
(DelegatingLinuxContainerRuntime.java:pickContainerRuntime(67)) - Using
container runtime: DefaultLinuxContainerRuntime
{color:#205081}991185 2018-05-23 15:41:27,227 INFO launcher.ContainerLaunch
(ContainerLaunch.java:call(384)) - Container
container_e11_1526961340393_0012_01_000005 succeeded{color}
991186 2018-05-23 15:41:27,228 INFO container.ContainerImpl
(ContainerImpl.java:handle(1163)) - Container
container_e11_1526961340393_0012_01_000005 transitioned from RUNNING to
EXITED_WITH_SUCCESS
991187 2018-05-23 15:41:27,228 INFO launcher.ContainerLaunch
(ContainerLaunch.java:cleanupContainer(541)) - Cleaning up container
container_e11_1526961340393_0012_01_000005
991188 2018-05-23 15:41:27,228 INFO runtime.DelegatingLinuxContainerRuntime
(DelegatingLinuxContainerRuntime.java:pickContainerRuntime(67)) - Using
container runtime: DefaultLinuxContainerRuntime
991189 2018-05-23 15:41:27,258 INFO runtime.DelegatingLinuxContainerRuntime
(DelegatingLinuxContainerRuntime.java:pickContainerRuntime(67)) - Using
container runtime: DefaultLinuxContainerRuntime
991190 2018-05-23 15:41:27,268 INFO container.ContainerImpl
(ContainerImpl.java:handle(1163)) - Container
container_e11_1526961340393_0012_01_000005 transitioned from
EXITED_WITH_SUCCESS to DONE
991191 2018-05-23 15:41:27,268 INFO application.ApplicationImpl
(ApplicationImpl.java:transition(347)) - Removing
container_e11_1526961340393_0012_01_000005 from application
application_1526961340393_0012
991192 2018-05-23 15:41:27,268 INFO monitor.ContainersMonitorImpl
(ContainersMonitorImpl.java:isEnabled(222)) - Neither virutal-memory nor
physical-memory monitoring is needed. Not running the monitor-thread
991193 2018-05-23 15:41:27,268 INFO containermanager.AuxServices
(AuxServices.java:handle(215)) - Got event CONTAINER_STOP for appId
application_1526961340393_0012
991194 2018-05-23 15:41:27,268 INFO yarn.YarnShuffleService
(YarnShuffleService.java:stopContainer(198)) - Stopping container
container_e11_1526961340393_0012_01_000005
{color:#d04437}991195 2018-05-23 15:41:27,268 INFO yarn.YarnShuffleService
(YarnShuffleService.java:stopContainer(295)) - Stopping container
container_e11_1526961340393_0012_01_000005{color}
Application application_1526961340393_0012, container
*container_e11_1526961340393_0012_01_000008* (from "Container succeeded" to
"Stopping container", blue line to red line, took less than *1 second*):
991196 2018-05-23 15:41:27,340 INFO containermanager.ContainerManagerImpl
(ContainerManagerImpl.java:startContainerInternal(810)) - Start request for
container_e11_1526961340393_0012_01_000008 by user hlhuang
991197 2018-05-23 15:41:27,350 INFO application.ApplicationImpl
(ApplicationImpl.java:transition(304)) - Adding
container_e11_1526961340393_0012_01_000008 to application
application_1526961340393_0012
991198 2018-05-23 15:41:27,351 INFO container.ContainerImpl
(ContainerImpl.java:handle(1163)) - Container
container_e11_1526961340393_0012_01_000008 transitioned from NEW to LOCALIZED
991199 2018-05-23 15:41:27,351 INFO containermanager.AuxServices
(AuxServices.java:handle(215)) - Got event CONTAINER_INIT for appId
application_1526961340393_0012
991200 2018-05-23 15:41:27,351 INFO yarn.YarnShuffleService
(YarnShuffleService.java:initializeContainer(192)) - Initializing container
container_e11_1526961340393_0012_01_000008
991201 2018-05-23 15:41:27,352 INFO yarn.YarnShuffleService
(YarnShuffleService.java:initializeContainer(289)) - Initializing container
container_e11_1526961340393_0012_01_000008
991202 2018-05-23 15:41:27,397 INFO container.ContainerImpl
(ContainerImpl.java:handle(1163)) - Container
container_e11_1526961340393_0012_01_000008 transitioned from LOCALIZED to
RUNNING
991203 2018-05-23 15:41:27,397 INFO monitor.ContainersMonitorImpl
(ContainersMonitorImpl.java:isEnabled(222)) - Neither virutal-memory nor
physical-memory monitoring is needed. Not running the monitor-thread
991204 2018-05-23 15:41:27,397 INFO runtime.DelegatingLinuxContainerRuntime
(DelegatingLinuxContainerRuntime.java:pickContainerRuntime(67)) - Using
container runtime: DefaultLinuxContainerRuntime
{color:#205081}991205 2018-05-23 15:41:27,440 INFO launcher.ContainerLaunch
(ContainerLaunch.java:call(384)) - Container
container_e11_1526961340393_0012_01_000008 succeeded{color}
991206 2018-05-23 15:41:27,440 INFO container.ContainerImpl
(ContainerImpl.java:handle(1163)) - Container
container_e11_1526961340393_0012_01_000008 transitioned from RUNNING to
EXITED_WITH_SUCCESS
991207 2018-05-23 15:41:27,440 INFO launcher.ContainerLaunch
(ContainerLaunch.java:cleanupContainer(541)) - Cleaning up container
container_e11_1526961340393_0012_01_000008
991208 2018-05-23 15:41:27,441 INFO runtime.DelegatingLinuxContainerRuntime
(DelegatingLinuxContainerRuntime.java:pickContainerRuntime(67)) - Using
container runtime: DefaultLinuxContainerRuntime
991209 2018-05-23 15:41:27,471 INFO container.ContainerImpl
(ContainerImpl.java:handle(1163)) - Container
container_e11_1526961340393_0012_01_000008 transitioned from
EXITED_WITH_SUCCESS to DONE
991210 2018-05-23 15:41:27,471 INFO application.ApplicationImpl
(ApplicationImpl.java:transition(347)) - Removing
container_e11_1526961340393_0012_01_000008 from application
application_1526961340393_0012
991211 2018-05-23 15:41:27,471 INFO monitor.ContainersMonitorImpl
(ContainersMonitorImpl.java:isEnabled(222)) - Neither virutal-memory nor
physical-memory monitoring is needed. Not running the monitor-thread
991212 2018-05-23 15:41:27,471 INFO containermanager.AuxServices
(AuxServices.java:handle(215)) - Got event CONTAINER_STOP for appId
application_1526961340393_0012
991213 2018-05-23 15:41:27,471 INFO yarn.YarnShuffleService
(YarnShuffleService.java:stopContainer(198)) - Stopping container
container_e11_1526961340393_0012_01_000008
{color:#d04437}991214 2018-05-23 15:41:27,471 INFO yarn.YarnShuffleService
(YarnShuffleService.java:stopContainer(295)) - Stopping container
container_e11_1526961340393_0012_01_000008{color}
{color:#14892c}Now compare with 3.0 above: from "Removed completed containers"
(line 991215) to "transitioned from RUNNING to
APPLICATION_RESOURCES_CLEANINGUP" (line 991218), 2.6 took only 1 second.{color}
*991215 2018-05-23 15:41:27,472* INFO nodemanager.NodeStatusUpdaterImpl
(NodeStatusUpdaterImpl.java:removeOrTrackCompletedContainersFromContext(553)) -
Removed completed containers from NM context:
[container_e11_1526961340393_0012_01_000005,
container_e11_1526961340393_0012_01_000003]
991216 2018-05-23 15:41:27,485 INFO runtime.DelegatingLinuxContainerRuntime
(DelegatingLinuxContainerRuntime.java:pickContainerRuntime(67)) - Using
container runtime: DefaultLinuxContainerRuntime
991217 2018-05-23 15:41:27,699 INFO runtime.DelegatingLinuxContainerRuntime
(DelegatingLinuxContainerRuntime.java:pickContainerRuntime(67)) - Using
container runtime: DefaultLinuxContainerRuntime
*991218 2018-05-23 15:41:28,473* INFO application.ApplicationImpl
(ApplicationImpl.java:handle(464)) - Application application_1526961340393_0012
transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
991219 2018-05-23 15:41:28,474 INFO containermanager.AuxServices
(AuxServices.java:handle(215)) - Got event APPLICATION_STOP for appId
application_1526961340393_0012
991220 2018-05-23 15:41:28,474 INFO yarn.YarnShuffleService
(YarnShuffleService.java:stopApplication(179)) - Stopping application
application_1526961340393_0012
991221 2018-05-23 15:41:28,475 INFO shuffle.ExternalShuffleBlockResolver
(ExternalShuffleBlockResolver.java:applicationRemoved(206)) - Application
application_1526961340393_0012 removed, cleanupLocalDirs = false
991222 2018-05-23 15:41:28,475 INFO shuffle.ExternalShuffleBlockResolver
(ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application
application_1526961340393_0012 removed, cleanupLocalDirs = false
991223 2018-05-23 15:41:28,475 INFO application.ApplicationImpl
(ApplicationImpl.java:handle(464)) - Application application_1526961340393_0012
transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
991224 2018-05-23 15:41:28,475 INFO loghandler.NonAggregatingLogHandler
(NonAggregatingLogHandler.java:handle(167)) - Scheduling Log Deletion for
application: application_1526961340393_0012, with delay of 604800 seconds
The logs above were captured after changing only
yarn.nodemanager.container-monitor.interval-ms to 300; I removed the property
that had set monitoring to false.
To summarize, I believe there are two places that need to be investigated:
1. On each container, from "Container succeeded" to "Stopping container" takes
4 seconds on 3.0 but only 1 second on 2.6.
2. From "Removed completed containers" to "Application transitioned from
RUNNING to APPLICATION_RESOURCES_CLEANINGUP", 3.0 takes a whopping 5 seconds
versus only 1 second on 2.6.
Both gaps can be measured straight from the NodeManager log, e.g. with the
sketch below.
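For reference, here is a rough sketch of how both gaps can be computed
mechanically from a NodeManager log. It is my own measurement script, not part
of Hadoop; it assumes the standard timestamp format shown above and matches
the exact INFO messages quoted in this comment.
{code:python}
import re
import sys
from datetime import datetime

# Leading NodeManager timestamp, e.g. "2018-05-23 15:04:46,002".
TS = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")
# Container IDs such as container_e04_1527109836290_0004_01_000002.
CONTAINER = re.compile(r"(container_e\d+_\d+_\d+_\d+_\d+)")

def ts(line):
    m = TS.search(line)
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f") if m else None

succeeded, stopped = {}, {}
removed_ts = cleanup_ts = None
with open(sys.argv[1], errors="replace") as log:
    for line in log:
        when = ts(line)
        if when is None:
            continue
        cid = CONTAINER.search(line)
        if cid and " succeeded" in line:
            succeeded[cid.group(1)] = when
        elif cid and "Stopping container" in line:
            # Keep the first "Stopping container" line per container.
            stopped.setdefault(cid.group(1), when)
        elif "Removed completed containers" in line:
            removed_ts = when  # last occurrence wins
        elif "RUNNING to APPLICATION_RESOURCES_CLEANINGUP" in line:
            cleanup_ts = when

# Gap 1: "Container ... succeeded" -> "Stopping container ...", per container.
for cid, start in sorted(succeeded.items()):
    if cid in stopped:
        gap = (stopped[cid] - start).total_seconds()
        print(f"{cid}: succeeded -> stop {gap:.1f}s")

# Gap 2: last "Removed completed containers" -> APPLICATION_RESOURCES_CLEANINGUP.
if removed_ts and cleanup_ts:
    print(f"removed -> cleanup: {(cleanup_ts - removed_ts).total_seconds():.1f}s")
{code}
Run as python nm_gaps.py nodemanager.log; on the 3.0 log above it should
report roughly the 4-second per-container gaps and the 5-second
removed-to-cleanup gap, versus well under a second and about 1 second on 2.6.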
> Yarn 3.0 seems to run slower than Yarn 2.6
> ----------------------------------------
>
> Key: YARN-8326
> URL: https://issues.apache.org/jira/browse/YARN-8326
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.0.0
> Environment: This is the yarn-site.xml for 3.0.
>
> <configuration>
> <property>
> <name>hadoop.registry.dns.bind-port</name>
> <value>5353</value>
> </property>
> <property>
> <name>hadoop.registry.dns.domain-name</name>
> <value>hwx.site</value>
> </property>
> <property>
> <name>hadoop.registry.dns.enabled</name>
> <value>true</value>
> </property>
> <property>
> <name>hadoop.registry.dns.zone-mask</name>
> <value>255.255.255.0</value>
> </property>
> <property>
> <name>hadoop.registry.dns.zone-subnet</name>
> <value>172.17.0.0</value>
> </property>
> <property>
> <name>manage.include.files</name>
> <value>false</value>
> </property>
> <property>
> <name>yarn.acl.enable</name>
> <value>false</value>
> </property>
> <property>
> <name>yarn.admin.acl</name>
> <value>yarn</value>
> </property>
> <property>
> <name>yarn.client.nodemanager-connect.max-wait-ms</name>
> <value>60000</value>
> </property>
> <property>
> <name>yarn.client.nodemanager-connect.retry-interval-ms</name>
> <value>10000</value>
> </property>
> <property>
> <name>yarn.http.policy</name>
> <value>HTTP_ONLY</value>
> </property>
> <property>
> <name>yarn.log-aggregation-enable</name>
> <value>false</value>
> </property>
> <property>
> <name>yarn.log-aggregation.retain-seconds</name>
> <value>2592000</value>
> </property>
> <property>
> <name>yarn.log.server.url</name>
>
> <value>http://xxxxxx:19888/jobhistory/logs</value>
> </property>
> <property>
> <name>yarn.log.server.web-service.url</name>
>
> <value>http://xxxxxx:8188/ws/v1/applicationhistory</value>
> </property>
> <property>
> <name>yarn.node-labels.enabled</name>
> <value>false</value>
> </property>
> <property>
> <name>yarn.node-labels.fs-store.retry-policy-spec</name>
> <value>2000, 500</value>
> </property>
> <property>
> <name>yarn.node-labels.fs-store.root-dir</name>
> <value>/system/yarn/node-labels</value>
> </property>
> <property>
> <name>yarn.nodemanager.address</name>
> <value>0.0.0.0:45454</value>
> </property>
> <property>
> <name>yarn.nodemanager.admin-env</name>
> <value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX</value>
> </property>
> <property>
> <name>yarn.nodemanager.aux-services</name>
> <value>mapreduce_shuffle,spark2_shuffle,timeline_collector</value>
> </property>
> <property>
> <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
> <value>org.apache.hadoop.mapred.ShuffleHandler</value>
> </property>
> <property>
> <name>yarn.nodemanager.aux-services.spark2_shuffle.class</name>
> <value>org.apache.spark.network.yarn.YarnShuffleService</value>
> </property>
> <property>
> <name>yarn.nodemanager.aux-services.spark2_shuffle.classpath</name>
> <value>/usr/spark2/aux/*</value>
> </property>
> <property>
> <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
> <value>org.apache.spark.network.yarn.YarnShuffleService</value>
> </property>
> <property>
> <name>yarn.nodemanager.aux-services.timeline_collector.class</name>
>
> <value>org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService</value>
> </property>
> <property>
> <name>yarn.nodemanager.bind-host</name>
> <value>0.0.0.0</value>
> </property>
> <property>
> <name>yarn.nodemanager.container-executor.class</name>
>
> <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
> </property>
> <property>
> <name>yarn.nodemanager.container-metrics.unregister-delay-ms</name>
> <value>60000</value>
> </property>
> <property>
> <name>yarn.nodemanager.container-monitor.interval-ms</name>
> <value>3000</value>
> </property>
> <property>
> <name>yarn.nodemanager.delete.debug-delay-sec</name>
> <value>0</value>
> </property>
> <property>
>
> <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
> <value>90</value>
> </property>
> <property>
> <name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
> <value>1000</value>
> </property>
> <property>
> <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
> <value>0.25</value>
> </property>
> <property>
> <name>yarn.nodemanager.health-checker.interval-ms</name>
> <value>135000</value>
> </property>
> <property>
> <name>yarn.nodemanager.health-checker.script.timeout-ms</name>
> <value>60000</value>
> </property>
> <property>
>
> <name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
> <value>false</value>
> </property>
> <property>
> <name>yarn.nodemanager.linux-container-executor.group</name>
> <value>hadoop</value>
> </property>
> <property>
>
> <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
> <value>false</value>
> </property>
> <property>
> <name>yarn.nodemanager.local-dirs</name>
> <value>/hadoop/yarn/local</value>
> </property>
> <property>
> <name>yarn.nodemanager.log-aggregation.compression-type</name>
> <value>gz</value>
> </property>
> <property>
> <name>yarn.nodemanager.log-aggregation.debug-enabled</name>
> <value>false</value>
> </property>
> <property>
> <name>yarn.nodemanager.log-aggregation.num-log-files-per-app</name>
> <value>30</value>
> </property>
> <property>
>
> <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
> <value>3600</value>
> </property>
> <property>
> <name>yarn.nodemanager.log-dirs</name>
> <value>/hadoop/yarn/log</value>
> </property>
> <property>
> <name>yarn.nodemanager.log.retain-seconds</name>
> <value>604800</value>
> </property>
> <property>
> <name>yarn.nodemanager.pmem-check-enabled</name>
> <value>false</value>
> </property>
> <property>
> <name>yarn.nodemanager.recovery.dir</name>
> <value>/var/log/hadoop-yarn/nodemanager/recovery-state</value>
> </property>
> <property>
> <name>yarn.nodemanager.recovery.enabled</name>
> <value>true</value>
> </property>
> <property>
> <name>yarn.nodemanager.recovery.supervised</name>
> <value>true</value>
> </property>
> <property>
> <name>yarn.nodemanager.remote-app-log-dir</name>
> <value>/app-logs</value>
> </property>
> <property>
> <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
> <value>logs</value>
> </property>
> <property>
> <name>yarn.nodemanager.resource-plugins</name>
> <value></value>
> </property>
> <property>
> <name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name>
> <value>auto</value>
> </property>
> <property>
> <name>yarn.nodemanager.resource-plugins.gpu.docker-plugin</name>
> <value>nvidia-docker-v1</value>
> </property>
> <property>
> <name>yarn.nodemanager.resource-plugins.gpu.docker-plugin.nvidia-docker-v1.endpoint</name>
> <value>http://localhost:3476/v1.0/docker/cli</value>
> </property>
> <property>
>
> <name>yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables</name>
> <value></value>
> </property>
> <property>
> <name>yarn.nodemanager.resource.cpu-vcores</name>
> <value>6</value>
> </property>
> <property>
> <name>yarn.nodemanager.resource.memory-mb</name>
> <value>12288</value>
> </property>
> <property>
> <name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name>
> <value>80</value>
> </property>
> <property>
> <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
> <value>default,docker</value>
> </property>
> <property>
> <name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
> <value>host,none,bridge</value>
> </property>
> <property>
> <name>yarn.nodemanager.runtime.linux.docker.capabilities</name>
> <value>
> CHOWN,DAC_OVERRIDE,FSETID,FOWNER,MKNOD,NET_RAW,SETGID,SETUID,SETFCAP,
> SETPCAP,NET_BIND_SERVICE,SYS_CHROOT,KILL,AUDIT_WRITE</value>
> </property>
> <property>
> <name>yarn.nodemanager.runtime.linux.docker.default-container-network</name>
> <value>host</value>
> </property>
> <property>
> <name>yarn.nodemanager.runtime.linux.docker.privileged-containers.acl</name>
> <value></value>
> </property>
> <property>
>
> <name>yarn.nodemanager.runtime.linux.docker.privileged-containers.allowed</name>
> <value>false</value>
> </property>
> <property>
> <name>yarn.nodemanager.vmem-check-enabled</name>
> <value>false</value>
> </property>
> <property>
> <name>yarn.nodemanager.vmem-pmem-ratio</name>
> <value>2.1</value>
> </property>
> <property>
> <name>yarn.nodemanager.webapp.cross-origin.enabled</name>
> <value>true</value>
> </property>
> <property>
> <name>yarn.resourcemanager.address</name>
> <value>xxxxxxx:8050</value>
> </property>
> <property>
> <name>yarn.resourcemanager.admin.address</name>
> <value>xxxxxx:8141</value>
> </property>
> <property>
> <name>yarn.resourcemanager.am.max-attempts</name>
> <value>2</value>
> </property>
> <property>
> <name>yarn.resourcemanager.bind-host</name>
> <value>0.0.0.0</value>
> </property>
> <property>
> <name>yarn.resourcemanager.connect.max-wait.ms</name>
> <value>900000</value>
> </property>
> <property>
> <name>yarn.resourcemanager.connect.retry-interval.ms</name>
> <value>30000</value>
> </property>
> <property>
> <name>yarn.resourcemanager.fs.state-store.retry-policy-spec</name>
> <value>2000, 500</value>
> </property>
> <property>
> <name>yarn.resourcemanager.fs.state-store.uri</name>
> <value> </value>
> </property>
> <property>
> <name>yarn.resourcemanager.ha.enabled</name>
> <value>false</value>
> </property>
> <property>
> <name>yarn.resourcemanager.hostname</name>
> <value>xxxxxxxx</value>
> </property>
> <property>
>
> <name>yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval</name>
> <value>15000</value>
> </property>
> <property>
>
> <name>yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor</name>
> <value>1</value>
> </property>
> <property>
>
> <name>yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round</name>
> <value>0.25</value>
> </property>
> <property>
> <name>yarn.resourcemanager.nodes.exclude-path</name>
> <value>/etc/hadoop/conf/yarn.exclude</value>
> </property>
> <property>
> <name>yarn.resourcemanager.recovery.enabled</name>
> <value>true</value>
> </property>
> <property>
> <name>yarn.resourcemanager.resource-tracker.address</name>
> <value>xxxxxxx:8025</value>
> </property>
> <property>
> <name>yarn.resourcemanager.scheduler.address</name>
> <value>xxxxxxxx:8030</value>
> </property>
> <property>
> <name>yarn.resourcemanager.scheduler.class</name>
>
> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
> </property>
> <property>
> <name>yarn.resourcemanager.scheduler.monitor.enable</name>
> <value>false</value>
> </property>
> <property>
> <name>yarn.resourcemanager.state-store.max-completed-applications</name>
> <value>${yarn.resourcemanager.max-completed-applications}</value>
> </property>
> <property>
> <name>yarn.resourcemanager.store.class</name>
>
> <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
> </property>
> <property>
>
> <name>yarn.resourcemanager.system-metrics-publisher.dispatcher.pool-size</name>
> <value>10</value>
> </property>
> <property>
> <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
> <value>true</value>
> </property>
> <property>
> <name>yarn.resourcemanager.webapp.address</name>
> <value>xxxxxx:8088</value>
> </property>
> <property>
> <name>yarn.resourcemanager.webapp.cross-origin.enabled</name>
> <value>true</value>
> </property>
> <property>
> <name>yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled</name>
> <value>false</value>
> </property>
> <property>
> <name>yarn.resourcemanager.webapp.https.address</name>
> <value>wxxxxxx:8090</value>
> </property>
> <property>
> <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
> <value>true</value>
> </property>
> <property>
> <name>yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms</name>
> <value>10000</value>
> </property>
> <property>
> <name>yarn.resourcemanager.zk-acl</name>
> <value>world:anyone:rwcda</value>
> </property>
> <property>
> <name>yarn.resourcemanager.zk-address</name>
> <value>xxxxxx:2181,xxxxxx:2181,xxxxxx:2181</value>
> </property>
> <property>
> <name>yarn.resourcemanager.zk-num-retries</name>
> <value>1000</value>
> </property>
> <property>
> <name>yarn.resourcemanager.zk-retry-interval-ms</name>
> <value>1000</value>
> </property>
> <property>
> <name>yarn.resourcemanager.zk-state-store.parent-path</name>
> <value>/rmstore</value>
> </property>
> <property>
> <name>yarn.resourcemanager.zk-timeout-ms</name>
> <value>10000</value>
> </property>
> <property>
> <name>yarn.rm.system-metricspublisher.emit-container-events</name>
> <value>true</value>
> </property>
> <property>
>
> <name>yarn.scheduler.capacity.ordering-policy.priority-utilization.underutilized-preemption.enabled</name>
> <value>false</value>
> </property>
> <property>
> <name>yarn.scheduler.maximum-allocation-mb</name>
> <value>12288</value>
> </property>
> <property>
> <name>yarn.scheduler.maximum-allocation-vcores</name>
> <value>6</value>
> </property>
> <property>
> <name>yarn.scheduler.minimum-allocation-mb</name>
> <value>64</value>
> </property>
> <property>
> <name>yarn.scheduler.minimum-allocation-vcores</name>
> <value>1</value>
> </property>
> <property>
> <name>yarn.service.framework.path</name>
> <value>/yarn/service-dep.tar.gz</value>
> </property>
> <property>
> <name>yarn.system-metricspublisher.enabled</name>
> <value>true</value>
> </property>
> <property>
> <name>yarn.timeline-service.address</name>
> <value>xxxxxx:10200</value>
> </property>
> <property>
> <name>yarn.timeline-service.bind-host</name>
> <value>0.0.0.0</value>
> </property>
> <property>
> <name>yarn.timeline-service.client.max-retries</name>
> <value>30</value>
> </property>
> <property>
> <name>yarn.timeline-service.client.retry-interval-ms</name>
> <value>1000</value>
> </property>
> <property>
> <name>yarn.timeline-service.enabled</name>
> <value>true</value>
> </property>
> <property>
> <name>yarn.timeline-service.entity-group-fs-store.active-dir</name>
> <value>/ats/active/</value>
> </property>
> <property>
> <name>yarn.timeline-service.entity-group-fs-store.app-cache-size</name>
> <value>10</value>
> </property>
> <property>
>
> <name>yarn.timeline-service.entity-group-fs-store.cleaner-interval-seconds</name>
> <value>3600</value>
> </property>
> <property>
> <name>yarn.timeline-service.entity-group-fs-store.done-dir</name>
> <value>/ats/done/</value>
> </property>
> <property>
>
> <name>yarn.timeline-service.entity-group-fs-store.group-id-plugin-classes</name>
> <value></value>
> </property>
> <property>
>
> <name>yarn.timeline-service.entity-group-fs-store.group-id-plugin-classpath</name>
> <value></value>
> </property>
> <property>
> <name>yarn.timeline-service.entity-group-fs-store.retain-seconds</name>
> <value>604800</value>
> </property>
> <property>
>
> <name>yarn.timeline-service.entity-group-fs-store.scan-interval-seconds</name>
> <value>60</value>
> </property>
> <property>
> <name>yarn.timeline-service.entity-group-fs-store.summary-store</name>
>
> <value>org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore</value>
> </property>
> <property>
> <name>yarn.timeline-service.generic-application-history.store-class</name>
>
> <value>org.apache.hadoop.yarn.server.applicationhistoryservice.NullApplicationHistoryStore</value>
> </property>
> <property>
> <name>yarn.timeline-service.hbase-schema.prefix</name>
> <value>prod.</value>
> </property>
> <property>
> <name>yarn.timeline-service.hbase.configuration.file</name>
> <value>file:///etc/yarn-hbase/conf/hbase-site.xml</value>
> </property>
> <property>
> <name>yarn.timeline-service.hbase.coprocessor.jar.hdfs.location</name>
>
> <value>file:///hadoop-yarn-client/timelineservice/hadoop-yarn-server-timelineservice-hbase-coprocessor.jar</value>
> </property>
> <property>
>
> <name>yarn.timeline-service.http-authentication.simple.anonymous.allowed</name>
> <value>true</value>
> </property>
> <property>
> <name>yarn.timeline-service.http-authentication.type</name>
> <value>simple</value>
> </property>
> <property>
> <name>yarn.timeline-service.http-cross-origin.enabled</name>
> <value>true</value>
> </property>
> <property>
> <name>yarn.timeline-service.leveldb-state-store.path</name>
> <value>/hadoop/yarn/timeline</value>
> </property>
> <property>
> <name>yarn.timeline-service.leveldb-timeline-store.path</name>
> <value>/hadoop/yarn/timeline</value>
> </property>
> <property>
> <name>yarn.timeline-service.leveldb-timeline-store.read-cache-size</name>
> <value>104857600</value>
> </property>
> <property>
>
> <name>yarn.timeline-service.leveldb-timeline-store.start-time-read-cache-size</name>
> <value>10000</value>
> </property>
> <property>
>
> <name>yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size</name>
> <value>10000</value>
> </property>
> <property>
> <name>yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms</name>
> <value>300000</value>
> </property>
> <property>
> <name>yarn.timeline-service.reader.webapp.address</name>
> <value>xxxxxx:8198</value>
> </property>
> <property>
> <name>yarn.timeline-service.reader.webapp.https.address</name>
> <value>xxxxxx:8199</value>
> </property>
> <property>
> <name>yarn.timeline-service.recovery.enabled</name>
> <value>true</value>
> </property>
> <property>
> <name>yarn.timeline-service.state-store-class</name>
>
> <value>org.apache.hadoop.yarn.server.timeline.recovery.LeveldbTimelineStateStore</value>
> </property>
> <property>
> <name>yarn.timeline-service.store-class</name>
>
> <value>org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore</value>
> </property>
> <property>
> <name>yarn.timeline-service.ttl-enable</name>
> <value>true</value>
> </property>
> <property>
> <name>yarn.timeline-service.ttl-ms</name>
> <value>2678400000</value>
> </property>
> <property>
> <name>yarn.timeline-service.version</name>
> <value>2.0</value>
> </property>
> <property>
> <name>yarn.timeline-service.versions</name>
> <value>1.5f,2.0f</value>
> </property>
> <property>
> <name>yarn.timeline-service.webapp.address</name>
> <value>xxxxxx:8188</value>
> </property>
> <property>
> <name>yarn.timeline-service.webapp.https.address</name>
> <value>xxxxxx:8190</value>
> </property>
> <property>
> <name>yarn.webapp.api-service.enable</name>
> <value>true</value>
> </property>
> <property>
> <name>yarn.webapp.ui2.enable</name>
> <value>true</value>
> </property>
> </configuration>
> Reporter: Hsin-Liang Huang
> Priority: Major
> Attachments: image-2018-05-18-15-20-33-839.png,
> image-2018-05-18-15-22-30-948.png
>
>
> Hi, I am running test cases on Yarn 2.6 and Yarn 3.0 and found that
> performance is roughly twice as slow on Yarn 3.0, and it gets even slower
> as we acquire more containers. I looked at the NodeManager logs on 2.6 vs
> 3.0. Here is what I found.
> On 2.6, the life cycle of a specific container, from beginning to end,
> takes about {color:#14892c}8 seconds{color} (9:53:50 to 9:53:58).
> !image-2018-05-18-15-20-33-839.png!
> On 3.0, the life cycle of a specific container looks like this; it takes
> {color:#d04437}20 seconds{color} (9:51:44 to 9:52:04) to finish the same
> job.
> !image-2018-05-18-15-22-30-948.png!
> It seems that 3.0 spends an extra 5 seconds in monitor.ContainersMonitorImpl
> (marked in {color:#d04437}red{color}), which doesn't happen in 2.6. Also,
> after the job is done and the container is exiting, 3.0 takes 5 seconds
> (9:51:59 to 9:52:04) while 2.6 takes less than half that time (9:53:56 to
> 9:53:58).
> Since we run the same unit test cases and usually acquire more than 4
> containers, all of these extra seconds add up to a huge performance issue:
> the unit tests run in 7 hours on 2.6 but 11 hours on 3.0. I was told this
> delay might be caused by Hadoop's new monitoring system, Timeline Service
> v2. Could someone take a look at this? Thanks for any help!