[jira] [Commented] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6
[ https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488263#comment-16488263 ] Hsin-Liang Huang commented on YARN-8326:

Here is more detailed information from the node manager log comparing Hadoop 3.0 and 2.6. Both are 4-node clusters with 3 data nodes, the same machine power/CPU/memory, and the same type of job. I picked a single node to compare the container lifecycle.

*1. On 3.0.* When I requested 8 containers to run on the 3 data nodes, I examined the log on the second node. This job used 2 containers on this node, including container *container_e04_1527109836290_0004_01_02* of application application_1527109836290_0004. Going from "Container ... succeeded" to "Stopping container" (from the blue to the red line) took about *4 seconds*:

152231 2018-05-23 15:04:45,541 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainerInternal(1059)) - Start request for container_e04_1527109836290_0004_01_02 by user hlhuang
152232 2018-05-23 15:04:45,657 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainerInternal(1127)) - Creating a new application reference for app application_1527109836290_0004
152233 2018-05-23 15:04:45,658 INFO application.ApplicationImpl (ApplicationImpl.java:handle(632)) - Application application_1527109836290_0004 transitioned from NEW to INITING
152234 2018-05-23 15:04:45,658 INFO application.ApplicationImpl (ApplicationImpl.java:transition(446)) - Adding container_e04_1527109836290_0004_01_02 to application application_1527109836290_0004
152235 2018-05-23 15:04:45,658 INFO application.ApplicationImpl (ApplicationImpl.java:handle(632)) - Application application_1527109836290_0004 transitioned from INITING to RUNNING
152236 2018-05-23 15:04:45,659 INFO container.ContainerImpl (ContainerImpl.java:handle(2108)) - Container container_e04_1527109836290_0004_01_02 transitioned from NEW to SCHEDULED
152237 2018-05-23 15:04:45,659 INFO containermanager.AuxServices (AuxServices.java:handle(220)) - Got event CONTAINER_INIT for appId application_1527109836290_0004
152238 2018-05-23 15:04:45,659 INFO yarn.YarnShuffleService (YarnShuffleService.java:initializeContainer(289)) - Initializing container container_e04_1527109836290_0004_01_02
152239 2018-05-23 15:04:45,660 INFO scheduler.ContainerScheduler (ContainerScheduler.java:startContainer(503)) - Starting container [container_e04_1527109836290_0004_01_02]
152246 2018-05-23 15:04:45,965 INFO container.ContainerImpl (ContainerImpl.java:handle(2108)) - Container container_e04_1527109836290_0004_01_02 transitioned from SCHEDULED to RUNNING
152247 2018-05-23 15:04:45,965 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:onStartMonitoringContainer(941)) - Starting resource-monitoring for container_e04_1527109836290_0004_01_02
{color:#205081}152250 2018-05-23 15:04:46,002 INFO launcher.ContainerLaunch (ContainerLaunch.java:handleContainerExitCode(512)) - Container container_e04_1527109836290_0004_01_02 succeeded{color}
152251 2018-05-23 15:04:46,003 INFO container.ContainerImpl (ContainerImpl.java:handle(2108)) - Container container_e04_1527109836290_0004_01_02 transitioned from RUNNING to EXITED_WITH_SUCCESS
152252 2018-05-23 15:04:46,003 INFO launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(668)) - Cleaning up container container_e04_1527109836290_0004_01_02
152254 2018-05-23 15:04:48,132 INFO nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:deleteAsUser(794)) - Deleting absolute path : /hadoop/yarn/local/usercache/hlhuang/appcache/application_1527109836290_0004/container_e04_1527109836290_0004_01_02
152256 2018-05-23 15:04:48,133 INFO container.ContainerImpl (ContainerImpl.java:handle(2108)) - Container container_e04_1527109836290_0004_01_02 transitioned from EXITED_WITH_SUCCESS to DONE
152258 2018-05-23 15:04:49,171 INFO nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:removeOrTrackCompletedContainersFromContext(682)) - Removed completed containers from NM context: [container_e04_1527109836290_0004_01_02]
152260 2018-05-23 15:04:50,289 INFO application.ApplicationImpl (ApplicationImpl.java:transition(489)) - Removing container_e04_1527109836290_0004_01_02 from application application_1527109836290_0004
{color:#d04437}152261 2018-05-23 15:04:50,290 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:onStopMonitoringContainer(932)) - Stopping resource-monitoring for container_e04_1527109836290_0004_01_02{color}
152263 2018-05-23 15:04:50,290 INFO yarn.YarnShuffleService (YarnShuffleService.java:stopContainer(295)) - Stopping container container_e04_1527109836290_0004_01_02
152262 2018-05-23 15:04:50,290 INFO containermanager.AuxServices
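To compare the "succeeded" to "stopping" gap across many containers instead of reading timestamps by hand, a small log-parsing helper along the lines below can compute the delta per container from a NodeManager log. This is only an illustrative sketch, not part of YARN or of this thread; the log path, the message strings ("succeeded", "Stopping resource-monitoring for"), and the timestamp format are assumptions taken from the excerpt above and may differ between Hadoop versions.

```python
#!/usr/bin/env python3
# Hypothetical helper: for each container, measure the time between the
# "Container ... succeeded" event and the "Stopping resource-monitoring for ..."
# event in a NodeManager log. Message strings are copied from the excerpt above.
import re
import sys
from datetime import datetime

TS_FMT = "%Y-%m-%d %H:%M:%S,%f"
# Optional leading line number, then "2018-05-23 15:04:45,541 INFO <rest>"
LINE_RE = re.compile(r"^(?:\d+\s+)?(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+INFO\s+(.*)$")
CID_RE = re.compile(r"(container_e?\d*_\d+_\d+_\d+_\d+)")

def main(log_path):
    succeeded = {}  # container id -> timestamp of the "succeeded" event
    for raw in open(log_path, errors="replace"):
        m = LINE_RE.match(raw.strip())
        if not m:
            continue
        ts = datetime.strptime(m.group(1), TS_FMT)
        rest = m.group(2)
        cid_m = CID_RE.search(rest)
        if not cid_m:
            continue
        cid = cid_m.group(1)
        if rest.endswith("succeeded"):
            succeeded[cid] = ts
        elif "Stopping resource-monitoring for" in rest and cid in succeeded:
            delta = (ts - succeeded[cid]).total_seconds()
            print(f"{cid}\tsucceeded->stop-monitoring\t{delta:.3f}s")

if __name__ == "__main__":
    main(sys.argv[1])  # e.g. python3 nm_gap.py yarn-nodemanager.log
```

Run against the log excerpted above, it would report roughly 4.3 seconds for container_e04_1527109836290_0004_01_02, which is the gap being discussed.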
[jira] [Comment Edited] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6
[ https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488013#comment-16488013 ] Hsin-Liang Huang edited comment on YARN-8326 at 5/23/18 9:19 PM:
-

Hi [~eyang], I ran the sample job, {color:#14892c}time hadoop jar /usr/hdp/3.0.0.0-829/hadoop-yarn/hadoop-yarn-applications-unmanaged-am-launcher-3.0.0.3.0.0.0-829.jar Client -classpath simple-yarn-app-1.1.0.jar -cmd "java com.hortonworks.simpleyarnapp.ApplicationMaster /bin/date 8"{color}, with the changed settings, and it still took 15 seconds compared to 6 or 7 seconds in the 2.6 environment. So I am not sure these two monitoring settings play a significant role in the performance. The major issue could still be that exiting containers in the 3.0 environment is much slower than in the 2.6 environment. Can someone from the YARN team look into this? This is a general YARN application performance issue in 3.0.

was (Author: hlhu...@us.ibm.com): Hi [~eyang], I ran the sample job with the changed settings, and it still took 15 seconds compared to 6 or 7 seconds in the 2.6 environment. So I am not sure these two monitoring settings play a significant role in the performance. The major issue could still be that exiting containers in the 3.0 environment is much slower than in the 2.6 environment. Can someone from the YARN team look into this? This is a general YARN application performance issue in 3.0.

> Yarn 3.0 seems runs slower than Yarn 2.6
>
> Key: YARN-8326
> URL: https://issues.apache.org/jira/browse/YARN-8326
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.0.0
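The 15-second figure above comes from single `time` runs. A small wrapper like the sketch below can repeat the run a few times on each cluster and report the median wall-clock time, which makes the 2.6 vs. 3.0 comparison less noisy. This is an illustrative sketch only, not something used in this thread; the jar path and command string are copied verbatim from the comment above and would need to match the cluster being tested.

```python
#!/usr/bin/env python3
# Hypothetical benchmark wrapper: run the unmanaged-AM sample job N times and
# report wall-clock times, so the 2.6 and 3.0 clusters can be compared the same way.
import statistics
import subprocess
import time

CMD = [
    "hadoop", "jar",
    "/usr/hdp/3.0.0.0-829/hadoop-yarn/"
    "hadoop-yarn-applications-unmanaged-am-launcher-3.0.0.3.0.0.0-829.jar",
    "Client",
    "-classpath", "simple-yarn-app-1.1.0.jar",
    "-cmd", "java com.hortonworks.simpleyarnapp.ApplicationMaster /bin/date 8",
]

def run_once():
    start = time.monotonic()
    subprocess.run(CMD, check=True, capture_output=True)
    return time.monotonic() - start

if __name__ == "__main__":
    runs = [run_once() for _ in range(5)]
    print("runs:", ", ".join(f"{r:.1f}s" for r in runs))
    print(f"median: {statistics.median(runs):.1f}s")
```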
[jira] [Issue Comment Deleted] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6
[ https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hsin-Liang Huang updated YARN-8326:
---
Comment: was deleted

(was: Hi Eric, I tried the suggestion and changed the setting. The result of running {color:#14892c}time hadoop jar /usr/hdp/3.0.0.0-829/hadoop-yarn/hadoop-yarn-applications-unmanaged-am-launcher-3.0.0.3.0.0.0-829.jar Client -classpath simple-yarn-app-1.1.0.jar -cmd "java com.hortonworks.simpleyarnapp.ApplicationMaster /bin/date 8"{color} is 20s, 15s and 15s (I ran it 3 times). It didn't get better, if not worse. (It was 14 to 15 seconds before.))

> Yarn 3.0 seems runs slower than Yarn 2.6
>
> Key: YARN-8326
> URL: https://issues.apache.org/jira/browse/YARN-8326
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.0.0
[jira] [Comment Edited] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6
[ https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488013#comment-16488013 ] Hsin-Liang Huang edited comment on YARN-8326 at 5/23/18 9:15 PM:
-

Hi [~eyang], I ran the sample job with the changed settings, and it still took 15 seconds compared to 6 or 7 seconds in the 2.6 environment. So I am not sure these two monitoring settings play a significant role in the performance. The major issue could still be that exiting containers in the 3.0 environment is much slower than in the 2.6 environment. Can someone from the YARN team look into this? This is a general YARN application performance issue in 3.0.

was (Author: hlhu...@us.ibm.com): Hi [~eyang], here is another update. Even though the simple job I ran improved with the suggested setting changes, our unit testcases still took 14 hours compared to 7 hours in the 2.6 environment. I also ran another sample job with the changed settings, and it still took 15 seconds compared to 6 or 7 seconds in the 2.6 environment. So I think that even though the monitoring settings might affect the performance, they only play a small part; the major issue could still be that exiting containers in the 3.0 environment is much slower than in the 2.6 environment. Is anyone looking into this area? Thanks!

> Yarn 3.0 seems runs slower than Yarn 2.6
>
> Key: YARN-8326
> URL: https://issues.apache.org/jira/browse/YARN-8326
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.0.0
[jira] [Issue Comment Deleted] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6
[ https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hsin-Liang Huang updated YARN-8326:
---
Comment: was deleted

(was: [~eyang] This afternoon I tried the command and the performance was dramatically improved. It used to take 8 seconds; now it consistently takes 3 seconds. I then compared with the other 3.0 cluster where I did not make the property changes you suggested, and it still consistently took 8 seconds. I am going to run our testcases to see if the performance is also improved there.)

> Yarn 3.0 seems runs slower than Yarn 2.6
>
> Key: YARN-8326
> URL: https://issues.apache.org/jira/browse/YARN-8326
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.0.0
[jira] [Commented] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6
[ https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488013#comment-16488013 ] Hsin-Liang Huang commented on YARN-8326:

Hi [~eyang], here is another update. Even though the simple job I ran improved with the suggested setting changes, our unit testcases still took 14 hours compared to 7 hours in the 2.6 environment. I also ran another sample job with the changed settings, and it still took 15 seconds compared to 6 or 7 seconds in the 2.6 environment. So I think that even though the monitoring settings might affect the performance, they only play a small part; the major issue could still be that exiting containers in the 3.0 environment is much slower than in the 2.6 environment. Is anyone looking into this area? Thanks!

> Yarn 3.0 seems runs slower than Yarn 2.6
>
> Key: YARN-8326
> URL: https://issues.apache.org/jira/browse/YARN-8326
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.0.0
[jira] [Comment Edited] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6
[ https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484721#comment-16484721 ] Hsin-Liang Huang edited comment on YARN-8326 at 5/23/18 12:18 AM:
--

[~eyang] This afternoon I tried the command and the performance was dramatically improved. It used to take 8 seconds; now it consistently takes 3 seconds. I then compared with the other 3.0 cluster where I did not make the property changes you suggested, and it still consistently took 8 seconds. I am going to run our testcases to see if the performance is also improved there.

was (Author: hlhu...@us.ibm.com): [~eyang] This afternoon I tried the command and the performance was dramatically improved. It used to take 8 seconds; now it consistently takes 3 seconds. I then compared with the other HDP 3.0 cluster where I did not make the property changes you suggested, and it still consistently took 8 seconds. I am going to run our testcases to see if the performance is also improved there.

> Yarn 3.0 seems runs slower than Yarn 2.6
>
> Key: YARN-8326
> URL: https://issues.apache.org/jira/browse/YARN-8326
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.0.0
[jira] [Commented] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6
[ https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484721#comment-16484721 ] Hsin-Liang Huang commented on YARN-8326:

[~eyang] This afternoon I tried the command and the performance was dramatically improved. It used to take 8 seconds; now it consistently takes 3 seconds. I then compared with the other HDP 3.0 cluster where I did not make the property changes you suggested, and it still consistently took 8 seconds. I am going to run our testcases to see if the performance is also improved there.

> Yarn 3.0 seems runs slower than Yarn 2.6
>
> Key: YARN-8326
> URL: https://issues.apache.org/jira/browse/YARN-8326
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.0.0
[jira] [Commented] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6
[ https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483027#comment-16483027 ] Hsin-Liang Huang commented on YARN-8326:

Hi Eric, I tried the suggestion and changed the setting. The result of running {color:#14892c}time hadoop jar /usr/hdp/3.0.0.0-829/hadoop-yarn/hadoop-yarn-applications-unmanaged-am-launcher-3.0.0.3.0.0.0-829.jar Client -classpath simple-yarn-app-1.1.0.jar -cmd "java com.hortonworks.simpleyarnapp.ApplicationMaster /bin/date 8"{color} is 20s, 15s and 15s (I ran it 3 times). It didn't get better, if not worse. (It was 14 to 15 seconds before.)

> Yarn 3.0 seems runs slower than Yarn 2.6
>
> Key: YARN-8326
> URL: https://issues.apache.org/jira/browse/YARN-8326
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.0.0
[jira] [Created] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6
Hsin-Liang Huang created YARN-8326:
--
Summary: Yarn 3.0 seems runs slower than Yarn 2.6
Key: YARN-8326
URL: https://issues.apache.org/jira/browse/YARN-8326
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 3.0.0
Environment: This is the yarn-site.xml for 3.0.

hadoop.registry.dns.bind-port = 5353
hadoop.registry.dns.domain-name = hwx.site
hadoop.registry.dns.enabled = true
hadoop.registry.dns.zone-mask = 255.255.255.0
hadoop.registry.dns.zone-subnet = 172.17.0.0
hadoop.registry.zk.quorum = whiny1.fyre.ibm.com:2181,whiny2.fyre.ibm.com:2181,whiny3.fyre.ibm.com:2181
manage.include.files = false
yarn.acl.enable = false
yarn.admin.acl = yarn
yarn.application.classpath = $HADOOP_CONF_DIR,/usr/hdp/current/hadoop-client/*,/usr/hdp/current/hadoop-client/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/*
yarn.client.nodemanager-connect.max-wait-ms = 6
yarn.client.nodemanager-connect.retry-interval-ms = 1
yarn.http.policy = HTTP_ONLY
yarn.log-aggregation-enable = false
yarn.log-aggregation.retain-seconds = 2592000
yarn.log.server.url = http://whiny2.fyre.ibm.com:19888/jobhistory/logs
yarn.log.server.web-service.url = http://whiny2.fyre.ibm.com:8188/ws/v1/applicationhistory
yarn.node-labels.enabled = false
yarn.node-labels.fs-store.retry-policy-spec = 2000, 500
yarn.node-labels.fs-store.root-dir = /system/yarn/node-labels
yarn.nodemanager.address = 0.0.0.0:45454
yarn.nodemanager.admin-env = MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX
yarn.nodemanager.aux-services = mapreduce_shuffle,spark2_shuffle,timeline_collector
yarn.nodemanager.aux-services.mapreduce_shuffle.class = org.apache.hadoop.mapred.ShuffleHandler
yarn.nodemanager.aux-services.spark2_shuffle.class = org.apache.spark.network.yarn.YarnShuffleService
yarn.nodemanager.aux-services.spark2_shuffle.classpath = /usr/hdp/${hdp.version}/spark2/aux/*
yarn.nodemanager.aux-services.spark_shuffle.class = org.apache.spark.network.yarn.YarnShuffleService
yarn.nodemanager.aux-services.spark_shuffle.classpath = /usr/hdp/${hdp.version}/spark/aux/*
yarn.nodemanager.aux-services.timeline_collector.class = org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService
yarn.nodemanager.bind-host = 0.0.0.0
yarn.nodemanager.container-executor.class = org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
yarn.nodemanager.container-metrics.unregister-delay-ms = 6
yarn.nodemanager.container-monitor.interval-ms = 3000
yarn.nodemanager.delete.debug-delay-sec = 0
yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage = 90
yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb = 1000
yarn.nodemanager.disk-health-checker.min-healthy-disks = 0.25
yarn.nodemanager.health-checker.interval-ms = 135000
yarn.nodemanager.health-checker.script.timeout-ms = 6
yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage = false
yarn.nodemanager.linux-container-executor.group = hadoop
yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users = false
yarn.nodemanager.local-dirs = /hadoop/yarn/local
yarn.nodemanager.log-aggregation.compression-type = gz
yarn.nodemanager.log-aggregation.debug-enabled = false
yarn.nodemanager.log-aggregation.num-log-files-per-app = 30
yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds = 3600
yarn.nodemanager.log-dirs = /hadoop/yarn/log
yarn.nodemanager.log.retain-seconds = 604800
yarn.nodemanager.pmem-check-enabled = false
yarn.nodemanager.recovery.dir = /var/log/hadoop-yarn/nodemanager/recovery-state
yarn.nodemanager.recovery.enabled = true
yarn.nodemanager.recovery.supervised = true
yarn.nodemanager.remote-app-log-dir = /app-logs
yarn.nodemanager.remote-app-log-dir-suffix = logs
yarn.nodemanager.resource-plugins =
yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices = auto
yarn.nodemanager.resource-plugins.gpu.docker-plugin = nvidia-docker-v1
yarn.nodemanager.resource-plugins.gpu.docker-plugin.nvidia-docker-v1.endpoint = http://localhost:3476/v1.0/docker/cli
yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables =
yarn.nodemanager.resource.cpu-vcores = 6
yarn.nodemanager.resource.memory-mb = 12288
yarn.nodemanager.resource.percentage-physical-cpu-limit = 80
yarn.nodemanager.runtime.linux.allowed-runtimes = default,docker
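Since the comparison hinges on the two clusters being configured the same way, a quick way to cross-check the settings discussed in this thread is to read yarn-site.xml directly on each node. The snippet below is only an illustrative sketch; the default file path and the particular properties it selects (the container-monitor and container-metrics intervals mentioned above, plus recovery and the container executor) are assumptions, not an authoritative list of what matters.

```python
#!/usr/bin/env python3
# Hypothetical helper: print a few NodeManager-related properties from
# yarn-site.xml so they can be compared between the 2.6 and 3.0 clusters.
import sys
import xml.etree.ElementTree as ET

# Properties of interest in this thread; extend as needed.
WANTED = {
    "yarn.nodemanager.container-monitor.interval-ms",
    "yarn.nodemanager.container-metrics.unregister-delay-ms",
    "yarn.nodemanager.recovery.enabled",
    "yarn.nodemanager.container-executor.class",
}

def load_props(path):
    props = {}
    for prop in ET.parse(path).getroot().findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name:
            props[name] = value
    return props

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/hadoop/conf/yarn-site.xml"
    props = load_props(path)
    for name in sorted(WANTED):
        print(f"{name} = {props.get(name, '<not set>')}")
```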
[jira] [Commented] (YARN-8315) HDP 3.0.0 perfromance is slower than HDP 2.6.4
[ https://issues.apache.org/jira/browse/YARN-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479768#comment-16479768 ] Hsin-Liang Huang commented on YARN-8315:

[https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/bk_ambari-installation/content/hdp_30_repositories.html] is where I got the hdp.repo and ambari.repo to install HDP 3.0.0. It is still in beta.

> HDP 3.0.0 perfromance is slower than HDP 2.6.4
> --
>
> Key: YARN-8315
> URL: https://issues.apache.org/jira/browse/YARN-8315
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.0.0
> Environment: I have an HDP 2.6.4 cluster and an HDP 3.0.0 cluster, set up with the same settings (such as Java heap size and container size). Both are 4-node clusters with 3 data nodes. I took almost all the default settings on HDP 3.0.0, except that I changed the minimum container size to 64 MB instead of 1024 MB in both clusters.
> Reporter: Hsin-Liang Huang
> Priority: Major
>
> I can't find the button to delete this, so I just removed the text to avoid sensitive information in the previous posting.
[jira] [Updated] (YARN-8315) HDP 3.0.0 perfromance is slower than HDP 2.6.4
[ https://issues.apache.org/jira/browse/YARN-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hsin-Liang Huang updated YARN-8315:
---
Description: I can't find the button to delete this, so I just removed the text to avoid sensitive information in the previous posting.

(was: Hi, I am comparing the performance between HDP 3.0.0 and HDP 2.6.4, and I found that HDP 3.0.0 is much slower than HDP 2.6.4 when the job acquires more YARN containers. We also pinpointed the problem: after the job is done, cleaning up all the containers to exit the application is where it consumes more time than HDP 2.6.4. I used the simple YARN app that Hortonworks put out on GitHub [https://github.com/hortonworks/simple-yarn-app] to do the testing. Below is my testing result from acquiring 8 containers in both the HDP 3.0.0 and HDP 2.6.4 cluster environments.

HDP 3.0.0:
command: time hadoop jar /usr/hdp/3.0.0.0-829/hadoop-yarn/hadoop-yarn-applications-unmanaged-am-launcher-3.0.0.3.0.0.0-829.jar Client -classpath simple-yarn-app-1.1.0.jar -cmd "java com.hortonworks.simpleyarnapp.ApplicationMaster /bin/date 8"

18/05/17 11:06:42 INFO unmanagedamlauncher.UnmanagedAMLauncher: Initializing Client
18/05/17 11:06:42 INFO unmanagedamlauncher.UnmanagedAMLauncher: Starting Client
18/05/17 11:06:43 INFO client.RMProxy: Connecting to ResourceManager at whiny1.fyre.ibm.com/172.16.165.211:8050
18/05/17 11:06:43 INFO client.AHSProxy: Connecting to Application History server at whiny2.fyre.ibm.com/172.16.200.160:10200
18/05/17 11:06:43 INFO unmanagedamlauncher.UnmanagedAMLauncher: Setting up application submission context for ASM
18/05/17 11:06:43 INFO unmanagedamlauncher.UnmanagedAMLauncher: Setting unmanaged AM
18/05/17 11:06:43 INFO unmanagedamlauncher.UnmanagedAMLauncher: Submitting application to ASM
18/05/17 11:06:43 INFO impl.YarnClientImpl: Submitted application application_1526572577866_0011
18/05/17 11:06:44 INFO unmanagedamlauncher.UnmanagedAMLauncher: Got application report from ASM for, appId=11, appAttemptId=appattempt_1526572577866_0011_01, clientToAMToken=null, appDiagnostics=AM container is launched, waiting for AM container to Register with RM, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1526584003704, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=N/A, appUser=hlhuang
18/05/17 11:06:44 INFO unmanagedamlauncher.UnmanagedAMLauncher: Launching AM with application attempt id appattempt_1526572577866_0011_01
18/05/17 11:06:46 INFO client.RMProxy: Connecting to ResourceManager at whiny1.fyre.ibm.com/172.16.165.211:8030
registerApplicationMaster 0
registerApplicationMaster 1
18/05/17 11:06:47 INFO conf.Configuration: found resource resource-types.xml at file:/etc/hadoop/3.0.0.0-829/0/resource-types.xml
Making res-req 0
Making res-req 1
Making res-req 2
Making res-req 3
Making res-req 4
Making res-req 5
Making res-req 6
Making res-req 7
Launching container container_e08_1526572577866_0011_01_01
Launching container container_e08_1526572577866_0011_01_02
Launching container container_e08_1526572577866_0011_01_03
Launching container container_e08_1526572577866_0011_01_04
Launching container container_e08_1526572577866_0011_01_05
Launching container container_e08_1526572577866_0011_01_06
Launching container container_e08_1526572577866_0011_01_07
Launching container container_e08_1526572577866_0011_01_08
Completed container container_e08_1526572577866_0011_01_01
Completed container container_e08_1526572577866_0011_01_02
Completed container container_e08_1526572577866_0011_01_03
Completed container container_e08_1526572577866_0011_01_04
Completed container container_e08_1526572577866_0011_01_08
Completed container container_e08_1526572577866_0011_01_05
Completed container container_e08_1526572577866_0011_01_06
Completed container container_e08_1526572577866_0011_01_07
18/05/17 11:06:54 INFO unmanagedamlauncher.UnmanagedAMLauncher: AM process exited with value: 0
18/05/17 11:06:55 INFO unmanagedamlauncher.UnmanagedAMLauncher: Got application report from ASM for, appId=11, appAttemptId=appattempt_1526572577866_0011_01, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1526584003704, yarnAppState=FINISHED, distributedFinalState=SUCCEEDED, appTrackingUrl=N/A, appUser=hlhuang
18/05/17 11:06:55 INFO unmanagedamlauncher.UnmanagedAMLauncher: App ended with state: FINISHED and status: SUCCEEDED
18/05/17 11:06:55 INFO unmanagedamlauncher.UnmanagedAMLauncher: Application has completed successfully.
real 0m14.716s
user 0m11.642s
sys 0m0.616s

HDP 2.6.4:
command: time hadoop jar
[jira] [Created] (YARN-8315) HDP 3.0.0 perfromance is slower than HDP 2.6.4
Hsin-Liang Huang created YARN-8315:
--
Summary: HDP 3.0.0 perfromance is slower than HDP 2.6.4
Key: YARN-8315
URL: https://issues.apache.org/jira/browse/YARN-8315
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 3.0.0
Environment: I have an HDP 2.6.4 cluster and an HDP 3.0.0 cluster, set up with the same settings (such as Java heap size and container size). Both are 4-node clusters with 3 data nodes. I took almost all the default settings on HDP 3.0.0, except that I changed the minimum container size to 64 MB instead of 1024 MB in both clusters.
Reporter: Hsin-Liang Huang

Hi, I am comparing the performance between HDP 3.0.0 and HDP 2.6.4, and I found that HDP 3.0.0 is much slower than HDP 2.6.4 when the job acquires more YARN containers. We also pinpointed the problem: after the job is done, cleaning up all the containers to exit the application is where it consumes more time than HDP 2.6.4. I used the simple YARN app that Hortonworks put out on GitHub [https://github.com/hortonworks/simple-yarn-app] to do the testing. Below is my testing result from acquiring 8 containers in both the HDP 3.0.0 and HDP 2.6.4 cluster environments.

HDP 3.0.0:
command: time hadoop jar /usr/hdp/3.0.0.0-829/hadoop-yarn/hadoop-yarn-applications-unmanaged-am-launcher-3.0.0.3.0.0.0-829.jar Client -classpath simple-yarn-app-1.1.0.jar -cmd "java com.hortonworks.simpleyarnapp.ApplicationMaster /bin/date 8"

18/05/17 11:06:42 INFO unmanagedamlauncher.UnmanagedAMLauncher: Initializing Client
18/05/17 11:06:42 INFO unmanagedamlauncher.UnmanagedAMLauncher: Starting Client
18/05/17 11:06:43 INFO client.RMProxy: Connecting to ResourceManager at whiny1.fyre.ibm.com/172.16.165.211:8050
18/05/17 11:06:43 INFO client.AHSProxy: Connecting to Application History server at whiny2.fyre.ibm.com/172.16.200.160:10200
18/05/17 11:06:43 INFO unmanagedamlauncher.UnmanagedAMLauncher: Setting up application submission context for ASM
18/05/17 11:06:43 INFO unmanagedamlauncher.UnmanagedAMLauncher: Setting unmanaged AM
18/05/17 11:06:43 INFO unmanagedamlauncher.UnmanagedAMLauncher: Submitting application to ASM
18/05/17 11:06:43 INFO impl.YarnClientImpl: Submitted application application_1526572577866_0011
18/05/17 11:06:44 INFO unmanagedamlauncher.UnmanagedAMLauncher: Got application report from ASM for, appId=11, appAttemptId=appattempt_1526572577866_0011_01, clientToAMToken=null, appDiagnostics=AM container is launched, waiting for AM container to Register with RM, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1526584003704, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=N/A, appUser=hlhuang
18/05/17 11:06:44 INFO unmanagedamlauncher.UnmanagedAMLauncher: Launching AM with application attempt id appattempt_1526572577866_0011_01
18/05/17 11:06:46 INFO client.RMProxy: Connecting to ResourceManager at whiny1.fyre.ibm.com/172.16.165.211:8030
registerApplicationMaster 0
registerApplicationMaster 1
18/05/17 11:06:47 INFO conf.Configuration: found resource resource-types.xml at file:/etc/hadoop/3.0.0.0-829/0/resource-types.xml
Making res-req 0
Making res-req 1
Making res-req 2
Making res-req 3
Making res-req 4
Making res-req 5
Making res-req 6
Making res-req 7
Launching container container_e08_1526572577866_0011_01_01
Launching container container_e08_1526572577866_0011_01_02
Launching container container_e08_1526572577866_0011_01_03
Launching container container_e08_1526572577866_0011_01_04
Launching container container_e08_1526572577866_0011_01_05
Launching container container_e08_1526572577866_0011_01_06
Launching container container_e08_1526572577866_0011_01_07
Launching container container_e08_1526572577866_0011_01_08
Completed container container_e08_1526572577866_0011_01_01
Completed container container_e08_1526572577866_0011_01_02
Completed container container_e08_1526572577866_0011_01_03
Completed container container_e08_1526572577866_0011_01_04
Completed container container_e08_1526572577866_0011_01_08
Completed container container_e08_1526572577866_0011_01_05
Completed container container_e08_1526572577866_0011_01_06
Completed container container_e08_1526572577866_0011_01_07
18/05/17 11:06:54 INFO unmanagedamlauncher.UnmanagedAMLauncher: AM process exited with value: 0
18/05/17 11:06:55 INFO unmanagedamlauncher.UnmanagedAMLauncher: Got application report from ASM for, appId=11, appAttemptId=appattempt_1526572577866_0011_01, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1526584003704, yarnAppState=FINISHED, distributedFinalState=SUCCEEDED,