Hey Sam,

Did you get a chance to retry with Sandy's suggestions? The config appears to be letting each NM run roughly 22 containers (as opposed to 12 task slots per node in the MR1 config) because of the 22 GB memory resource setting. This could have a significant impact, given that the CPU is the same for both test runs.
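To put rough numbers on it: with yarn.nodemanager.resource.memory-mb at 22000 and the default 1024 MB per map/reduce container, the scheduler can fit about floor(22000 / 1024) = 21 concurrent containers on a node, versus the 8 map + 4 reduce = 12 slots per node in your MR1 config. If you want a closer apples-to-apples run, one option is to cap the NM memory so the per-node concurrency roughly matches. Just as a sketch (12288 is a value I picked to mirror the 12 MR1 slots, not something I have tested on your cluster):

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>12288</value>
  <description>12 x 1024 MB, i.e. at most ~12 concurrent 1 GB containers per node,
  roughly matching the 8 map + 4 reduce slots in the MR1 config</description>
</property>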
On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <[email protected]> wrote:
> Hey Sam,
>
> Thanks for sharing your results. I'm definitely curious about what's
> causing the difference.
>
> A couple of observations:
> It looks like you've got yarn.nodemanager.resource.memory-mb in there twice
> with two different values.
>
> Your max JVM memory of 1000 MB is (dangerously?) close to the default
> mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your tasks getting
> killed for running over resource limits?
>
> -Sandy
>
> On Thu, Jun 6, 2013 at 10:21 PM, sam liu <[email protected]> wrote:
>>
>> The terasort execution log shows that the reduce spent about 5.5 minutes going
>> from 33% to 35%, as below:
>> 13/06/10 08:02:22 INFO mapreduce.Job: map 100% reduce 31%
>> 13/06/10 08:02:25 INFO mapreduce.Job: map 100% reduce 32%
>> 13/06/10 08:02:46 INFO mapreduce.Job: map 100% reduce 33%
>> 13/06/10 08:08:16 INFO mapreduce.Job: map 100% reduce 35%
>> 13/06/10 08:08:19 INFO mapreduce.Job: map 100% reduce 40%
>> 13/06/10 08:08:22 INFO mapreduce.Job: map 100% reduce 43%
>>
>> Anyway, below are my configurations for your reference. Thanks!
>>
>> (A) core-site.xml
>> only defines 'fs.default.name' and 'hadoop.tmp.dir'
>>
>> (B) hdfs-site.xml
>> <property>
>>   <name>dfs.replication</name>
>>   <value>1</value>
>> </property>
>>
>> <property>
>>   <name>dfs.name.dir</name>
>>   <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>> </property>
>>
>> <property>
>>   <name>dfs.data.dir</name>
>>   <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>> </property>
>>
>> <property>
>>   <name>dfs.block.size</name>
>>   <value>134217728</value><!-- 128 MB -->
>> </property>
>>
>> <property>
>>   <name>dfs.namenode.handler.count</name>
>>   <value>64</value>
>> </property>
>>
>> <property>
>>   <name>dfs.datanode.handler.count</name>
>>   <value>10</value>
>> </property>
>>
>> (C) mapred-site.xml
>> <property>
>>   <name>mapreduce.cluster.temp.dir</name>
>>   <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>   <description>No description</description>
>>   <final>true</final>
>> </property>
>>
>> <property>
>>   <name>mapreduce.cluster.local.dir</name>
>>   <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>   <description>No description</description>
>>   <final>true</final>
>> </property>
>>
>> <property>
>>   <name>mapreduce.child.java.opts</name>
>>   <value>-Xmx1000m</value>
>> </property>
>>
>> <property>
>>   <name>mapreduce.framework.name</name>
>>   <value>yarn</value>
>> </property>
>>
>> <property>
>>   <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>   <value>8</value>
>> </property>
>>
>> <property>
>>   <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>   <value>4</value>
>> </property>
>>
>> <property>
>>   <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>   <value>true</value>
>> </property>
>>
>> (D) yarn-site.xml
>> <property>
>>   <name>yarn.resourcemanager.resource-tracker.address</name>
>>   <value>node1:18025</value>
>>   <description>host is the hostname of the resource manager and
>>   port is the port on which the NodeManagers contact the Resource
>>   Manager.
>>   </description>
>> </property>
>>
>> <property>
>>   <description>The address of the RM web application.</description>
>>   <name>yarn.resourcemanager.webapp.address</name>
>>   <value>node1:18088</value>
>> </property>
>>
>> <property>
>>   <name>yarn.resourcemanager.scheduler.address</name>
>>   <value>node1:18030</value>
>>   <description>host is the hostname of the resourcemanager and port is the port
>>   on which the Applications in the cluster talk to the Resource Manager.
>>   </description>
>> </property>
>>
>> <property>
>>   <name>yarn.resourcemanager.address</name>
>>   <value>node1:18040</value>
>>   <description>the host is the hostname of the ResourceManager and the port is
>>   the port on which the clients can talk to the Resource Manager.</description>
>> </property>
>>
>> <property>
>>   <name>yarn.nodemanager.local-dirs</name>
>>   <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>   <description>the local directories used by the nodemanager</description>
>> </property>
>>
>> <property>
>>   <name>yarn.nodemanager.address</name>
>>   <value>0.0.0.0:18050</value>
>>   <description>the nodemanagers bind to this port</description>
>> </property>
>>
>> <property>
>>   <name>yarn.nodemanager.resource.memory-mb</name>
>>   <value>10240</value>
>>   <description>the amount of memory on the NodeManager in MB</description>
>> </property>
>>
>> <property>
>>   <name>yarn.nodemanager.remote-app-log-dir</name>
>>   <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>   <description>directory on hdfs where the application logs are moved to</description>
>> </property>
>>
>> <property>
>>   <name>yarn.nodemanager.log-dirs</name>
>>   <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>   <description>the directories used by Nodemanagers as log directories</description>
>> </property>
>>
>> <property>
>>   <name>yarn.nodemanager.aux-services</name>
>>   <value>mapreduce.shuffle</value>
>>   <description>shuffle service that needs to be set for Map Reduce to run</description>
>> </property>
>>
>> <property>
>>   <name>yarn.resourcemanager.client.thread-count</name>
>>   <value>64</value>
>> </property>
>>
>> <property>
>>   <name>yarn.nodemanager.resource.cpu-cores</name>
>>   <value>24</value>
>> </property>
>>
>> <property>
>>   <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>   <value>3</value>
>> </property>
>>
>> <property>
>>   <name>yarn.nodemanager.resource.memory-mb</name>
>>   <value>22000</value>
>> </property>
>>
>> <property>
>>   <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>   <value>2.1</value>
>> </property>
>>
>>
>> 2013/6/7 Harsh J <[email protected]>
>>>
>>> Not tuning the configurations at all is the wrong approach. YARN uses
>>> memory-resource-based scheduling, and hence MR2 would be requesting a 1 GB
>>> minimum by default, causing it, on the base configs, to max out at 8 total
>>> containers (due to the 8 GB NM memory resource default). Do share your
>>> configs, as at this point none of us can tell what they are.
>>>
>>> Obviously, it isn't our goal to make MR2 slower for users and to not
>>> care about such things :)
>>>
>>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <[email protected]> wrote:
>>> > At the beginning, I just wanted to do a fast comparison of MRv1 and YARN. But
>>> > they have many differences, and to be fair for the comparison I did not tune
>>> > their configurations at all. So I got the above test results. After analyzing
>>> > the test results, no doubt, I will configure them and do the comparison again.
>>> >
>>> > Do you have any idea about the current test results? I think that, compared
>>> > with MRv1, YARN is better in the map phase (teragen test), but worse in the
>>> > reduce phase (terasort test).
>>> > And do you have any detailed suggestions/comments/materials on YARN
>>> > performance tuning?
>>> >
>>> > Thanks!
>>> >
>>> >
>>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <[email protected]>
>>> >>
>>> >> Why not tune the configurations?
>>> >> Both frameworks have many areas to tune:
>>> >> - Combiners, shuffle optimization, block size, etc.
>>> >>
>>> >>
>>> >> 2013/6/6 sam liu <[email protected]>
>>> >>>
>>> >>> Hi Experts,
>>> >>>
>>> >>> We are thinking about whether to use YARN or not in the near future, and
>>> >>> I ran teragen/terasort on YARN and MRv1 for comparison.
>>> >>>
>>> >>> My env is a three-node cluster, and each node has similar hardware:
>>> >>> 2 CPUs (4 cores each), 32 GB of memory. Both the YARN and MRv1 clusters
>>> >>> are set up on the same env. To be fair, I did not do any performance
>>> >>> tuning on their configurations, but used the default configuration values.
>>> >>>
>>> >>> Before testing, I thought YARN would be much better than MRv1 if both
>>> >>> used the default configuration, because YARN is a better framework than
>>> >>> MRv1. However, the test results show some differences:
>>> >>>
>>> >>> MRv1: Hadoop-1.1.1
>>> >>> YARN: Hadoop-2.0.4
>>> >>>
>>> >>> (A) Teragen: generate 10 GB data:
>>> >>> - MRv1: 193 sec
>>> >>> - YARN: 69 sec
>>> >>> YARN is 2.8 times faster than MRv1
>>> >>>
>>> >>> (B) Terasort: sort 10 GB data:
>>> >>> - MRv1: 451 sec
>>> >>> - YARN: 1136 sec
>>> >>> YARN is 2.5 times slower than MRv1
>>> >>>
>>> >>> After a fast analysis, I think the direct cause might be that YARN is
>>> >>> much faster than MRv1 in the map phase, but much slower in the reduce phase.
>>> >>>
>>> >>> Here I have two questions:
>>> >>> - Why do my tests show that YARN is slower than MRv1 for terasort?
>>> >>> - What is the strategy for tuning YARN performance? Are there any materials?
>>> >>>
>>> >>> Thanks!
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Marcos Ortiz Valmaseda
>>> >> Product Manager at PDVSA
>>> >> http://about.me/marcosortiz
>>> >>
>>> >
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>

--
Harsh J
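P.S. On Sandy's second point: with mapreduce.child.java.opts=-Xmx1000m and the default 1024 MB container size, there are only ~24 MB left for everything outside the Java heap (JVM overhead, thread stacks, native buffers), so the NodeManager may be killing tasks for exceeding their memory limits, which could show up as long stalls like the 33%-35% reduce gap in the log above. A rough rule of thumb is to keep the heap at around 75-80% of the container size. The values below are only an illustration of that ratio (my assumption, not verified on this cluster):

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1536</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1200m</value><!-- ~78% of the 1536 MB container -->
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1200m</value>
</property>

The *.memory.mb values are what the NodeManager enforces (scaled by yarn.nodemanager.vmem-pmem-ratio for virtual memory), while -Xmx only caps the Java heap, so the gap between the two is the headroom that avoids the kills Sandy asked about. If you do bump the container sizes like this, scale yarn.nodemanager.resource.memory-mb accordingly to keep whatever per-node concurrency you are targeting.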
