Reduce task copy: very low speed
Hi, I've built a Hadoop cluster from two computers (master and slave), using Hadoop 0.18.2/HBase 0.18.1. While running Map-Reduce jobs on 5-10 GB files, I've noticed that reduce-copy tasks from master to slave take too long (~30 minutes each), at about 0.10 MB/s, even though master and slave are connected via a 1 Gb switch and I mapped both hosts in /etc/hosts using LAN addresses (10.x.x.x).

My questions:
- Is there a way to force Hadoop to use, for example, FTP to copy the files?
- Is there a hadoop-site.xml setting that would improve file-copy performance?

I've tried copying files between the master and slave with FTP, and it works at an average speed of 50 Mb/s.

From the reduce task list web page (slave tasks only):
reduce copy (67 of 69 at 0.89 MB/s): task on master
reduce copy (29 of 69 at 0.10 MB/s): task on slave

Thanks in advance for any help or direction to search,
Genady
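For reference, the one shuffle-related knob that does exist in hadoop-site.xml for this Hadoop version (a sketch, not a recommendation — it only helps if the copiers are genuinely bandwidth-bound, and the value shown here is illustrative) is the number of parallel map-output fetches each reduce runs:

```xml
<!-- hadoop-site.xml sketch: parallel map-output fetch threads per reduce
     (mapred.reduce.parallel.copies; default is 5 in this era) -->
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>10</value>
</property>
```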
Re: Reduce task copy: very low speed
I am hitting the same problem; the copy is too slow :(

Genady wrote:
> Hi, I've built a Hadoop cluster from two computers (master and slave), using Hadoop 0.18.2/HBase 0.18.1. While running Map-Reduce jobs on 5-10 GB files, I've noticed that reduce-copy tasks from master to slave take too long (~30 minutes each), at about 0.10 MB/s, even though master and slave are connected via a 1 Gb switch. [...]
Threads per mapreduce job
Hi everyone: How do I control the number of threads per mapreduce job? I am using bin/hadoop jar wordcount to run jobs, and even though I found these settings in hadoop-default.xml and changed their values to 1:

<name>mapred.tasktracker.map.tasks.maximum</name>
<name>mapred.tasktracker.reduce.tasks.maximum</name>

the output of the job seems to indicate otherwise:

08/12/26 18:21:12 INFO mapred.JobClient: Job Counters
08/12/26 18:21:12 INFO mapred.JobClient:   Launched reduce tasks=1
08/12/26 18:21:12 INFO mapred.JobClient:   Rack-local map tasks=12
08/12/26 18:21:12 INFO mapred.JobClient:   Launched map tasks=17
08/12/26 18:21:12 INFO mapred.JobClient:   Data-local map tasks=4

I have 2 servers running the mapreduce process and the datanode process.

Thanks,
Michael
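For what it's worth, those two settings are per-TaskTracker slot limits and belong in hadoop-site.xml as full <property> elements (hadoop-default.xml shouldn't be edited directly). A minimal sketch:

```xml
<!-- hadoop-site.xml sketch: cap each TaskTracker at 1 concurrent map
     and 1 concurrent reduce. This limits concurrency per node; it does
     NOT change the total number of map tasks launched, which is driven
     by the number of input splits. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>
```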
Re: Reduce task copy: very low speed
Hey,

Are all the maps done? If the reduce is waiting for the last map to finish, it can't copy that map's output, which drags the average copy speed way down. In other words, you are comparing the theoretical instantaneous copy speed (1 Gbps) against the printed average speed, which includes the time spent waiting for all the maps to finish.

Transferring via FTP would be pointless (after all, FTP is a single data stream of TCP and HTTP is ... a single data stream of TCP), and there is nothing in hadoop-site.xml to tweak, because the copy processes are waiting for the source data to be created. The best solution would be to add many more nodes to the cluster :)

Brian

On Dec 27, 2008, at 8:10 AM, d0ng wrote:
> I am hitting the same problem; the copy is too slow :(
> [...]
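The arithmetic behind the low printed rate can be sketched with invented numbers (not taken from the job above): the web UI divides bytes copied by total elapsed time, including time spent waiting for map output to exist.

```java
// Sketch with made-up figures: the reported average shuffle rate includes
// time spent waiting for maps to finish, so it can sit far below wire speed
// even on a fast link.
public class ShuffleRate {
    public static void main(String[] args) {
        double copiedMB = 180.0;        // map output actually fetched
        double transferSeconds = 60.0;  // time spent moving bytes (3 MB/s on the wire)
        double waitSeconds = 1740.0;    // time spent waiting on unfinished maps

        double instantaneous = copiedMB / transferSeconds;
        double reported = copiedMB / (transferSeconds + waitSeconds);

        System.out.printf("instantaneous: %.2f MB/s%n", instantaneous); // 3.00
        System.out.printf("reported average: %.2f MB/s%n", reported);   // 0.10
    }
}
```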
Re: Threads per mapreduce job
mapred.map.multithreadedrunner.threads is the property you are looking for.

Michael wrote:
> Hi everyone: How do I control the number of threads per mapreduce job? I am using bin/hadoop jar wordcount to run jobs, and even though I found these settings in hadoop-default.xml and changed their values to 1: mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, the output of the job seems to indicate otherwise. [...]
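A caveat worth flagging: as far as I know, this property only takes effect when the job actually uses the multithreaded map runner (org.apache.hadoop.mapred.lib.MultithreadedMapRunner, selected via JobConf.setMapRunnerClass); a plain wordcount job ignores it. A sketch of the property as it would appear in a job config:

```xml
<!-- Sketch: only honored when the job's map runner is
     MultithreadedMapRunner; it sets threads per map task,
     not the number of map tasks. -->
<property>
  <name>mapred.map.multithreadedrunner.threads</name>
  <value>1</value>
</property>
```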
Re: Threads per mapreduce job
Thanks Sagar. However, when I add this to my hadoop-site.xml it doesn't listen:

<property>
  <name>mapred.map.multithreadedrunner.threads</name>
  <value>1</value>
</property>

I added it to both servers, and here is the output of a test mapreduce run:

08/12/27 16:09:05 INFO mapred.JobClient: Job Counters
08/12/27 16:09:05 INFO mapred.JobClient:   Launched reduce tasks=1
08/12/27 16:09:05 INFO mapred.JobClient:   Rack-local map tasks=16
08/12/27 16:09:05 INFO mapred.JobClient:   Launched map tasks=16

Thanks,
Michael

Sagar Naik wrote:
> mapred.map.multithreadedrunner.threads is the property you are looking for.
> [...]
Re: Threads per mapreduce job
Oh ok. So the file had 17 chunks. Thank you very much.
--Michael

On Dec 27, 2008 10:28 PM, Brian Bockelman bbock...@cse.unl.edu wrote:
> Hey Michael, I think you're misreading things. There are indeed 17 launched map tasks, which are run one at a time, not in parallel.
> Brian

On Dec 27, 2008, at 9:24 AM, Michael wrote:
> Hi everyone: How do I control the number of threa...
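The "17 chunks" figure follows from input splitting: by default, each HDFS block of the input becomes one map task. A rough sketch (assuming the 64 MB default block size of this era and an invented file size; real split sizing also honors mapred.min.split.size and per-file block sizes):

```java
// Rough sketch: default map-task count = ceil(fileSize / blockSize).
// 64 MB blocks assumed (the 0.18-era HDFS default); the file size below
// is invented purely to illustrate how 17 splits could arise.
public class SplitCount {
    static long splits(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes; // ceiling division
    }

    public static void main(String[] args) {
        long block = 64L * 1024 * 1024;       // 64 MB
        long file = 1050L * 1024 * 1024;      // a ~1.05 GB input file
        System.out.println(splits(file, block)); // 17
    }
}
```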
Re: issues with hadoop in AIX
On 12/27/08 12:18 AM, Arun Venugopal arunvenugopa...@gmail.com wrote:
> Yes, I was able to run this on AIX as well with a minor change to the DF.java code. But this was more of a proof of concept than on a production system.

There are lots of places where Hadoop (esp. in contrib) interprets the output of Unix command-line utilities. Changes like this are likely going to be required for AIX and other Unix systems that aren't being used by a committer. :(
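To make the portability problem concrete, here is a sketch (invented sample line and helper, not Hadoop's actual DF.java) of why parsing df output by column index is fragile: platforms disagree on column order and header wording, so a parse that works on Linux picks up the wrong field elsewhere.

```java
// Sketch (not Hadoop's DF.java): fixed-index parsing of `df -k` output.
// The sample line and the availableKbLinux helper are invented for
// illustration.
public class DfParse {
    // GNU/Linux `df -k` data line: fs, 1K-blocks, used, available, use%, mount
    static long availableKbLinux(String line) {
        return Long.parseLong(line.trim().split("\\s+")[3]);
    }

    public static void main(String[] args) {
        String linux = "/dev/sda1 10321208 5203012 4593908 54% /";
        System.out.println(availableKbLinux(linux)); // 4593908
        // AIX `df -k` orders its columns differently (free space comes
        // earlier, and it adds %Iused columns), so the same index-3 parse
        // would return the wrong field there -- hence the DF.java change.
    }
}
```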