Hi Alex,
Can you please attach your code and the sample input data?
Best,
Mahesh Balija,
Calsoft Labs.
On Tue, Apr 30, 2013 at 2:29 AM, alx...@aim.com wrote:
Hello,
I am trying to write a MapReduce program in Hadoop 1.0.4 using the mapred libs. I have
a map function which gets
Hi,
I want to change the cluster's capacity of reduce slots on a per-job basis.
Originally I have 8 reduce slots for a tasktracker.
I did:
conf.set("mapred.tasktracker.reduce.tasks.maximum", "4");
...
Job job = new Job(conf, ...)
And in the web UI I can see that for this job, the max reduce tasks
The config you are setting is for the job only.
But if you want to reduce the slots on tasktrackers then you will need to
edit the tasktracker conf and restart the tasktracker.
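As a sketch, the cluster-wide change described above would be a tasktracker-side entry in mapred-site.xml along these lines (the value 4 mirrors the example in this thread; every tasktracker must be restarted for it to take effect):

```xml
<!-- mapred-site.xml on each tasktracker: caps concurrent reduce tasks per node -->
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```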
On Apr 30, 2013 3:30 PM, Han JU ju.han.fe...@gmail.com wrote:
Hi,
I want to change the cluster's capacity of reduce slots on a per
Thanks Nitin.
What I need is to set slot only for a specific job, not for the whole
cluster conf.
But what I did does NOT work ... Have I done something wrong?
2013/4/30 Nitin Pawar nitinpawar...@gmail.com
The config you are setting is for the job only
But if you want to reduce the slots on
forgot to add there is similar method for reducer as well
job.setNumReduceTasks(0);
On Tue, Apr 30, 2013 at 3:56 PM, Nitin Pawar nitinpawar...@gmail.com wrote:
The mapred.tasktracker.reduce.tasks.maximum parameter sets the
maximum number of reduce tasks that may be run by an
Thanks.
In fact I don't want to set reducer or mapper numbers, they are fine.
I want to set the reduce slot capacity of my cluster when it executes my
specific job. Say I have 100 reduce tasks for this job; I want my cluster
to execute 4 of them at the same time, not 8 of them at the same time,
Hi Pralabh
1. The Map input bytes counter belongs to the MapReduce framework. Hadoop: The Definitive Guide explains that:
The number of bytes of uncompressed input consumed by all the maps in the
job. Incremented every time a record is read from a RecordReader and
passed to the map's map() method.
So basically, if I understand correctly,
you want to limit the number of reducers executing in parallel only for this job?
On Tue, Apr 30, 2013 at 4:02 PM, Han JU ju.han.fe...@gmail.com wrote:
Thanks.
In fact I don't want to set reducer or mapper numbers, they are fine.
I want to set the reduce
Yes.. In the conf file of my cluster, mapred.tasktracker.reduce.tasks.maximum
is 8.
And for this job, I want it to be 4.
I set it through conf and build the job with this conf, then submit it. But
Hadoop launches 8 reduces per datanode...
2013/4/30 Nitin Pawar nitinpawar...@gmail.com
so
I have relaxed it even further so now it is 775
kevin@devUbuntu05:/var/log/hadoop-0.20-mapreduce$ hadoop fs -ls -d /
Found 1 items
drwxrwxr-x - hdfs supergroup 0 2013-04-29 15:43 /
But I still get this error:
2013-04-30 07:43:02,520 FATAL
Hi,
When dealing with Avro data files in MR jobs, we use AvroMapper. I noticed
that the K and V output of AvroMapper isn't Writable, and neither is the key
Comparable (these are AvroKey and AvroValue). As the general
serialization mechanism is Writable, how are the K,V pairs in the case of Avro
That is what I perceive as the problem. The hdfs file system was created
with the user 'hdfs' owning the root ('/') but for some reason with a M/R
job the user 'mapred' needs to have write permission to the root. I don't
know how to satisfy both conditions. That is one reason that I relaxed the
To further complicate the issue the log file in
(/var/log/hadoop-0.20-mapreduce/hadoop-hadoop-jobtracker-devUbuntu05.log) is
owned by mapred:mapred and the name of the file seems to indicate some other
lineage (hadoop,hadoop). I am out of my league in understanding the
permission structure for
what is your mapred.system.dir set to in mapred-site.xml?
By default it will write to /tmp on hdfs.
So you can do the following
create /tmp on hdfs and chmod it to 777 as user hdfs and then restart
jobtracker and tasktrackers.
In case it's set to /mapred/something then create /mapred and chown
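As a sketch, an explicit setting in mapred-site.xml might look like the fragment below; the path /mapred/system is only an illustration of the "/mapred/something" case, not a value confirmed in this thread:

```xml
<!-- mapred-site.xml: HDFS directory where the JobTracker keeps control files -->
<property>
  <name>mapred.system.dir</name>
  <value>/mapred/system</value>
</property>
```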
Thank you.
mapred.system.dir is not set. I am guessing that it is whatever the default
is. What should I set it to?
/tmp is already 777
kevin@devUbuntu05:~$ hadoop fs -ls /tmp
Found 1 items
drwxr-xr-x - hdfs supergroup 0 2013-04-29 15:45 /tmp/mapred
Based on the logs your system dir is set to
hdfs://devubuntu05:9000/data/hadoop/tmp/hadoop-mapred/mapred/system
what is your fs.default.name and hadoop.tmp.dir in core-site.xml set to?
--
Arpit Gupta
Hortonworks Inc.
http://hortonworks.com/
On Apr 30, 2013, at 7:39 AM, Kevin Burton
In core-site.xml I have:
<property>
  <name>fs.default.name</name>
  <value>hdfs://devubuntu05:9000</value>
  <description>The name of the default file system. A URI whose scheme and
  authority determine the FileSystem implementation.</description>
</property>
In hdfs-site.xml I have
<property>
ah
this is what mapred.system.dir defaults to:
<property>
  <name>mapred.system.dir</name>
  <value>${hadoop.tmp.dir}/mapred/system</value>
  <description>The directory where MapReduce stores control files.
  </description>
</property>
So that's why it's trying to write to
I am not clear on what you are suggesting to create on HDFS or the local
file system. As I understand it, hadoop.tmp.dir is on the local file system. I
changed it so that the temporary files would be on a disk that has more
capacity than /tmp. So you are suggesting that I create /data/hadoop/tmp on
I am not sure how to create a jira.
Again I am not sure I understand your workaround. You are suggesting that I
create /data/hadoop/tmp on HDFS like:
sudo -u hdfs hadoop fs -mkdir /data/hadoop/tmp
I don't think I can chmod -R 777 on /data since it is a disk and as I
indicated it is
Kevin
You will have to create a new account if you did not have one before.
--
Arpit
On Apr 30, 2013, at 9:11 AM, Kevin Burton rkevinbur...@charter.net wrote:
I don’t see a “create issue” button or tab. If I need to log in then I am
not sure what credentials I should use to log in because all I
I have a simple MapReduce job that I am trying to get to run on my cluster.
When I run it I get:
13/04/30 11:27:45 INFO mapreduce.Cluster: Failed to use
org.apache.hadoop.mapred.LocalClientProtocolProvider due to error: Invalid
mapreduce.jobtracker.address configuration value for
To be clear when this code is run with 'java -jar' it runs without
exception. The exception occurs when I run with 'hadoop jar'.
From: Kevin Burton [mailto:rkevinbur...@charter.net]
Sent: Tuesday, April 30, 2013 11:36 AM
To: user@hadoop.apache.org
Subject: Can't initialize cluster
I have
Sorry Kevin, I was away for a while. Are you good now?
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Tue, Apr 30, 2013 at 9:50 PM, Arpit Gupta ar...@hortonworks.com wrote:
Kevin
You will have to create a new account if you did not have one before.
--
Arpit
On Apr
--
Regards
N.H Sandeep
Hi Hadoop,
--
With Best Regards
Manoj Kumar Sahu
Ameerpet,
Hyderabad-500016.
8374232928 / 7842496524
Pl. Save a tree. Please don't print this e-mail unless you really need
to...
Set HADOOP_MAPRED_HOME in your hadoop-env.sh file and re-run the job. See
if it helps.
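Tariq's suggestion would amount to a line like the one below in hadoop-env.sh; the path here is a hypothetical example (a common MRv1 package location), not something stated in this thread:

```shell
# hadoop-env.sh: point Hadoop at the MapReduce installation.
# The path below is an assumed example -- substitute your own install dir.
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce
```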
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Tue, Apr 30, 2013 at 10:10 PM, Kevin Burton rkevinbur...@charter.net wrote:
To be clear when this code is run with 'java -jar' it runs
We/I are/am making progress. Now I get the error:
13/04/30 12:59:40 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
13/04/30 12:59:40 INFO mapred.JobClient: Cleaning up the staging area
Hi guys:
I'm wondering: if I'm running MapReduce jobs on a cluster with large block
sizes, can I increase performance with either:
1) A custom FileInputFormat
2) A custom partitioner
3) -DnumReducers
Clearly, (3) will be an issue because it might overload tasks
and the network
Well, to be clearer, I'm wondering how Hadoop MapReduce can be optimized
in a block-less filesystem... and am thinking about application-tier ways
to simulate blocks, i.e. by making the granularity of partitions smaller.
Wondering if there is a way to hack an increased number of partitions
Yes, it is a problem at the first stage. What I'm wondering, though, is
whether the intermediate results - which happen after the mapper phase - can
be optimized.
On Tue, Apr 30, 2013 at 3:38 PM, Mohammad Tariq donta...@gmail.com wrote:
Hmmm. I was actually thinking about the very first step.
Increasing the size can help us to an extent, but increasing it further
might cause problems during copy and shuffle. If the partitions are too big
to be held in memory, we'll end up with disk-based shuffle, which is
going to be slower than RAM-based shuffle, thus delaying the entire reduce
Hello,
I am new to Hadoop and am trying to install a multinode cluster on Ubuntu VMs. I
am not able to communicate between the two nodes using SSH.
My host file:
127.0.1.1 Master
127.0.1.2 Slave
The following changes I made in the two VMs:
1. Updated the /etc/hosts file in the two VMs
on Master VM
i did
ssh is actually user@some_machine to user@some_other_machine. Either
use the same username on both the machines or add the IPs along with the proper
user@hostname in the /etc/hosts file.
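For example, a sketch of what the /etc/hosts entries might look like on both machines (the IPs and hostnames below are illustrative placeholders, not values from this thread):

```
192.168.1.10    master
192.168.1.11    slave
```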
HTH
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Wed, May 1, 2013 at 2:39 AM, Automation
Thank you Tariq.
I am using the same username on both the machines, and when I try to copy a
file from master to slave just to make sure SSH is working fine, the file is
copied onto the master itself, not the slave machine.
scp -r /usr/local/somefile hduser@slave:/usr/local/somefile
Any suggestions...
Show me your /etc/hosts file along with the output of users and
hostname on both the machines.
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Wed, May 1, 2013 at 3:00 AM, Automation Me anautom...@gmail.com wrote:
Thank you Tariq.
I am using the same username on both
Looks like you have just cloned/copied the same VMs. Change the hostname of
each:
http://askubuntu.com/questions/87665/how-do-i-change-the-hostname-without-a-restart
On Tue, Apr 30, 2013 at 2:30 PM, Automation Me anautom...@gmail.com wrote:
Thank you Tariq.
I am using the same username on
@Mitra Yes, I cloned the same VMs. By default Ubuntu takes the 127.0.0.1
/ ubuntu hostname for all machines.
@Tariq i will send the hosts file and users of all the machines.
On Tue, Apr 30, 2013 at 5:42 PM, Mitra Kaseebhotla
mitra.kaseebho...@gmail.com wrote:
Looks like you have just cloned/copied
Hi Tariq,
Master:
Users:
hduser hduser
hostname:
ubuntu
/etc/hosts
127.0.0.1 localhost
127.0.1.1 ubuntu
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
Comment out "127.0.1.1 ubuntu" in both the machines.
If it still doesn't work, change 127.0.1.1 to something else, like
127.0.0.3 or something.
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Wed, May 1, 2013 at 3:34 AM, Automation Me anautom...@gmail.com wrote:
and change the hostname to reflect your actual hostnames.
On Tue, Apr 30, 2013 at 3:14 PM, Mohammad Tariq donta...@gmail.com wrote:
Comment out "127.0.1.1 ubuntu" in both the machines.
If it still doesn't work, change 127.0.1.1 to something else,
like 127.0.0.3 or something.
Warm
Thank you Tariq. I will try that...
On Tue, Apr 30, 2013 at 6:14 PM, Mohammad Tariq donta...@gmail.com wrote:
Comment out "127.0.1.1 ubuntu" in both the machines.
If it still doesn't work, change 127.0.1.1 to something else,
like 127.0.0.3 or something.
Warm Regards,
Tariq
Thank you Mitra..I will change the hostname
On Tue, Apr 30, 2013 at 6:16 PM, Mitra Kaseebhotla
mitra.kaseebho...@gmail.com wrote:
and change the hostname to reflect your actual hostnames.
On Tue, Apr 30, 2013 at 3:14 PM, Mohammad Tariq donta...@gmail.com wrote:
comment out 127.0.1.1
What do you mean by increasing the size? I'm talking more about increasing the
number of partitions... which actually decreases individual file size.
On Apr 30, 2013, at 4:09 PM, Mohammad Tariq donta...@gmail.com wrote:
Increasing the size can help us to an extent, but increasing it further might
Oops, moving for sure this time :)
On Wed, May 1, 2013 at 10:35 AM, Harsh J ha...@cloudera.com wrote:
Moving the question to Apache Avro's user@ lists. Please use the right
lists for the most relevant answers.
Avro is a different serialization technique that intends to replace
the Writable