Re: reducer gets values with empty attributes

2013-04-30 Thread Mahesh Balija
Hi Alex, Can you please attach your code and the sample input data? Best, Mahesh Balija, Calsoft Labs. On Tue, Apr 30, 2013 at 2:29 AM, alx...@aim.com wrote: Hello, I am trying to write a MapReduce program in Hadoop 1.0.4 using the mapred libs. I have a map function which gets

Set reducer capacity for a specific M/R job

2013-04-30 Thread Han JU
Hi, I want to change the cluster's capacity of reduce slots on a per-job basis. Originally I have 8 reduce slots per tasktracker. I did: conf.set("mapred.tasktracker.reduce.tasks.maximum", "4"); ... Job job = new Job(conf, ...) And in the web UI I can see that for this job, the max reduce tasks
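For context, a minimal driver sketch of the attempt described (Hadoop 1.x, new-API Job; the property name is the real TaskTracker setting, everything around it is assumed). As the replies below explain, the TaskTracker reads this property only at startup, so a per-job override like this does not change the running slot capacity:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SlotCapacityDemo {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Attempted per-job override of a TaskTracker-level setting:
        conf.set("mapred.tasktracker.reduce.tasks.maximum", "4");
        Job job = new Job(conf, "slot-capacity-demo");
        // ... set mapper/reducer/input/output paths here, then:
        // job.waitForCompletion(true);
      }
    }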

Re: Set reducer capacity for a specific M/R job

2013-04-30 Thread Nitin Pawar
The config you are setting is for the job only. But if you want to reduce the slots on tasktrackers then you will need to edit the tasktracker conf and restart the tasktracker. On Apr 30, 2013 3:30 PM, Han JU ju.han.fe...@gmail.com wrote: Hi, I want to change the cluster's capacity of reduce slots on a per

Re: Set reducer capacity for a specific M/R job

2013-04-30 Thread Han JU
Thanks Nitin. What I need is to set the slots only for a specific job, not for the whole cluster conf. But what I did does NOT work ... Have I done something wrong? 2013/4/30 Nitin Pawar nitinpawar...@gmail.com The config you are setting is for the job only. But if you want to reduce the slots on

Re: Set reducer capacity for a specific M/R job

2013-04-30 Thread Nitin Pawar
Forgot to add: there is a similar method for the reducer as well, job.setNumReduceTasks(0); On Tue, Apr 30, 2013 at 3:56 PM, Nitin Pawar nitinpawar...@gmail.com wrote: The mapred.tasktracker.reduce.tasks.maximum parameter sets the maximum number of reduce tasks that may be run by an
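For reference, a sketch of where that setter lives (a fragment from inside a driver's main(); surrounding setup assumed). Note it controls the total number of reduce tasks for the job, not how many run concurrently per tasktracker:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    // inside the driver's main():
    Configuration conf = new Configuration();
    Job job = new Job(conf, "reduce-count-demo");
    // Total reduce tasks for this job (0 = map-only job). This is a per-job
    // task count, not the per-TaskTracker slot capacity discussed above.
    job.setNumReduceTasks(0);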

Re: Set reducer capacity for a specific M/R job

2013-04-30 Thread Han JU
Thanks. In fact I don't want to set reducer or mapper numbers, they are fine. I want to set the reduce slot capacity of my cluster when it executes my specific job. Say I have 100 reduce tasks for this job; I want my cluster to execute 4 of them at the same time, not 8 of them at the same time,

Re: Relationship between HDFS_BYTES_READ and Map input bytes

2013-04-30 Thread YouPeng Yang
Hi Pralabh, 1. The Map input bytes counter belongs to the MapReduce framework. The Hadoop Definitive Guide explains that: The number of bytes of uncompressed input consumed by all the maps in the job. Incremented every time a record is read from a RecordReader and passed to the map's map()
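A sketch of pulling both counters out of a finished job in a Hadoop 1.x driver; the group/counter name strings below are the 1.x forms and should be treated as assumptions to verify against your version's job UI:

    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;

    // after job.waitForCompletion(true), inside the driver:
    Counters counters = job.getCounters();
    long hdfsBytesRead = counters.findCounter(
        "FileSystemCounters", "HDFS_BYTES_READ").getValue();
    long mapInputBytes = counters.findCounter(
        "org.apache.hadoop.mapred.Task$Counter", "MAP_INPUT_BYTES").getValue();
    // The two need not match: HDFS_BYTES_READ counts bytes actually read
    // from HDFS (e.g. compressed size), while Map input bytes counts the
    // uncompressed bytes handed to map().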

Re: Set reducer capacity for a specific M/R job

2013-04-30 Thread Nitin Pawar
so basically, if I understand correctly, you want to limit the number of reducers executing in parallel only for this job? On Tue, Apr 30, 2013 at 4:02 PM, Han JU ju.han.fe...@gmail.com wrote: Thanks. In fact I don't want to set reducer or mapper numbers, they are fine. I want to set the reduce

Re: Set reducer capacity for a specific M/R job

2013-04-30 Thread Han JU
Yes.. In the conf file of my cluster, mapred.tasktracker.reduce.tasks.maximum is 8. And for this job, I want it to be 4. I set it through the conf and build the job with this conf, then submit it. But Hadoop launches 8 reducers per datanode... 2013/4/30 Nitin Pawar nitinpawar...@gmail.com so

RE: Permission problem

2013-04-30 Thread Kevin Burton
I have relaxed it even further so now it is 775 kevin@devUbuntu05:/var/log/hadoop-0.20-mapreduce$ hadoop fs -ls -d / Found 1 items drwxrwxr-x - hdfs supergroup 0 2013-04-29 15:43 / But I still get this error: 2013-04-30 07:43:02,520 FATAL

Hadoop Avro Question

2013-04-30 Thread Rahul Bhattacharjee
Hi, When dealing with Avro data files in MR jobs we use AvroMapper. I noticed that the output K and V of AvroMapper aren't Writable, and neither is the key comparable (these are AvroKey and AvroValue). As the general serialization mechanism is Writable, how are the K,V pairs in the case of Avro,
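For context, a sketch of the old-API wiring behind this (org.apache.avro.mapred; the mapper class and STRING schemas are assumptions). The AvroJob setters register Avro's own implementation of Hadoop's Serialization interface, which is why the K/V types don't need to be Writable:

    import org.apache.avro.Schema;
    import org.apache.avro.mapred.AvroJob;
    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf();
    // Registers the Avro schemas and, under the hood, Avro's Hadoop
    // serialization, so map-output K/V travel as Avro datums, not Writables.
    AvroJob.setInputSchema(conf, Schema.create(Schema.Type.STRING));
    AvroJob.setOutputSchema(conf, Schema.create(Schema.Type.STRING));
    AvroJob.setMapperClass(conf, MyAvroMapper.class); // assumed AvroMapper subclass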

RE: Permission problem

2013-04-30 Thread Kevin Burton
That is what I perceive as the problem. The HDFS file system was created with the user 'hdfs' owning the root ('/'), but for some reason with an M/R job the user 'mapred' needs to have write permission to the root. I don't know how to satisfy both conditions. That is one reason that I relaxed the

RE: Permission problem

2013-04-30 Thread Kevin Burton
To further complicate the issue, the log file in (/var/log/hadoop-0.20-mapreduce/hadoop-hadoop-jobtracker-devUbuntu05.log) is owned by mapred:mapred and the name of the file seems to indicate some other lineage (hadoop, hadoop). I am out of my league in understanding the permission structure for

Re: Permission problem

2013-04-30 Thread Arpit Gupta
What is your mapred.system.dir set to in mapred-site.xml? By default it will write to /tmp on HDFS. So you can do the following: create /tmp on HDFS and chmod it to 777 as user hdfs, and then restart the jobtracker and tasktrackers. In case it's set to /mapred/something then create /mapred and chown
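Concretely, the suggested steps look something like this (a sketch, assuming mapred.system.dir is left at its default):

    # run as the HDFS superuser
    sudo -u hdfs hadoop fs -mkdir /tmp        # skip if /tmp already exists
    sudo -u hdfs hadoop fs -chmod 777 /tmp
    # then restart the jobtracker and tasktrackers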

RE: Permission problem

2013-04-30 Thread Kevin Burton
Thank you. mapred.system.dir is not set. I am guessing that it is whatever the default is. What should I set it to? /tmp is already 777 kevin@devUbuntu05:~$ hadoop fs -ls /tmp Found 1 items drwxr-xr-x - hdfs supergroup 0 2013-04-29 15:45 /tmp/mapred

Re: Permission problem

2013-04-30 Thread Arpit Gupta
Based on the logs your system dir is set to hdfs://devubuntu05:9000/data/hadoop/tmp/hadoop-mapred/mapred/system what is your fs.default.name and hadoop.tmp.dir in core-site.xml set to? -- Arpit Gupta Hortonworks Inc. http://hortonworks.com/ On Apr 30, 2013, at 7:39 AM, Kevin Burton

RE: Permission problem

2013-04-30 Thread Kevin Burton
In core-site.xml I have: <property> <name>fs.default.name</name> <value>hdfs://devubuntu05:9000</value> <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation.</description> </property> In hdfs-site.xml I have <property>

Re: Permission problem

2013-04-30 Thread Arpit Gupta
Ah, this is what mapred.system.dir defaults to: <property> <name>mapred.system.dir</name> <value>${hadoop.tmp.dir}/mapred/system</value> <description>The directory where MapReduce stores control files.</description> </property> So that's why it's trying to write to
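So an explicit override in mapred-site.xml would look like the following (the /mapred/system path is just an example; create it on HDFS and chown it to mapred):

    <property>
      <name>mapred.system.dir</name>
      <value>/mapred/system</value>
      <description>The directory where MapReduce stores control files.</description>
    </property>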

RE: Permission problem

2013-04-30 Thread Kevin Burton
I am not clear on what you are suggesting to create on HDFS or the local file system. As I understand it, hadoop.tmp.dir is on the local file system. I changed it so that the temporary files would be on a disk that has more capacity than /tmp. So you are suggesting that I create /data/hadoop/tmp on

RE: Permission problem

2013-04-30 Thread Kevin Burton
I am not sure how to create a JIRA. Again I am not sure I understand your workaround. You are suggesting that I create /data/hadoop/tmp on HDFS like: sudo -u hdfs hadoop fs -mkdir /data/hadoop/tmp I don't think I can chmod -R 777 on /data since it is a disk, and as I indicated it is

Re: Permission problem

2013-04-30 Thread Arpit Gupta
Kevin, you will have to create a new account if you did not have one before. -- Arpit On Apr 30, 2013, at 9:11 AM, Kevin Burton rkevinbur...@charter.net wrote: I don't see a "create issue" button or tab. If I need to log in then I am not sure what credentials I should use to log in because all I

Can't initialize cluster

2013-04-30 Thread Kevin Burton
I have a simple MapReduce job that I am trying to get to run on my cluster. When I run it I get: 13/04/30 11:27:45 INFO mapreduce.Cluster: Failed to use org.apache.hadoop.mapred.LocalClientProtocolProvider due to error: Invalid mapreduce.jobtracker.address configuration value for

RE: Can't initialize cluster

2013-04-30 Thread Kevin Burton
To be clear, when this code is run with 'java -jar' it runs without exception. The exception occurs when I run it with 'hadoop jar'. From: Kevin Burton [mailto:rkevinbur...@charter.net] Sent: Tuesday, April 30, 2013 11:36 AM To: user@hadoop.apache.org Subject: Can't initialize cluster I have

Re: Permission problem

2013-04-30 Thread Mohammad Tariq
Sorry Kevin, I was away for a while. Are you good now? Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Tue, Apr 30, 2013 at 9:50 PM, Arpit Gupta ar...@hortonworks.com wrote: Kevin You will have create a new account if you did not have one before. -- Arpit On Apr

Re: Can't initialize cluster

2013-04-30 Thread Mohammad Tariq
Set HADOOP_MAPRED_HOME in your hadoop-env.sh file and re-run the job. See if it helps. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Tue, Apr 30, 2013 at 10:10 PM, Kevin Burton rkevinbur...@charter.net wrote: To be clear, when this code is run with 'java -jar' it runs
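For example, a one-line sketch for hadoop-env.sh (the path is an assumption; point it at your actual MapReduce install, e.g. the MR1 layout suggested by the log paths earlier in the thread):

    # hadoop-env.sh -- example value only, adjust to your installation
    export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce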

RE: Can't initialize cluster

2013-04-30 Thread Kevin Burton
We/I are/am making progress. Now I get the error: 13/04/30 12:59:40 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 13/04/30 12:59:40 INFO mapred.JobClient: Cleaning up the staging area

partition as block?

2013-04-30 Thread Jay Vyas
Hi guys: I'm wondering - if I'm running mapreduce jobs on a cluster with large block sizes - can I increase performance with either: 1) A custom FileInputFormat 2) A custom partitioner 3) -DnumReducers Clearly, (3) will be an issue due to the fact that it might overload tasks and network
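For reference, a minimal sketch of option (2) as a custom partitioner (the key/value types and the plain hash spread are illustrative). Note that the number of partitions it spreads over still comes from the job's reducer count, i.e. option (3):

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class SpreadPartitioner extends Partitioner<Text, LongWritable> {
      @Override
      public int getPartition(Text key, LongWritable value, int numPartitions) {
        // numPartitions comes from job.setNumReduceTasks(n);
        // the mask keeps the hash non-negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
      }
    }

It gets wired in with job.setPartitionerClass(SpreadPartitioner.class) in the driver.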

Re: partition as block?

2013-04-30 Thread Jay Vyas
Well, to be more clear, I'm wondering how hadoop-mapreduce can be optimized in a block-less filesystem... and am thinking about application-tier ways to simulate blocks - i.e. by making the granularity of partitions smaller. Wondering if there is a way to hack an increased number of partitions

Re: partition as block?

2013-04-30 Thread Jay Vyas
Yes, it is a problem at the first stage. What I'm wondering, though, is whether the intermediate results - which happen after the mapper phase - can be optimized. On Tue, Apr 30, 2013 at 3:38 PM, Mohammad Tariq donta...@gmail.com wrote: Hmmm. I was actually thinking about the very first step.

Re: partition as block?

2013-04-30 Thread Mohammad Tariq
Increasing the size can help us to an extent, but increasing it further might cause problems during copy and shuffle. If the partitions are too big to be held in memory, we'll end up with a disk-based shuffle, which is going to be slower than a RAM-based shuffle, thus delaying the entire reduce

New to Hadoop-SSH communication

2013-04-30 Thread Automation Me
Hello, I am new to Hadoop and trying to install a multinode cluster on Ubuntu VMs. I am not able to communicate between the two nodes using SSH. My hosts file: 127.0.1.1 Master 127.0.1.2 Slave The following changes I made in the two VMs: 1. Updated the /etc/hosts file in both VMs; on the Master VM I did

Re: New to Hadoop-SSH communication

2013-04-30 Thread Mohammad Tariq
SSH is actually user@some_machine to user@some_other_machine. Either use the same username on both machines or add the IPs along with the proper user@hostname in the /etc/hosts file. HTH Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Wed, May 1, 2013 at 2:39 AM, Automation

Re: New to Hadoop-SSH communication

2013-04-30 Thread Automation Me
Thank you Tariq. I am using the same username on both the machines, and when I try to copy a file from master to slave just to make sure SSH is working fine, the file is copied onto the master itself, not the slave machine. scp -r /usr/local/somefile hduser@slave:/usr/local/somefile Any suggestions...

Re: New to Hadoop-SSH communication

2013-04-30 Thread Mohammad Tariq
show me your /etc/hosts file along with the output of users and hostname on both the machines. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Wed, May 1, 2013 at 3:00 AM, Automation Me anautom...@gmail.com wrote: Thank you Tariq. I am using the same username on both

Re: New to Hadoop-SSH communication

2013-04-30 Thread Mitra Kaseebhotla
Looks like you have just cloned/copied the same VMs. Change the hostname of each: http://askubuntu.com/questions/87665/how-do-i-change-the-hostname-without-a-restart On Tue, Apr 30, 2013 at 2:30 PM, Automation Me anautom...@gmail.com wrote: Thank you Tariq. I am using the same username on

Re: New to Hadoop-SSH communication

2013-04-30 Thread Automation Me
@Mitra Yes, I cloned the same VMs. By default Ubuntu uses the same 'ubuntu' hostname entry for all the machines. @Tariq I will send the hosts file and users of all the machines. On Tue, Apr 30, 2013 at 5:42 PM, Mitra Kaseebhotla mitra.kaseebho...@gmail.com wrote: Looks like you have just cloned/copied

Re: New to Hadoop-SSH communication

2013-04-30 Thread Automation Me
Hi Tariq, Master: Users: hduser hduser hostname: ubuntu /etc/hosts: 127.0.0.1 localhost 127.0.1.1 ubuntu # The following lines are desirable for IPv6 capable hosts ::1 ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters

Re: New to Hadoop-SSH communication

2013-04-30 Thread Mohammad Tariq
Comment out 127.0.1.1 ubuntu on both the machines. If it still doesn't work, change 127.0.1.1 master to something else, like 127.0.0.3 or something. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Wed, May 1, 2013 at 3:34 AM, Automation Me anautom...@gmail.com wrote:
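Putting the suggestions together, the /etc/hosts on each VM would look something like this (the 192.168.x addresses are placeholders for the VMs' real IPs):

    127.0.0.1     localhost
    # 127.0.1.1   ubuntu      (commented out, per the suggestion above)
    192.168.1.10  master
    192.168.1.11  slave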

Re: New to Hadoop-SSH communication

2013-04-30 Thread Mitra Kaseebhotla
and change the hostname to reflect your actual hostnames. On Tue, Apr 30, 2013 at 3:14 PM, Mohammad Tariq donta...@gmail.com wrote: comment out 127.0.1.1 ubuntu in both the machines. if it still doesn't work change 127.0.1.1master to something else, like 127.0.0.3 or something. Warm

Re: New to Hadoop-SSH communication

2013-04-30 Thread Automation Me
Thank you Tariq. I will try that... On Tue, Apr 30, 2013 at 6:14 PM, Mohammad Tariq donta...@gmail.com wrote: comment out 127.0.1.1 ubuntu in both the machines. if it still doesn't work change 127.0.1.1master to something else, like 127.0.0.3 or something. Warm Regards, Tariq

Re: New to Hadoop-SSH communication

2013-04-30 Thread Automation Me
Thank you Mitra.. I will change the hostname On Tue, Apr 30, 2013 at 6:16 PM, Mitra Kaseebhotla mitra.kaseebho...@gmail.com wrote: and change the hostname to reflect your actual hostnames. On Tue, Apr 30, 2013 at 3:14 PM, Mohammad Tariq donta...@gmail.com wrote: comment out 127.0.1.1

Re: partition as block?

2013-04-30 Thread Jay Vyas
What do you mean, increasing the size? I'm talking more about increasing the number of partitions... which actually decreases individual file size. On Apr 30, 2013, at 4:09 PM, Mohammad Tariq donta...@gmail.com wrote: Increasing the size can help us to an extent, but increasing it further might

Re: Hadoop Avro Question

2013-04-30 Thread Harsh J
Oops, moving for sure this time :) On Wed, May 1, 2013 at 10:35 AM, Harsh J ha...@cloudera.com wrote: Moving the question to Apache Avro's user@ lists. Please use the right lists for the most relevant answers. Avro is a different serialization technique that intends to replace the Writable