Re: Very strange Java Collection behavior in Hadoop

2012-03-20 Thread madhu phatak
Hi Owen O'Malley, thank you for the instant reply. It's working now. Can you explain in a little more detail what you mean by "the input to the reducer is reused"? On Tue, Mar 20, 2012 at 11:28 AM, Owen O'Malley omal...@apache.org wrote: On Mon, Mar 19, 2012 at 10:52 PM, madhu phatak phatak@gmail.com

Re: Very strange Java Collection behavior in Hadoop

2012-03-20 Thread Owen O'Malley
On Mon, Mar 19, 2012 at 11:05 PM, madhu phatak phatak@gmail.com wrote: Hi Owen O'Malley, thank you for the instant reply. It's working now. Can you explain in a little more detail what you mean by "the input to the reducer is reused"? Each time the statement Text value = values.next(); is executed
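
The pitfall under discussion is that Hadoop's reducer value iterator hands back the same Writable object on every call to next(), re-filled with new contents, so storing the object itself (rather than a copy of its contents) leaves you with N references to the last value. The sketch below illustrates the effect in plain Java with a hypothetical Holder class standing in for Text, so it runs without Hadoop on the classpath:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReuseDemo {
    // A mutable holder, standing in for Hadoop's Text/Writable types.
    static class Holder {
        String value;
    }

    // An iterator that, like Hadoop's reducer value iterator, returns
    // the SAME object each time, re-filled with new contents.
    static Iterator<Holder> reusingIterator(List<String> data) {
        Holder shared = new Holder();
        Iterator<String> it = data.iterator();
        return new Iterator<Holder>() {
            public boolean hasNext() { return it.hasNext(); }
            public Holder next() { shared.value = it.next(); return shared; }
        };
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", "b", "c");

        // Wrong: storing the shared object itself; every entry now
        // points at the holder, which last held "c".
        List<Holder> wrong = new ArrayList<>();
        Iterator<Holder> it1 = reusingIterator(input);
        while (it1.hasNext()) wrong.add(it1.next());
        System.out.println(wrong.get(0).value);  // prints "c", not "a"

        // Right: copy the contents out before the next call to next()
        // (with Text you would call new Text(value) or value.toString()).
        List<String> right = new ArrayList<>();
        Iterator<Holder> it2 = reusingIterator(input);
        while (it2.hasNext()) right.add(it2.next().value);
        System.out.println(right);  // prints [a, b, c]
    }
}
```

In a real reducer the same rule applies: call value.toString() or copy into a fresh Text before adding it to a collection.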

Re: Problem setting super user

2012-03-20 Thread Olivier Sallou
On 3/19/12 6:39 PM, Mathias Herberts wrote: does it work under user hdfs? Yes, running the command as user hdfs works fine. On Mar 19, 2012 6:32 PM, Olivier Sallou olivier.sal...@irisa.fr wrote: Hi, I have installed Hadoop 1.0 using the .deb package. I tried to configure superuser groups but it

Re: Problem setting super user

2012-03-20 Thread Olivier Sallou
On 3/19/12 8:00 PM, Harsh J wrote: The right property for your version of Hadoop is dfs.permissions.supergroup. Change the property name, restart the NN, and your 'root' user should behave as a superuser afterwards. It works, thanks! The wiki is wrong, however; I'm going to file a bug to fix the
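
For reference, the property Harsh names goes in hdfs-site.xml; the value is the Unix group whose members HDFS treats as superusers. A minimal fragment, assuming a group named supergroup (the group name and the need to restart the NameNode are as described in the thread):

```xml
<!-- hdfs-site.xml: members of this group act as HDFS superusers -->
<property>
  <name>dfs.permissions.supergroup</name>
  <value>supergroup</value>
</property>
```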

Increasing number of Reducers

2012-03-20 Thread Masoud
Hi all, we have a cluster with 32 machines and are running the C# version of the wordcount program on it. The map phase is done by different machines, but the reduce is only done by one machine. Our data is around 7 GB of text, and with only one machine for the reduce phase the job runs very slowly. Is there any

Re: Increasing number of Reducers

2012-03-20 Thread bejoy . hadoop
Hi Masoud, Set -D mapred.reduce.tasks=n, i.e. to any higher value. Sent from BlackBerry® on Airtel -Original Message- From: Masoud mas...@agape.hanyang.ac.kr Date: Tue, 20 Mar 2012 17:52:58 To: common-user@hadoop.apache.org Reply-To: common-user@hadoop.apache.org Subject: Increasing
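
Once mapred.reduce.tasks is raised, Hadoop's default HashPartitioner decides which reducer each key goes to: (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. A self-contained sketch of that routing logic (the word list is just example data):

```java
public class PartitionDemo {
    // Mirrors Hadoop's default HashPartitioner: mask off the sign bit,
    // then take the hash modulo the number of reduce tasks.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String word : new String[] {"hadoop", "map", "reduce", "word", "count"}) {
            // Each key is routed deterministically to one of the reducers.
            System.out.println(word + " -> reducer " + partitionFor(word, reducers));
        }
    }
}
```

Because the assignment is a pure function of the key's hash, all values for one word still meet at a single reducer, so the final counts stay correct no matter how many reducers you add.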

MR job launching is slower

2012-03-20 Thread praveenesh kumar
I have a 10-node cluster (around 24 CPUs, 48 GB RAM, 1 TB HDD, 10 Gb Ethernet connection per node). After triggering any MR job, it takes 3-5 seconds to launch (I mean the time until I can see any MR job completion % on the screen). I know internally it's trying to launch the job, initialize mappers,

Re: Increasing number of Reducers

2012-03-20 Thread Masoud
Thanks for the reply. As you know, this way we will have n final results too. Is there any way to increase the number of reducers for faster computation but still have only one final result? B.S Masoud On 03/20/2012 07:02 PM, bejoy.had...@gmail.com wrote: Hi Masoud Set -D mapred.reduce.tasks=n; ie to

Re: MR job launching is slower

2012-03-20 Thread Michael Segel
Hi, First, it sounds like you have two 6-core CPUs, giving you 12 cores, not 24. Even though the OS reports 24 cores, that's hyper-threading, not real cores. This becomes an issue with respect to tuning. To answer your question: you have a single 1 TB HD. That's going to be a major

Re: Increasing number of Reducers

2012-03-20 Thread bejoy . hadoop
Hi Masoud, One reducer will definitely emit one output file. If you are looking at just one file as your final result in the local filesystem, then once the MR job is done use hadoop fs -getmerge . Sent from BlackBerry® on Airtel -Original Message- From: Masoud mas...@agape.hanyang.ac.kr
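
The getmerge shell command (hadoop fs -getmerge &lt;hdfs-dir&gt; &lt;local-file&gt;) concatenates a job's part files into one local file. A rough local-filesystem analogue of what it does, runnable without Hadoop (file names and contents below are made up for illustration):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GetMergeDemo {
    // Rough local analogue of `hadoop fs -getmerge <dir> <file>`:
    // concatenate the part-* files of an output directory, in name order.
    static void merge(Path outputDir, Path merged) throws IOException {
        List<Path> parts;
        try (Stream<Path> s = Files.list(outputDir)) {
            parts = s.filter(p -> p.getFileName().toString().startsWith("part-"))
                     .sorted()
                     .collect(Collectors.toList());
        }
        Files.deleteIfExists(merged);
        for (Path part : parts) {
            Files.write(merged, Files.readAllBytes(part),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("job-output");
        Files.writeString(dir.resolve("part-r-00000"), "apple\t3\n");
        Files.writeString(dir.resolve("part-r-00001"), "banana\t1\n");
        Path merged = dir.resolve("merged.txt");
        merge(dir, merged);
        System.out.print(Files.readString(merged));
    }
}
```

Note that with multiple reducers the parts are each sorted internally, but the concatenation is not globally sorted; if a single globally sorted file matters, that is a separate concern (e.g. total order partitioning).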

Re: how to implements the 'diff' cmd in hadoop

2012-03-20 Thread botma lin
Thanks Bejoy, that makes sense. If I want to know the differing record's original file, I need to put an extra file id into the mapper's output value, then read it in the reducer. Do you have any other ideas? Thanks! On Tue, Mar 20, 2012 at 6:09 PM, Bejoy Ks

Re: how to implements the 'diff' cmd in hadoop

2012-03-20 Thread Bejoy Ks
Yes, if you have more than two files to compare, then the file name/id is required from the mapper. If it is just two files and you just want to know which lines are not unique, then just the line number would be good, but if you are looking at more granular info, like the exact changes in
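
The scheme being discussed: the mapper emits (line, fileId), and the reducer sees, for each distinct line, the set of files it appears in; lines present in only one file are the "diff" (treating files as sets, per the thread). An in-memory Java sketch of that grouping step, with made-up file names and contents:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SetDiffDemo {
    // In-memory sketch of the MR "diff as set" idea: group each distinct
    // line by the set of files it appears in. In MapReduce, the mapper
    // emits (line, fileId) and the shuffle does this grouping for free.
    static Map<String, Set<String>> group(Map<String, List<String>> files) {
        Map<String, Set<String>> byLine = new TreeMap<>();
        files.forEach((fileId, lines) ->
            lines.forEach(line ->
                byLine.computeIfAbsent(line, k -> new TreeSet<>()).add(fileId)));
        return byLine;
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = Map.of(
            "a.txt", List.of("x", "y", "z"),
            "b.txt", List.of("y", "z", "w"));

        // The "reducer" logic: lines appearing in exactly one file,
        // tagged with that file's id.
        group(files).forEach((line, ids) -> {
            if (ids.size() == 1) {
                System.out.println(line + " only in " + ids.iterator().next());
            }
        });
    }
}
```

The reducer is trivially parallel: each distinct line is handled independently, which is why this scales where a pairwise diff would not.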

Re: how to implements the 'diff' cmd in hadoop

2012-03-20 Thread botma lin
Thanks a lot! On Tue, Mar 20, 2012 at 7:13, Bejoy Ks bejoy.had...@gmail.com wrote: Yes, if you have more than two files to compare, then the file name/id is required from the mapper. If it is just two files and you just want to know which lines are not unique, then just the line

Re: is implementing WritableComparable and setting Job.setSortComparatorClass(...) redundant?

2012-03-20 Thread Jane Wayne
thanks chris! On Tue, Mar 20, 2012 at 6:30 AM, Chris White chriswhite...@gmail.com wrote: Setting the sort comparator class will allow you to configure a RawComparator implementation (allowing you to do more efficient comparisons at the byte level). If you don't set it, then Hadoop uses the
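
Chris's point is that a RawComparator orders keys directly on their serialized bytes, skipping deserialization entirely. A plain-Java illustration of the byte-level idea, comparing big-endian encodings of non-negative ints the way Hadoop's WritableComparator.compareBytes compares raw key bytes (the encode helper is just for the demo):

```java
import java.nio.ByteBuffer;

public class RawCompareDemo {
    // Compare two 4-byte big-endian encodings of NON-NEGATIVE ints
    // directly on the bytes, without deserializing -- the idea behind
    // a RawComparator. (Negative ints would need sign-bit handling.)
    static int compareRaw(byte[] a, byte[] b) {
        for (int i = 0; i < 4; i++) {
            int x = a[i] & 0xff, y = b[i] & 0xff;  // unsigned byte compare
            if (x != y) return x < y ? -1 : 1;
        }
        return 0;
    }

    // Demo helper: serialize an int big-endian, as DataOutput would.
    static byte[] encode(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    public static void main(String[] args) {
        System.out.println(compareRaw(encode(3), encode(100)));   // -1
        System.out.println(compareRaw(encode(100), encode(100))); //  0
        System.out.println(compareRaw(encode(500), encode(42)));  //  1
    }
}
```

Implementing WritableComparable gives Hadoop a correct ordering; registering a RawComparator via setSortComparatorClass gives it a faster one during the sort/shuffle, since keys never have to be materialized as objects just to be compared. The two are complementary, not redundant.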

Re: Reduce copy speed too slow

2012-03-20 Thread Marcos Ortiz
Hi Gayatri, On 03/20/2012 11:59 AM, Gayatri Rao wrote: Hi all, I am running a map reduce job on EC2 instances and it seems to be very slow. It takes hours for simple projection and aggregation of data. What filesystem are you using for data storage: HDFS in EC2 or Amazon S3? Which

DistributedCache. addFileToClassPath non-jars

2012-03-20 Thread Nabib El-Rahman
Hi All, We are using DistributedCache.addFileToClassPath to have jars as well as a property file available in our classpath. For some reason, the property file cannot be found in our classpath, but the jars are found. Is there something specific to the implementation of addFileToClassPath that

Re: rack awareness and safemode

2012-03-20 Thread John Meagher
Unless something has changed recently, it won't automatically relocate the blocks. When I did something similar, I had a script that walked through the whole set of misreplicated files, increased the replication factor, then dropped it back down. This triggered relocation of blocks to
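
The "bump replication, then drop it back" trick John describes can be done with the FileSystem shell's setrep command; -w waits until the target factor is reached before returning. The path and factors below are examples only, assuming a default replication factor of 3:

```
# Force re-placement of mis-replicated blocks after enabling rack awareness:
# raise the replication factor (waiting with -w), then drop it back down.
hadoop fs -setrep -w 4 -R /user/data
hadoop fs -setrep -w 3 -R /user/data
```

The extra replica must be placed in line with the (new) rack-aware policy; deleting the surplus one afterwards leaves the blocks correctly spread across racks.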

Re: rack awareness and safemode

2012-03-20 Thread Patai Sangbutsarakum
Thanks for your reply and the script. Hopefully it still applies to 0.20.203. From what I've seen playing with a test cluster, the balancer takes care of replica placement. I just don't want to fall into a situation where HDFS sits in safe mode for hours and users can't use Hadoop and start yelping.

Re: rack awareness and safemode

2012-03-20 Thread Harsh J
John has already addressed your concern. I'd only like to add that fixing of replication violations does not require your NN to be in safe mode and it won't be. Your worry can hence be voided :) On Wed, Mar 21, 2012 at 2:08 AM, Patai Sangbutsarakum patai.sangbutsara...@turn.com wrote: Thanks for

Re: rack awareness and safemode

2012-03-20 Thread Patai Sangbutsarakum
Thank you all. On Tue, Mar 20, 2012 at 2:44 PM, Harsh J ha...@cloudera.com wrote: John has already addressed your concern. I'd only like to add that fixing of replication violations does not require your NN to be in safe mode and it won't be. Your worry can hence be voided :) On Wed, Mar

Fair Share Scheduler not worked as expected

2012-03-20 Thread WangRamon
Hi All, I noticed something strange in my Fair Share Scheduler monitoring GUI: the SUM of the Fair Share values is always about 30 even when only one M/R job is running, so I don't know whether the value is a usage percentage; if it were a percentage, that would explain why all

Re: how to implements the 'diff' cmd in hadoop

2012-03-20 Thread botma lin
You are right, Dieter. The Linux diff treats a file as a list, but I only want to treat it as a set. Sorry I didn't make it clear at the beginning. On Tue, Mar 20, 2012 at 7:33 PM, Dieter Plaetinck die...@plaetinck.be wrote: the diff command on Linux (i.e. GNU diffutils) is way more involved than

Re: Eclipse plugin

2012-03-20 Thread raviprakashu
Hi, The first time I added a new Hadoop location, I could not see the hadoop.job.ugi parameter in the Advanced tab. But I still continued adding the location, and it appeared in the Map/Reduce Locations panel. I could see the DFS Locations tree in the Project Explorer. When I