Re: why does my mapper class read my input file twice?
It's your use of the mapred.input.dir property, which is a reserved name in the framework (it's what FileInputFormat uses). You have a config property you extract the path from: Path input = new Path(conf.get("mapred.input.dir")); Then you do: FileInputFormat.addInputPath(job, input); which internally simply appends the path to the config property "mapred.input.dir". Hence your job gets launched with the same input file listed twice - one entry added by the Tool-provided configuration (because of your -Dmapred.input.dir) and the other added by you. Fix the input path line to use a different config property: Path input = new Path(conf.get("input.path")); And run the job as: hadoop jar dummy-0.1.jar dummy.MyJob -Dinput.path=data/dummy.txt -Dmapred.output.dir=result On Tue, Mar 6, 2012 at 9:03 AM, Jane Wayne wrote: > i have code that reads in a text file. i notice that each line in the text > file is somehow being read twice. why is this happening? > > my mapper class looks like the following: > > public class MyMapper extends Mapper<LongWritable, Text, LongWritable, Text> { > > private static final Log _log = LogFactory.getLog(MyMapper.class); > @Override > public void map(LongWritable key, Text value, Context context) throws > IOException, InterruptedException { > String s = (new > StringBuilder()).append(value.toString()).append("m").toString(); > context.write(key, new Text(s)); > _log.debug(key.toString() + " => " + s); > } > } > > my reducer class looks like the following: > > public class MyReducer extends Reducer<LongWritable, Text, LongWritable, Text> { > > private static final Log _log = LogFactory.getLog(MyReducer.class); > @Override > public void reduce(LongWritable key, Iterable<Text> values, Context > context) throws IOException, InterruptedException { > for(Iterator<Text> it = values.iterator(); it.hasNext();) { > Text txt = it.next(); > String s = (new > StringBuilder()).append(txt.toString()).append("r").toString(); > context.write(key, new Text(s)); > _log.debug(key.toString() + " => " + s); > } > } > } > > my job class looks like the following: > > public class MyJob extends Configured implements Tool { > > public static void main(String[] args) throws Exception { > ToolRunner.run(new Configuration(), new MyJob(), args); > } > > @Override > public int run(String[] args) throws Exception { > Configuration conf = getConf(); > Path input = new Path(conf.get("mapred.input.dir")); > Path output = new Path(conf.get("mapred.output.dir")); > > Job job = new Job(conf, "dummy job"); > job.setMapOutputKeyClass(LongWritable.class); > job.setMapOutputValueClass(Text.class); > job.setOutputKeyClass(LongWritable.class); > job.setOutputValueClass(Text.class); > > job.setMapperClass(MyMapper.class); > job.setReducerClass(MyReducer.class); > > FileInputFormat.addInputPath(job, input); > FileOutputFormat.setOutputPath(job, output); > > job.setJarByClass(MyJob.class); > > return job.waitForCompletion(true) ? 0 : 1; > } > } > > the text file that i am trying to read in looks like the following. as you > can see, there are 9 lines. > > T, T > T, T > T, T > F, F > F, F > F, F > F, F > T, F > F, T > > the output file that i get after my Job runs looks like the following. as > you can see, there are 18 lines. each key is emitted twice from the mapper > to the reducer. > > 0 T, Tmr > 0 T, Tmr > 6 T, Tmr > 6 T, Tmr > 12 T, Tmr > 12 T, Tmr > 18 F, Fmr > 18 F, Fmr > 24 F, Fmr > 24 F, Fmr > 30 F, Fmr > 30 F, Fmr > 36 F, Fmr > 36 F, Fmr > 42 T, Fmr > 42 T, Fmr > 48 F, Tmr > 48 F, Tmr > > the way i execute my Job is as follows (cygwin + hadoop 0.20.2).
> > hadoop jar dummy-0.1.jar dummy.MyJob -Dmapred.input.dir=data/dummy.txt > -Dmapred.output.dir=result > > originally, this happened when i read in a sequence file, but even for a > text file, this problem is still happening. is it the way i have setup my > Job? -- Harsh J
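To make the fix concrete, here is a minimal sketch of the corrected run() method, assuming the non-reserved property name "input.path" suggested above (any property name the framework does not already write to would work equally well):

@Override
public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    // Read the input path from a property that FileInputFormat does not also manage.
    Path input = new Path(conf.get("input.path"));
    Path output = new Path(conf.get("mapred.output.dir"));

    Job job = new Job(conf, "dummy job");
    job.setMapOutputKeyClass(LongWritable.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    job.setMapperClass(MyMapper.class);
    job.setReducerClass(MyReducer.class);

    // addInputPath() appends to "mapred.input.dir"; because -Dmapred.input.dir is no
    // longer passed on the command line, the file now ends up listed exactly once.
    FileInputFormat.addInputPath(job, input);
    FileOutputFormat.setOutputPath(job, output);

    job.setJarByClass(MyJob.class);
    return job.waitForCompletion(true) ? 0 : 1;
}

launched as: hadoop jar dummy-0.1.jar dummy.MyJob -Dinput.path=data/dummy.txt -Dmapred.output.dir=result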
hadoop 1.0 / HOD or CloneZilla?
Hi all, I have experience with hadoop 0.20.204 on a 3-machine pilot cluster, and now I'm trying to set up a real cluster on 32 Linux machines. I have some questions: 1. Is hadoop 1.0 stable? On the hadoop site this version is indicated as a beta release. 2. As you know, installing and setting up hadoop on all 32 machines separately is not a good idea, so what can I do? 1. use hadoop on demand (HOD)? 2. or use an OS image replication tool such as Clonezilla? I think this method is better because, in addition to hadoop, I can clone some other settings such as SSH or Samba on all machines. Let me know your idea, B.S, Masoud.
Re: Java Heap space error
Sorry for multiple emails. I did find: 2012-03-05 17:26:35,636 INFO org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call- Usage threshold init = 715849728(699072K) used = 575921696(562423K) committed = 715849728(699072K) max = 715849728(699072K) 2012-03-05 17:26:35,719 INFO org.apache.pig.impl.util.SpillableMemoryManager: Spilled an estimate of 7816154 bytes from 1 objects. init = 715849728(699072K) used = 575921696(562423K) committed = 715849728(699072K) max = 715849728(699072K) 2012-03-05 17:26:36,881 INFO org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call - Collection threshold init = 715849728(699072K) used = 358720384(350312K) committed = 715849728(699072K) max = 715849728(699072K) 2012-03-05 17:26:36,885 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1 2012-03-05 17:26:36,888 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space at java.nio.HeapCharBuffer.(HeapCharBuffer.java:39) at java.nio.CharBuffer.allocate(CharBuffer.java:312) at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:760) at org.apache.hadoop.io.Text.decode(Text.java:350) at org.apache.hadoop.io.Text.decode(Text.java:327) at org.apache.hadoop.io.Text.toString(Text.java:254) at org.apache.pig.piggybank.storage.SequenceFileLoader.translateWritableToPigDataType(SequenceFileLoader.java:105) at org.apache.pig.piggybank.storage.SequenceFileLoader.getNext(SequenceFileLoader.java:139) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456) at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157) at org.apache.hadoop.mapred.Child.main(Child.java:264) On Mon, Mar 5, 2012 at 5:46 PM, Mohit Anchlia wrote: > All I see in the logs is: > > > 2012-03-05 17:26:36,889 FATAL org.apache.hadoop.mapred.TaskTracker: Task: > attempt_201203051722_0001_m_30_1 - Killed : Java heap space > > Looks like task tracker is killing the tasks. Not sure why. I increased > heap from 512 to 1G and still it fails. > > > On Mon, Mar 5, 2012 at 5:03 PM, Mohit Anchlia wrote: > >> I currently have java.opts.mapred set to 512MB and I am getting heap >> space errors. How should I go about debugging heap space issues? >> > >
Re: Java Heap space error
All I see in the logs is: 2012-03-05 17:26:36,889 FATAL org.apache.hadoop.mapred.TaskTracker: Task: attempt_201203051722_0001_m_30_1 - Killed : Java heap space Looks like task tracker is killing the tasks. Not sure why. I increased heap from 512 to 1G and still it fails. On Mon, Mar 5, 2012 at 5:03 PM, Mohit Anchlia wrote: > I currently have java.opts.mapred set to 512MB and I am getting heap space > errors. How should I go about debugging heap space issues? >
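One detail worth double-checking, since the thread never pins it down: on 0.20.x the task JVM heap is controlled by the mapred.child.java.opts property, so a setting spelled java.opts.mapred would be silently ignored and the child tasks would run with the default heap. A minimal sketch of the mapred-site.xml entry, with the 1 GB value purely illustrative:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>

Note also that the stack trace posted earlier in this thread fails while decoding a single Text value inside SequenceFileLoader, so one unusually large record could exhaust even a generous heap; that is worth ruling out before raising -Xmx further, and the TaskTracker node needs enough physical memory for (map slots + reduce slots) times the chosen heap.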
Re: OutOfMemoryError: unable to create new native thread
Hi Rohini, I ran into a similar problem just yesterday. In my case the max process number (ulimit -u) was set to 1024, which was too small, and when I raised it to a much larger value the problem went away. But you said "Ulimit on the machine is set to unlimited", so I'm not sure whether this will help :) Also check `cat /proc/sys/kernel/threads-max`; this seems to be a system-wide setting for the total number of threads. On Tue, Mar 6, 2012 at 4:30 AM, Rohini U wrote: > Hi All, > > I am running a map reduce job that uses around 120 MB of data and I get > this out of memory error. Ulimit on the machine is set to unlimited. Any > ideas on how to fix this? > The stack trace is as given below: > > > Exception in thread "main" org.apache.hadoop.ipc.RemoteException: > java.io.IOException: java.lang.OutOfMemoryError: unable to create new > native thread >at java.lang.Thread.start0(Native Method) >at java.lang.Thread.start(Thread.java:597) >at > org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.kill(JvmManager.java:553) >at > org.apache.hadoop.mapred.JvmManager$JvmManagerForType.killJvmRunner(JvmManager.java:317) >at > org.apache.hadoop.mapred.JvmManager$JvmManagerForType.killJvm(JvmManager.java:297) >at > org.apache.hadoop.mapred.JvmManager$JvmManagerForType.taskKilled(JvmManager.java:289) >at > org.apache.hadoop.mapred.JvmManager.taskKilled(JvmManager.java:158) >at org.apache.hadoop.mapred.TaskRunner.kill(TaskRunner.java:782) >at > org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:2938) >at > org.apache.hadoop.mapred.TaskTracker$TaskInProgress.jobHasFinished(TaskTracker.java:2910) >at > org.apache.hadoop.mapred.TaskTracker.purgeTask(TaskTracker.java:1974) >at > org.apache.hadoop.mapred.TaskTracker.fatalError(TaskTracker.java:3327) >at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) >at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >at java.lang.reflect.Method.invoke(Method.java:597) >at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557) >at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434) >at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430) >at java.security.AccessController.doPrivileged(Native Method) >at javax.security.auth.Subject.doAs(Subject.java:396) >at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127) >at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428) > >at org.apache.hadoop.ipc.Client.call(Client.java:1107) >at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226) >at $Proxy0.fatalError(Unknown Source) >at org.apache.hadoop.mapred.Child.main(Child.java:325) > > > > Thanks > -Rohini > -- Kindest Regards, Clay Chiang
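If the per-user limit does turn out to be the culprit, the usual place to raise it is /etc/security/limits.conf on the TaskTracker nodes. This is only a sketch - the account name "hadoop" and the numbers are assumptions, not values taken from this thread:

# /etc/security/limits.conf - raise the process/thread cap for the account
# that runs the TaskTracker and DataNode (account name assumed here)
hadoop   soft   nproc   32768
hadoop   hard   nproc   32768

The system-wide cap can be raised with sysctl -w kernel.threads-max=<N> (or persistently in /etc/sysctl.conf), though on most machines its default is already far higher than the per-user nproc limit.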
Re: Custom Seq File Loader: ClassNotFoundException
Unfortunately, "public" didn't change my error ... Any other ideas? Has anyone ran Hadoop on eclipse with custom sequence inputs ? Thank you, Mark On Mon, Mar 5, 2012 at 9:58 AM, Mark question wrote: > Hi Madhu, it has the following line: > > TermDocFreqArrayWritable () {} > > but I'll try it with "public" access in case it's been called outside of > my package. > > Thank you, > Mark > > > On Sun, Mar 4, 2012 at 9:55 PM, madhu phatak wrote: > >> Hi, >> Please make sure that your CustomWritable has a default constructor. >> >> On Sat, Mar 3, 2012 at 4:56 AM, Mark question >> wrote: >> >> > Hello, >> > >> > I'm trying to debug my code through eclipse, which worked fine with >> > given Hadoop applications (eg. wordcount), but as soon as I run it on my >> > application with my custom sequence input file/types, I get: >> > Java.lang.runtimeException.java.ioException (Writable name can't load >> > class) >> > SequenceFile$Reader.getValeClass(Sequence File.class) >> > >> > because my valueClass is customed. In other words, how can I add/build >> my >> > CustomWritable class to be with hadoop LongWritable,IntegerWritable >> > etc. >> > >> > Did anyone used eclipse? >> > >> > Mark >> > >> >> >> >> -- >> Join me at http://hadoopworkshop.eventbrite.com/ >> > >
Re: Comparison of Apache Pig Vs. Hadoop Streaming M/R
Streaming is good for simulation. Long running map-only processes, where pig doesn't really help and it is simple to fire off a streaming process. You do have to set some options so they can take a long time to return/return counters. Russell Jurney http://datasyndrome.com On Mar 5, 2012, at 12:38 PM, Eli Finkelshteyn wrote: > I'm really interested in this as well. I have trouble seeing a really good > use case for streaming map-reduce. Is there something I can do in streaming > that I can't do in Pig? If I want to re-use previously made Python functions > from my code base, I can do that in Pig as much as Streaming, and from what > I've experienced thus far, Python streaming seems to go slower than or at the > same speed as Pig, so why would I want to write a whole lot of > more-difficult-to-read mappers and reducers when I can do equally fast > performance-wise, shorter, and clearer code in Pig? Maybe it's obvious, but > currently I just can't think of the right use case. > > Eli > > On 3/2/12 9:21 AM, Subir S wrote: >> On Fri, Mar 2, 2012 at 12:38 PM, Harsh J wrote: >> >>> On Fri, Mar 2, 2012 at 10:18 AM, Subir S >>> wrote: Hello Folks, Are there any pointers to such comparisons between Apache Pig and Hadoop Streaming Map Reduce jobs? >>> I do not see why you seek to compare these two. Pig offers a language >>> that lets you write data-flow operations and runs these statements as >>> a series of MR jobs for you automatically (Making it a great tool to >>> use to get data processing done really quick, without bothering with >>> code), while streaming is something you use to write non-Java, simple >>> MR jobs. Both have their own purposes. >>> >> Basically we are comparing these two to see the benefits and how much they >> help in improving the productive coding time, without jeopardizing the >> performance of MR jobs. >> >> Also there was a claim in our company that Pig performs better than Map Reduce jobs? Is this true? Are there any such benchmarks available >>> Pig _runs_ MR jobs. It does do job design (and some data) >>> optimizations based on your queries, which is what may give it an edge >>> over designing elaborate flows of plain MR jobs with tools like >>> Oozie/JobControl (Which takes more time to do). But regardless, Pig >>> only makes it easy doing the same thing with Pig Latin statements for >>> you. >>> >> I knew that Pig runs MR jobs, as Hive runs MR jobs. But Hive jobs become >> pretty slow with lot of joins, which we can achieve faster with writing raw >> MR jobs. So with that context was trying to see how Pig runs MR jobs. Like >> for example what kind of projects should consider Pig. Say when we have a >> lot of Joins, which writing with plain MR jobs takes time. Thoughts? >> >> Thank you Harsh for your comments. They are helpful! >> >> >>> -- >>> Harsh J >>> >
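To make Russell's point concrete, here is a sketch of the kind of map-only streaming invocation he is describing - the script name and paths are made up, and the streaming jar location varies between installs. mapred.reduce.tasks=0 makes the job map-only, and mapred.task.timeout=0 is the "set some options" part: it disables the per-task timeout so a slow simulation step is not killed:

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
  -D mapred.reduce.tasks=0 \
  -D mapred.task.timeout=0 \
  -input sim/parameter_sets \
  -output sim/results \
  -mapper run_simulation.py \
  -file run_simulation.py

Alternatively, the script can periodically write reporter:status: or reporter:counter: lines to stderr so the framework knows it is still alive and the timeout can stay at its default.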
Re: Comparison of Apache Pig Vs. Hadoop Streaming M/R
I'm really interested in this as well. I have trouble seeing a really good use case for streaming map-reduce. Is there something I can do in streaming that I can't do in Pig? If I want to re-use previously made Python functions from my code base, I can do that in Pig as much as Streaming, and from what I've experienced thus far, Python streaming seems to go slower than or at the same speed as Pig, so why would I want to write a whole lot of more-difficult-to-read mappers and reducers when I can do equally fast performance-wise, shorter, and clearer code in Pig? Maybe it's obvious, but currently I just can't think of the right use case. Eli On 3/2/12 9:21 AM, Subir S wrote: On Fri, Mar 2, 2012 at 12:38 PM, Harsh J wrote: On Fri, Mar 2, 2012 at 10:18 AM, Subir S wrote: Hello Folks, Are there any pointers to such comparisons between Apache Pig and Hadoop Streaming Map Reduce jobs? I do not see why you seek to compare these two. Pig offers a language that lets you write data-flow operations and runs these statements as a series of MR jobs for you automatically (Making it a great tool to use to get data processing done really quick, without bothering with code), while streaming is something you use to write non-Java, simple MR jobs. Both have their own purposes. Basically we are comparing these two to see the benefits and how much they help in improving the productive coding time, without jeopardizing the performance of MR jobs. Also there was a claim in our company that Pig performs better than Map Reduce jobs? Is this true? Are there any such benchmarks available Pig _runs_ MR jobs. It does do job design (and some data) optimizations based on your queries, which is what may give it an edge over designing elaborate flows of plain MR jobs with tools like Oozie/JobControl (Which takes more time to do). But regardless, Pig only makes it easy doing the same thing with Pig Latin statements for you. I knew that Pig runs MR jobs, as Hive runs MR jobs. But Hive jobs become pretty slow with lot of joins, which we can achieve faster with writing raw MR jobs. So with that context was trying to see how Pig runs MR jobs. Like for example what kind of projects should consider Pig. Say when we have a lot of Joins, which writing with plain MR jobs takes time. Thoughts? Thank you Harsh for your comments. They are helpful! -- Harsh J
Re: Custom Seq File Loader: ClassNotFoundException
Hi Madhu, it has the following line: TermDocFreqArrayWritable () {} but I'll try it with "public" access in case it's been called outside of my package. Thank you, Mark On Sun, Mar 4, 2012 at 9:55 PM, madhu phatak wrote: > Hi, > Please make sure that your CustomWritable has a default constructor. > > On Sat, Mar 3, 2012 at 4:56 AM, Mark question wrote: > > > Hello, > > > > I'm trying to debug my code through eclipse, which worked fine with > > given Hadoop applications (eg. wordcount), but as soon as I run it on my > > application with my custom sequence input file/types, I get: > > Java.lang.runtimeException.java.ioException (Writable name can't load > > class) > > SequenceFile$Reader.getValeClass(Sequence File.class) > > > > because my valueClass is customed. In other words, how can I add/build my > > CustomWritable class to be with hadoop LongWritable,IntegerWritable > > etc. > > > > Did anyone used eclipse? > > > > Mark > > > > > > -- > Join me at http://hadoopworkshop.eventbrite.com/ >
Re: AWS MapReduce
On Mon, Mar 5, 2012 at 7:40 AM, John Conwell wrote: > AWS MapReduce (EMR) does not use S3 for its HDFS persistance. If it did > your S3 billing would be massive :) EMR reads all input jar files and > input data from S3, but it copies these files down to its local disk. It > then does starts the MR process, doing all HDFS reads and writes to the > local disks. At the end of the MR job, it copies the MR job output and all > process logs to S3, and then tears down the VM instances. > > You can see this for yourself if you spin up a small EMR cluster, but turn > off the configuration flag that kills the VMs at the end if the MR job. > Then look at the hadoop configuration files to see how hadoop is > configured. > > I really like EMR. Amazon has done a lot of work to optimize the hadoop > configurations and VM instance AMIs to execute MR jobs fairly efficiently > on a VM cluster. I had to do a lot of (expensive) trial and error work to > figure out an optimal hadoop / VM configuration to run our MR jobs without > crashing / timing out the jobs. The only reason we didnt standardize on > EMR was that it strongly bound your code base / process to using EMR for > hadoop processing, vs a flexible infrastructure that could use a local > cluster or cluster on a different cloud provider. > > Thanks for your input. I am assuming HDFS is created on ephemerial disks and not EBS. Also, is it possible to share some of your findings? > > On Sun, Mar 4, 2012 at 8:51 AM, Mohit Anchlia >wrote: > > > As far as I see in the docs it looks like you could also use hdfs instead > > of s3. But what I am not sure is if these are local disks or EBS. > > > > On Sun, Mar 4, 2012 at 2:27 AM, Hannes Carl Meyer < > > hannesc...@googlemail.com > > > wrote: > > > > > Hi, > > > > > > yes, its loaded from S3. Imho is Amazon AWS Map-Reduce pretty slow. > > > The setup is done pretty fast and there are some configuration > parameters > > > you can bypass - for example blocksizes etc. - but in the end imho > > setting > > > up ec2 instances by copying images is the better alternative. > > > > > > Kind Regards > > > > > > Hannes > > > > > > On Sun, Mar 4, 2012 at 2:31 AM, Mohit Anchlia > > >wrote: > > > > > > > I think found answer to this question. However, it's still not clear > if > > > > HDFS is on local disk or EBS volumes. Does anyone know? > > > > > > > > On Sat, Mar 3, 2012 at 3:54 PM, Mohit Anchlia < > mohitanch...@gmail.com > > > > >wrote: > > > > > > > > > Just want to check how many are using AWS mapreduce and understand > > the > > > > > pros and cons of Amazon's MapReduce machines? Is it true that these > > map > > > > > reduce machines are really reading and writing from S3 instead of > > local > > > > > disks? Has anyone found issues with Amazon MapReduce and how does > it > > > > > compare with using MapReduce on local attached disks compared to > > using > > > > S3. > > > > > > > > > > --- > > > www.informera.de > > > Hadoop & Big Data Services > > > > > > > > > -- > > Thanks, > John C >
Re: AWS MapReduce
AWS MapReduce (EMR) does not use S3 for its HDFS persistence. If it did, your S3 billing would be massive :) EMR reads all input jar files and input data from S3, but it copies these files down to its local disk. It then starts the MR process, doing all HDFS reads and writes to the local disks. At the end of the MR job, it copies the MR job output and all process logs to S3, and then tears down the VM instances. You can see this for yourself if you spin up a small EMR cluster but turn off the configuration flag that kills the VMs at the end of the MR job. Then look at the hadoop configuration files to see how hadoop is configured. I really like EMR. Amazon has done a lot of work to optimize the hadoop configurations and VM instance AMIs to execute MR jobs fairly efficiently on a VM cluster. I had to do a lot of (expensive) trial and error work to figure out an optimal hadoop / VM configuration to run our MR jobs without crashing / timing out the jobs. The only reason we didn't standardize on EMR was that it strongly bound your code base / process to using EMR for hadoop processing, vs a flexible infrastructure that could use a local cluster or a cluster on a different cloud provider. On Sun, Mar 4, 2012 at 8:51 AM, Mohit Anchlia wrote: > As far as I see in the docs it looks like you could also use hdfs instead > of s3. But what I am not sure is if these are local disks or EBS. > > On Sun, Mar 4, 2012 at 2:27 AM, Hannes Carl Meyer < > hannesc...@googlemail.com > > wrote: > > > Hi, > > > > yes, its loaded from S3. Imho is Amazon AWS Map-Reduce pretty slow. > > The setup is done pretty fast and there are some configuration parameters > > you can bypass - for example blocksizes etc. - but in the end imho > setting > > up ec2 instances by copying images is the better alternative. > > > > Kind Regards > > > > Hannes > > > > On Sun, Mar 4, 2012 at 2:31 AM, Mohit Anchlia > >wrote: > > > > > I think found answer to this question. However, it's still not clear if > > > HDFS is on local disk or EBS volumes. Does anyone know? > > > > > > On Sat, Mar 3, 2012 at 3:54 PM, Mohit Anchlia > > >wrote: > > > > > > > Just want to check how many are using AWS mapreduce and understand > the > > > > pros and cons of Amazon's MapReduce machines? Is it true that these > map > > > > reduce machines are really reading and writing from S3 instead of > local > > > > disks? Has anyone found issues with Amazon MapReduce and how does it > > > > compare with using MapReduce on local attached disks compared to > using > > > S3. > > > > > > > --- > > www.informera.de > > Hadoop & Big Data Services > > > -- Thanks, John C
Re: Setting up Hadoop single node setup on Mac OS X
On 02/27/2012 11:53 AM, W.P. McNeill wrote: > You don't need any virtualization. Mac OS X is Linux and runs Hadoop as is. Nitpick: OS X is not Linux; it is descended from NeXTSTEP and built on Mach, which is a different POSIX-compliant system from Linux.
fairscheduler : group.name | Please edit patch to work for 0.20.205
Can someone have a look at the patch MAPREDUCE-2457 and see if it can be modified to work for 0.20.205? I am very new to java and have no idea what's going on in that patch. If you have any pointers for me, I will see if I can do it on my own. Thanks, Austin On Fri, Mar 2, 2012 at 7:15 PM, Austin Chungath wrote: > I tried the patch MAPREDUCE-2457 but it didn't work for my hadoop 0.20.205. > Are you sure this patch will work for 0.20.205? > According to the description it says that the patch works for 0.21 and > 0.22 and it says that 0.20 supports group.name without this patch... > > So does this patch also apply to 0.20.205? > > Thanks, > Austin > > On Thu, Mar 1, 2012 at 11:24 PM, Harsh J wrote: > >> The group.name scheduler support was introduced in >> https://issues.apache.org/jira/browse/HADOOP-3892 but may have been >> broken by the security changes present in 0.20.205. You'll need the >> fix presented in https://issues.apache.org/jira/browse/MAPREDUCE-2457 >> to have group.name support. >> >> On Thu, Mar 1, 2012 at 6:42 PM, Austin Chungath >> wrote: >> > I am running fair scheduler on hadoop 0.20.205.0 >> > >> > http://hadoop.apache.org/common/docs/r0.20.205.0/fair_scheduler.html >> > The above page talks about the following property >> > >> > *mapred.fairscheduler.poolnameproperty* >> > ** >> > which I can set to *group.name* >> > The default is user.name and when a user submits a job the fair >> scheduler >> > assigns each user's job to a pool which has the name of the user. >> > I am trying to change it to group.name so that the job is submitted to >> a >> > pool which has the name of the user's linux group. Thus all jobs from >> any >> > user from a specific group go to the same pool instead of an individual >> > pool for every user. >> > But *group.name* doesn't seem to work, has anyone tried this before? >> > >> > *user.name* and *mapred.job.queue.name* works. Is group.name supported >> in >> > 0.20.205.0 because I don't see it mentioned in the docs? >> > >> > Thanks, >> > Austin >> >> >> >> -- >> Harsh J >> > >
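For reference, the configuration being discussed looks like this in mapred-site.xml - a sketch that assumes the fair scheduler itself is already enabled, and, per the thread, the group.name value only takes effect once a MAPREDUCE-2457-style fix is applied to 0.20.205:

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>group.name</value>
</property>

Until such a fix is in place, user.name (the default) and mapred.job.queue.name are the pool-name properties reported to work in this thread.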