RE: Restricting quota for users in HDFS
Yeah, I meant the same. I want to restrict a directory which is owned by a particular user.
Thanks,
Pallavi

-----Original Message-----
From: Allen Wittenauer [mailto:a...@yahoo-inc.com]
Sent: Tuesday, June 16, 2009 11:18 PM
To: core-user@hadoop.apache.org
Subject: Re: Restricting quota for users in HDFS

On 6/15/09 11:16 PM, "Palleti, Pallavi" pallavi.pall...@corp.aol.com wrote:
We have a chown command in hadoop dfs to make a particular directory owned by a person. Do we have something similar to create a user with some space limit, i.e. to restrict the disk usage by a particular user?

Quotas are implemented on a per-directory basis, not per-user. There is no support for "this user can have X space, regardless of where he/she writes", only "this directory has a limit of X space, regardless of who writes there".
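For reference, directory quotas are managed with the dfsadmin command. A rough sketch, with a made-up path and byte count; which of these subcommands are available depends on the Hadoop release in use (name quotas appeared before space quotas):

bin/hadoop dfsadmin -setQuota 100000 /user/pallavi
bin/hadoop dfsadmin -setSpaceQuota 10737418240 /user/pallavi
bin/hadoop dfsadmin -clrQuota /user/pallavi
bin/hadoop dfsadmin -clrSpaceQuota /user/pallavi

The first caps the number of file and directory names under /user/pallavi, the second caps the raw bytes consumed there (replication included), and the clr variants remove the quotas again.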
hadoop-streaming for network simulation
Hi, maybe somebody could help me with this. What I want to do is use Hadoop Streaming to execute the same program with different parameters. I'm using the network simulation software OMNeT++ and I want to run this simulation in parallel. OMNeT++ programs can be executed from a Linux shell; they just need an omnetpp.ini file for configuration. So the first step for me is to get OMNeT++ running on Hadoop with a simple example and the same parameters.

Which Hadoop parameters do I have to use when I start Hadoop Streaming? (I think Hadoop Streaming is the right way to do this, or not?) Actually I try something like this, but the streaming job fails:

bin/hadoop jar hadoop-0.18.3-streaming.jar -input /input/fifo -output /output/fifo -mapper /home/simon/omnetpp-4.0/samples/fifo/fifo -u Cmdenv -c Fifo1 -file /home/simon/omnetpp-4.0/samples/fifo/fifo -reducer NONE

So the OMNeT++ program is just the mapper. The omnetpp.ini file that the OMNeT++ fifo example needs to run is located in /input/fifo. "fifo -u Cmdenv -c Fifo1" are the parameters to start OMNeT++ without the graphical interface. But also when I put these parameters into a shell script and run the script with Hadoop, I get the same error. The OMNeT++ program is installed on all machines. Maybe I have to give the omnetpp.ini file to the fifo program in some other way? I don't know. I'm no programmer and I don't even know if this approach goes in the right direction. For any hints or suggestions I would be very glad.

P.S. Sorry for my bad English.

Simon Lorenz
Karlsruhe, Germany
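One way to structure this (a sketch only; run_fifo.sh is a made-up wrapper name, and it assumes the fifo binary picks up omnetpp.ini from its current working directory) is to put the whole command line into a small script, ship both the script and the ini file with -file, and use the script as the mapper:

#!/bin/sh
# run_fifo.sh - hypothetical wrapper around the locally installed fifo binary
/home/simon/omnetpp-4.0/samples/fifo/fifo -u Cmdenv -c Fifo1

bin/hadoop jar hadoop-0.18.3-streaming.jar \
  -input /input/fifo \
  -output /output/fifo \
  -mapper run_fifo.sh \
  -file /home/simon/omnetpp-4.0/samples/fifo/run_fifo.sh \
  -file /home/simon/omnetpp-4.0/samples/fifo/omnetpp.ini \
  -reducer NONE

Files shipped with -file end up in each task's working directory, so the ini file sits next to the running mapper. Whether Cmdenv does anything useful with the lines streaming feeds it on stdin is a separate question; a mapper that ignores stdin will still run, and the -input path then mostly just controls how many map tasks get launched.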
Re: Debugging Map-Reduce programs
Hi,
You could also use Apache Commons Logging to write logs in your map/reduce functions, which will be visible in the JobTracker UI. That's how we did debugging :) Hope it helps.
Regards,
Raakhi

On Tue, Jun 16, 2009 at 7:29 PM, jason hadoop jason.had...@gmail.com wrote:
When you are running in local mode you have 2 basic choices if you want to interact with a debugger. You can launch from within Eclipse or another IDE, or you can set up a Java debugger transport as part of the mapred.child.java.opts variable and attach to the running JVM. By far the simplest is launching via Eclipse. Your other alternative is to inform the framework to retain the job files via keep.failed.task.files (be careful here, you will fill your disk with old dead data) and use the IsolationRunner to debug. Examples in my book :)

On Mon, Jun 15, 2009 at 6:49 PM, bharath vissapragada bharathvissapragada1...@gmail.com wrote:
I am running in local mode. Can you tell me how to set those breakpoints or how to access those files so that I can debug the program? The program is generating
java.lang.NumberFormatException: For input string:
but that particular string is the one which is the input to the map class. So I think that it is not reading my input correctly. But when I try to print the same, it isn't printing to STDOUT. I am using the FileInputFormat class:
FileInputFormat.addInputPath(conf, new Path("/home/rip/Desktop/hadoop-0.18.3/input"));
FileOutputFormat.setOutputPath(conf, new Path("/home/rip/Desktop/hadoop-0.18.3/output"));
input and output are folders for input and output. It is also generating these warnings:
09/06/16 12:38:32 WARN fs.FileSystem: "local" is a deprecated filesystem name. Use "file:///" instead.
Thanks in advance

On Tue, Jun 16, 2009 at 3:50 AM, Aaron Kimball aa...@cloudera.com wrote:
On Mon, Jun 15, 2009 at 10:01 AM, bharath vissapragada bhara...@students.iiit.ac.in wrote:
Hi all, when running Hadoop in local mode, can we use print statements to print something to the terminal?

Yes. In distributed mode, each task will write its stdout/stderr to files which you can access through the web-based interface.

Also I am not sure whether the program is reading my input files. If I keep print statements it isn't displaying any. Can anyone tell me how to solve this problem?

Is it generating exceptions? Are the files present? If you're running in local mode, you can use a debugger; set a breakpoint in your map() method and see if it gets there. How are you configuring the input files for your job?

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
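To make the Commons Logging suggestion above concrete, here is a minimal sketch (class and names are made up); the output lands in the per-task logs that the JobTracker web UI links to:

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LoggingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private static final Log LOG = LogFactory.getLog(LoggingMapper.class);

  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    // Log the raw record; visible under the task logs in the JobTracker UI,
    // or under logs/userlogs/ on the tasktracker node.
    LOG.info("map input at offset " + key + ": " + value);
    output.collect(value, new IntWritable(1));
  }
}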
Error Recovery for block -Aborting to Write
Hi all,
We are facing issues while porting some logs to HDFS. The way we are doing it is with a simple Java program which reads the file and writes to HDFS using an OutputStream. It was working perfectly fine; recently we have been getting the error messages below once in a while, and when we then try to read that data we get a "Could not obtain block" error and the jobs fail. Can someone tell me what the issue would be? The error messages while writing to HDFS are:

WARN dfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-1005367228931083977_10012402 java.io.IOException: Bad response 1 for block blk_-1005367228931083977_10012402 from datanode xxx.xxx.xxx.88:50010
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2076)
09/06/08 14:26:41 WARN dfs.DFSClient: Error Recovery for block blk_-1005367228931083977_10012402 bad datanode[1] xxx.xxx.xxx.88:50010
09/06/08 14:26:41 WARN dfs.DFSClient: Error Recovery for block blk_-1005367228931083977_10012402 in pipeline xxx.xxx.xxx.79:50010, xxx.xxx.xxx.88:50010, xxx.xxx.xxx.68:50010: bad datanode xxx.xxx.xxx.88:50010
09/06/08 14:26:42 WARN dfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-1005367228931083977_10012456 java.io.IOException: Bad response 1 for block blk_-1005367228931083977_10012456 from datanode xxx.xxx.xxx.68:50010
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2076)
09/06/08 14:26:42 WARN dfs.DFSClient: Error Recovery for block blk_-1005367228931083977_10012456 bad datanode[1] xxx.xxx.xxx.68:50010
09/06/08 14:26:42 WARN dfs.DFSClient: Error Recovery for block blk_-1005367228931083977_10012456 in pipeline xxx.xxx.xxx.79:50010, xxx.xxx.xxx.68:50010: bad datanode xxx.xxx.xxx.68:50010
09/06/08 14:26:43 WARN dfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-1005367228931083977_10012457 java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at java.io.DataInputStream.readLong(DataInputStream.java:399)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2052)
09/06/08 14:26:43 WARN dfs.DFSClient: Error Recovery for block blk_-1005367228931083977_10012457 bad datanode[0] xxx.xxx.xxx.79:50010
IOException - java.io.IOException: All datanodes xxx.xxx.xxx.79:50010 are bad. Aborting... while writing

Thanks,
Pallavi
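For context, the write path described at the top of this message (plain local read, OutputStream into HDFS) looks roughly like this; the paths, buffer size and bare Configuration are assumptions, not the actual code:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LogCopier {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    InputStream in = new BufferedInputStream(new FileInputStream("/var/log/app.log"));
    OutputStream out = fs.create(new Path("/logs/app.log"));

    byte[] buf = new byte[64 * 1024];
    int n;
    while ((n = in.read(buf)) > 0) {
      out.write(buf, 0, n);  // data is pipelined to the datanodes listed in the warnings above
    }
    out.close();  // close() flushes the last block; pipeline failures can surface here as well
    in.close();
  }
}

The "Error Recovery" lines mean the client dropped an unresponsive datanode from the write pipeline and retried with the rest; the write only aborts once every datanode in the pipeline has been marked bad, so the datanode logs are the next place to look.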
not a SequenceFile?
Hi Group,
I have trouble running a couple of the examples provided with Hadoop. Below are the error messages from the console; could you please advise what the problem could be and a probable solution?

09/06/17 16:30:29 INFO mapred.FileInputFormat: Total input paths to process : 1
09/06/17 16:30:29 INFO mapred.FileInputFormat: Total input paths to process : 1
09/06/17 16:30:29 INFO mapred.JobClient: Running job: job_200906171601_0009
09/06/17 16:30:30 INFO mapred.JobClient: map 0% reduce 0%
09/06/17 16:30:38 INFO mapred.JobClient: Task Id : attempt_200906171601_0009_m_00_0, Status : FAILED
java.io.IOException: hdfs://localhost:9000/user/root/words not a SequenceFile
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1458)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1431)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1420)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1415)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.init(SequenceFileRecordReader.java:43)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:54)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)
09/06/17 16:30:39 INFO mapred.JobClient: Task Id : attempt_200906171601_0009_r_00_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.examples.AggregateWordCount$WordCountPlugInClass
    at org.apache.hadoop.mapred.lib.aggregate.UserDefinedValueAggregatorDescriptor.createInstance(UserDefinedValueAggregatorDescriptor.java:57)
    at org.apache.hadoop.mapred.lib.aggregate.UserDefinedValueAggregatorDescriptor.createAggregator(UserDefinedValueAggregatorDescriptor.java:64)
    at org.apache.hadoop.mapred.lib.aggregate.UserDefinedValueAggregatorDescriptor.init(UserDefinedValueAggregatorDescriptor.java:76)
    at org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorJobBase.getValueAggregatorDescriptor(ValueAggregatorJobBase.java:54)
    at org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorJobBase.getAggregatorDescriptors(ValueAggregatorJobBase.java:65)
    at org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorJobBase.initializeMySpec(ValueAggregatorJobBase.java:74)
    at org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorJobBase.configure(ValueAggregatorJobBase.java:42)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:240)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.examples.AggregateWordCount$WordCountPlugInClass
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:268)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:242)
    at org.apache.hadoop.mapred.lib.aggregate.UserDefinedValueAggregatorDescriptor.createInstance(UserDefinedValueAggregatorDescriptor.java:52)
    ... 10 more

Thank You,
Shravan Kumar. M
Catalytic Software Ltd.
Re: not a SequenceFile?
I guess you have set SequenceFileInputFormat as your input format in the configuration object, but the file you provide is not a sequence file.

2009/6/17 Shravan Mahankali shravan.mahank...@catalytic.com wrote:
Hi Group, I have trouble running a couple of the examples provided with Hadoop. Below are the error messages from the console; could you please advise what the problem could be and a probable solution?
java.io.IOException: hdfs://localhost:9000/user/root/words not a SequenceFile
[...]

--
http://daily.appspot.com/food/
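In driver terms, the fix suggested above amounts to matching the input format to what is actually in HDFS. A sketch (the class name is a placeholder):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class InputFormatExample {
  public static JobConf configure() {
    JobConf conf = new JobConf(InputFormatExample.class);
    // Plain text such as /user/root/words needs TextInputFormat;
    // SequenceFileInputFormat only works if the file was actually written as a SequenceFile.
    conf.setInputFormat(TextInputFormat.class);
    return conf;
  }
}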
RE: [ADV] Blatant marketing of the book Pro Hadoop. In honor of the 09 summit here is a 50% off coupon corrected code is LUCKYOU
Hi Jason,
Where can I download your book's Alpha Chapters? I am very interested in your book about Hadoop. And I cannot visit the link www.prohadoopbook.com

-----Original Message-----
From: jason hadoop [mailto:jason.had...@gmail.com]
Sent: June 9, 2009 20:47
To: core-user@hadoop.apache.org
Subject: [ADV] Blatant marketing of the book Pro Hadoop. In honor of the 09 summit here is a 50% off coupon corrected code is LUCKYOU

http://eBookshop.apress.com CODE LUCKYOU
--
Alpha Chapters of my book on Hadoop are available http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals
Re: Neither OOM Java Heap Space nor GC Overhead Limit Exceeded
Thanks Jason. I went inside the code of that statement and found out that it eventually makes a binaryRead call to read a binary file, and that is where it gets stuck. Do you know whether there is any problem in giving a binary file for addition to the distributed cache?

In the statement
DistributedCache.addCacheFile(new URI("/home/akhil1988/Ner/OriginalNer/Data/"), conf);
Data is a directory which contains some text as well as some binary files. In the statement
Parameters.readConfigAndLoadExternalData("Config/allLayer1.config");
I can see (in the output messages) that it is able to read the text files, but it gets stuck at the binary files. So I think the problem is that it cannot read the binary files: either they have not been transferred to the cache, or a binary file cannot be read. Do you know the solution to this?

Thanks,
Akhil

jason hadoop wrote:
Something is happening inside of your (Parameters.readConfigAndLoadExternalData("Config/allLayer1.config");) code, and the framework is killing the job for not heartbeating for 600 seconds.

On Tue, Jun 16, 2009 at 8:32 PM, akhil1988 akhilan...@gmail.com wrote:
One more thing: finally it terminates there (after some time) by giving the final exception:
java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
    at LbjTagger.NerTagger.main(NerTagger.java:109)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

akhil1988 wrote:
Thank you Jason for your reply. My Map class is an inner class and it is a static class. Here is the structure of my code.
public class NerTagger {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
        private Text word = new Text();
        private static NETaggerLevel1 tagger1 = new NETaggerLevel1();
        private static NETaggerLevel2 tagger2 = new NETaggerLevel2();

        Map() {
            System.out.println("HI2\n");
            Parameters.readConfigAndLoadExternalData("Config/allLayer1.config");
            System.out.println("HI3\n");
            Parameters.forceNewSentenceOnLineBreaks = Boolean.parseBoolean("true");
            System.out.println("loading the tagger");
            tagger1 = (NETaggerLevel1) Classifier.binaryRead(Parameters.pathToModelFile + ".level1");
            System.out.println("HI5\n");
            tagger2 = (NETaggerLevel2) Classifier.binaryRead(Parameters.pathToModelFile + ".level2");
            System.out.println("Done - loading the tagger");
        }

        public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
            String inputline = value.toString();
            /* Processing of the input pair is done here */
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(NerTagger.class);
        conf.setJobName("NerTagger");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setNumReduceTasks(0);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        conf.set("mapred.job.tracker", "local");
        conf.set("fs.default.name", "file:///");

        DistributedCache.addCacheFile(new URI("/home/akhil1988/Ner/OriginalNer/Data/"), conf);
        DistributedCache.addCacheFile(new URI("/home/akhil1988/Ner/OriginalNer/Config/"), conf);
        DistributedCache.createSymlink(conf);

        conf.set("mapred.child.java.opts", "-Xmx4096m");

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        System.out.println("HI1\n");
        JobClient.runJob(conf);
    }

Jason, when the program executes, HI1 and HI2 are printed but it does not reach HI3. In the statement Parameters.readConfigAndLoadExternalData("Config/allLayer1.config"); it is able to access the Config/allLayer1.config file
Re: Neither OOM Java Heap Space nor GC Overhead Limit Exceeded
I have only ever used the distributed cache to add files, including binary files such as shared libraries. It looks like you are adding a directory.

The DistributedCache is not generally used for passing data, but for passing file names. The files must already be stored in a shared file system (HDFS, for simplicity). The distributed cache makes the names available to the tasks, and the files are extracted from HDFS and stored in the task-local work area on each tasktracker node. It looks like you may be storing the contents of your files in the distributed cache.

On Wed, Jun 17, 2009 at 6:56 AM, akhil1988 akhilan...@gmail.com wrote:
Thanks Jason. I went inside the code of that statement and found out that it eventually makes a binaryRead call to read a binary file, and that is where it gets stuck.
[...]
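For completeness, the usual DistributedCache pattern looks roughly like this; a sketch only, with made-up HDFS paths and file names, showing the idea of registering individual files (not a directory) that already live in HDFS and picking up the local copies in configure():

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CacheSetup {

  // In the driver, before submitting the job. The file must already be in HDFS,
  // e.g. uploaded with: bin/hadoop fs -put Data/model.level1 /user/akhil1988/ner/
  public static void addModelFile(JobConf conf) throws Exception {
    DistributedCache.addCacheFile(new URI("/user/akhil1988/ner/model.level1#model.level1"), conf);
    DistributedCache.createSymlink(conf);  // makes "model.level1" appear in the task's working directory
  }

  // In the mapper's configure(JobConf job): find the local copy on the tasktracker node.
  public static Path findLocalCopy(JobConf job) throws IOException {
    Path[] local = DistributedCache.getLocalCacheFiles(job);
    return (local == null || local.length == 0) ? null : local[0];
  }
}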
Re: [ADV] Blatant marketing of the book Pro Hadoop. In honor of the 09 summit here is a 50% off coupon corrected code is LUCKYOU
You can purchase the ebook from www.apress.com. The final copy is now available. There is a 50% off coupon good for a few more days: LUCKYOU. You can try prohadoop.ning.com as an alternative for www.prohadoopbook.com, or www.prohadoop.com. What error do you receive when you try to visit www.prohadoopbook.com?

2009/6/17 zjffdu zjf...@gmail.com wrote:
Hi Jason, Where can I download your book's Alpha Chapters? I am very interested in your book about Hadoop. And I cannot visit the link www.prohadoopbook.com
[...]

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
Re: [ADV] Blatant marketing of the book Pro Hadoop. In honor of the 09 summit here is a 50% off coupon corrected code is LUCKYOU
Hi Jason,
I still cannot visit the links you provide; I am in China, maybe it is a network problem. Could you send me the alpha chapters of your book? That would be appreciated.
Thank you,
Jeff Zhang

2009/6/17 jason hadoop jason.had...@gmail.com wrote:
You can purchase the ebook from www.apress.com. The final copy is now available. There is a 50% off coupon good for a few more days: LUCKYOU. You can try prohadoop.ning.com as an alternative for www.prohadoopbook.com, or www.prohadoop.com. What error do you receive when you try to visit www.prohadoopbook.com?
[...]

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
hadoop interaction with AFS
Ran into an issue with running Hadoop on a cluster that also has AFS installed. When a user sshes in they get an 'extra' group ID; I think it is called a 'pag'. The problem is that when you try to start Hadoop from a shell that has a pag, one of the checks in newer versions stops because there is no group name to go with the gid. As an example, from the 'groups' command:

[bro...@nyx-login1 ~]$ groups
cacstaff vasp stata charmm amber8 siesta eecs587f06 nolimit gti hyades adina chemkinpro coe molpro aces2 helios mcnp5 matlab id: cannot find name for group ID 1093742985 1093742985

If you ssh to the machine itself using ssh keys, the pag is not created, which is our current workaround, but it is kind of kludgy. If you need more information let me know.

Brock Palen
bro...@mlds-networks.com
www.mlds-networks.com
MLDS Owner, Senior Tech.
Re: hadoop interaction with AFS
Hey Brock,
I've seen a similar problem at another site. They were able to solve this by upgrading their version of OpenAFS. Is that an option for you?
Brian

On Jun 17, 2009, at 8:35 AM, Brock Palen wrote:
Ran into an issue with running Hadoop on a cluster that also has AFS installed. When a user sshes in they get an 'extra' group ID; I think it is called a 'pag'.
[...]
Re: hadoop interaction with AFS
Hey Brock, I've seen a similar problem at another site. They were able to solve this by upgrading their version of OpenAFS. Is that an option for you?

It might be. I see we are running 1.4.8 and they have 1.5 out; not sure what our cell is running, or about compatibility. Good to know.

Brian
On Jun 17, 2009, at 8:35 AM, Brock Palen wrote:
Ran into an issue with running Hadoop on a cluster that also has AFS installed. When a user sshes in they get an 'extra' group ID; I think it is called a 'pag'.
[...]
Re: hadoop interaction with AFS
If re-building Hadoop is an option, then I guess you can replace the 'groups' command in Shell.java (src/core/org/apache/hadoop/util/Shell.java) with 'id -rgn' to get the correct group name.
--
Faisal

On Wed, Jun 17, 2009 at 10:35 AM, Brock Palen bro...@mlds-networks.com wrote:
Ran into an issue with running Hadoop on a cluster that also has AFS installed. When a user sshes in they get an 'extra' group ID; I think it is called a 'pag'.
[...]
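For anyone hitting the same thing, the difference is easy to see from a shell (output shortened, and it assumes cacstaff is the primary group here):

[bro...@nyx-login1 ~]$ groups
cacstaff vasp stata ... id: cannot find name for group ID 1093742985 1093742985
[bro...@nyx-login1 ~]$ id -rgn
cacstaff

'id -rgn' only asks for the name of the real primary group, so it never touches the unnamed pag gid that makes 'groups' fail.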
Trying to setup Cluster
I'm trying to set up a cluster with 3 different machines running Fedora. I can't get them to log into the localhost without the password, but that's the least of my worries at the moment. I am posting my config files and the master and slave files; let me know if anyone can spot a problem with the configs.

hadoop-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>$HADOOP_HOME/dfs-data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>$HADOOP_HOME/dfs-name</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$HADOOP_HOME/hadoop-tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://gobi.something.something:54310</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a FileSystem.</description>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>kalahari.something.something:54311</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>$HADOOP_HOME/mapred-system</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>$HADOOP_HOME/mapred-local</value>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Slave: kongur.something.something
Master: kalahari.something.something

I execute the dfs-start.sh command from gobi.something.something. Is there any other info that I should provide in order to help? Also, Kongur is where I'm running the datanode; the master file on Kongur should have localhost in it, right?

Thanks for the help,
Divij
Re: Anyway to sort keys before Reduce function in Hadoop ?
Thanks, Alex! It is really helpful; at least I know it is sorted in some way. Furthermore, could I control it as ascending or descending order? Say my keys are integers and I want them in descending order; is it easy to do that?
Thanks again,
-Kun

--- On Mon, 6/15/09, Alex Loddengaard a...@cloudera.com wrote:
From: Alex Loddengaard a...@cloudera.com
Subject: Re: Anyway to sort keys before Reduce function in Hadoop ?
To: core-user@hadoop.apache.org
Date: Monday, June 15, 2009, 11:53 PM

Hey Kun,
Keys given to a given reducer instance are given in sorted order. Meaning, for a given reducer JVM instance, the reduce function will be called several times, once for each key. The order in which the keys are given to the reduce function is sorted. The sorting happens in the shuffle phase, which is basically partitioning and sorting. That said, if you have one reducer (which isn't possible in large jobs), keys will be given to you in sorted order.

You may be interested in the combiner phase, which is essentially a mini reduce that happens before data is transferred between mapper and reducer: http://wiki.apache.org/hadoop/HadoopMapReduce (grep for "combine")

You may also find these videos useful:
http://www.cloudera.com/hadoop-training-mapreduce-hdfs
http://www.cloudera.com/hadoop-training-programming-with-hadoop

Hope this helps. Let me know if I misunderstood your question.
Alex

On Mon, Jun 15, 2009 at 4:22 PM, Kunsheng Chen ke...@yahoo.com wrote:
Hi everyone, Is there any way to sort the keys before Reduce but after Map? I also thought of sorting the keys myself in the Reduce function, but it might take too much memory once the number of results gets large. I am thinking of using some numeric value (calculated by Map) as keys in Reduce. If it is possible, I could output my results in some order easily.
Thanks in advance,
-Kun
Re: Neither OOM Java Heap Space nor GC Overhead Limit Exceeded
Hi Jason!
Thanks for bearing with me to solve my problem. To restate things and make them easier to understand: I am working in local mode, in the directory which contains the job jar and also the Config and Data directories. I just removed the following three statements from my code:

DistributedCache.addCacheFile(new URI("/home/akhil1988/Ner/OriginalNer/Data/"), conf);
DistributedCache.addCacheFile(new URI("/home/akhil1988/Ner/OriginalNer/Config/"), conf);
DistributedCache.createSymlink(conf);

The program executes to the same point as before and then terminates. That means the above three statements are of no use while working in local mode. In local mode, the working directory for the map/reduce tasks is the current working directory in which you started the hadoop command to execute the job. Since I have removed the DistributedCache.addCacheFile statements, there should be no issue of whether I am giving a file name or a directory name as the argument. Now it seems to me that there is some problem in reading the binary file using binaryRead.

Please let me know if I am going wrong anywhere.

Thanks,
Akhil

jason hadoop wrote:
I have only ever used the distributed cache to add files, including binary files such as shared libraries. It looks like you are adding a directory. The DistributedCache is not generally used for passing data, but for passing file names.
[...]
Re: MapContext.getInputSplit() returns nothing
Thanks, it looks like I can write a line reader in C++ that roughly does what the Java version does. This also means that I can deserialise my own custom formats as well. Thanks!
Roshan

On Tue, Jun 16, 2009 at 12:22 PM, Owen O'Malley omal...@apache.org wrote:
Sorry, I forget how much isn't clear to people who are just starting. FileInputFormat creates FileSplits. The serialization is very stable and can't be changed without breaking things. The reason that pipes can't stringify it is that the string form of input splits is ambiguous (and since it is user code, we really can't make assumptions about it). The format of FileSplit is:

16-bit filename byte length
filename in bytes
64-bit offset
64-bit length

Technically the filename uses a funky UTF-8 encoding, but in practice, as long as the filename has ASCII characters, they are ASCII. Look at org.apache.hadoop.io.UTF8.writeString for the precise definition.
-- Owen
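If it helps to check that byte layout from the Java side, a FileSplit can be serialized by hand with the old mapred API; a sketch, with an arbitrary path and numbers:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobConf;

public class SplitBytes {
  public static void main(String[] args) throws Exception {
    FileSplit split = new FileSplit(new Path("/input/part-00000"), 0L, 1234L, new JobConf());
    DataOutputBuffer out = new DataOutputBuffer();
    split.write(out);  // 2-byte name length, name bytes, 8-byte offset, 8-byte length
    byte[] data = out.getData();
    for (int i = 0; i < out.getLength(); i++) {
      System.out.printf("%02x ", data[i]);
    }
    System.out.println();
  }
}

The hex dump is exactly what a C++ record reader receives from context.getInputSplit(), which makes it easier to verify a hand-written deserializer.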
NullPointerException running jps
Hi, I am getting a NullPointerException trying to run the jps command, which is kind of weird. Does anyone have any idea about this?
Thanks,
--
Richa Khandelwal
University of California, Santa Cruz
CA
Re: Anyway to sort keys before Reduce function in Hadoop ?
An alternative is to create a new WritableComparator and then set it in the JobConf object with the method setOutputKeyComparatorClass(). You can use IntWritable.Comparator as a start.

On Wed, Jun 17, 2009 at 9:37 AM, tim robertson timrobertson...@gmail.com wrote:
I think you can do this by creating your own key type extending IntWritable and overriding the compareTo method to implement this.
Cheers,
Tim

On Wed, Jun 17, 2009 at 6:34 PM, Kunsheng Chen ke...@yahoo.com wrote:
Thanks, Alex! It is really helpful; at least I know it is sorted in some way. Furthermore, could I control it as ascending or descending order?
[...]
JobControl for Pipes?
Hello, Is there any way to express dependencies between map-reduce jobs (such as in org.apache.hadoop.mapred.jobcontrol) for pipes jobs? The provided header Pipes.hh does not seem to reflect any such capabilities. best, Roshan
Queries throwing errors.
select count(1) from test;

Total MapReduce jobs = 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number>
In order to set a constant number of reducers: set mapred.reduce.tasks=<number>
Starting Job = job_200906171546_0001, Tracking URL = http://gobi.mssm.edu:50030/jobdetails.jsp?jobid=job_200906171546_0001
Kill Command = /home/divij/hive/build/hadoopcore/hadoop-0.19.0/bin/../bin/hadoop job -Dmapred.job.tracker=gobi.mssm.edu:54211 -kill job_200906171546_0001
2009-06-17 03:47:22,408 map = 0%, reduce = 0%
2009-06-17 03:47:23,414 map = 0%, reduce = 0%
2009-06-17 03:47:24,420 map = 0%, reduce = 0%
2009-06-17 03:47:25,429 map = 0%, reduce = 0%
2009-06-17 03:47:26,432 map = 0%, reduce = 0%
2009-06-17 03:47:27,436 map = 0%, reduce = 0%
2009-06-17 03:47:28,443 map = 0%, reduce = 0%
2009-06-17 03:47:29,448 map = 0%, reduce = 0%
2009-06-17 03:47:30,453 map = 0%, reduce = 0%
2009-06-17 03:47:31,457 map = 0%, reduce = 0%
2009-06-17 03:47:32,461 map = 0%, reduce = 0%
2009-06-17 03:47:33,465 map = 0%, reduce = 0%
2009-06-17 03:47:34,468 map = 0%, reduce = 0%
2009-06-17 03:47:35,472 map = 0%, reduce = 0%
2009-06-17 03:47:36,476 map = 0%, reduce = 0%
2009-06-17 03:47:37,480 map = 0%, reduce = 0%
2009-06-17 03:47:38,484 map = 0%, reduce = 0%
2009-06-17 03:47:39,488 map = 0%, reduce = 0%
2009-06-17 03:47:40,491 map = 0%, reduce = 0%
2009-06-17 03:47:41,495 map = 0%, reduce = 0%
2009-06-17 03:47:42,498 map = 0%, reduce = 0%
2009-06-17 03:47:43,502 map = 0%, reduce = 0%
2009-06-17 03:47:44,509 map = 100%, reduce = 100%
Ended Job = job_200906171546_0001 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver

I have no idea what's wrong. I thought this b
Re: Anyway to sort keys before Reduce function in Hadoop ?
On Wed, Jun 17, 2009 at 12:26 PM, Chuck Lam chuck@gmail.com wrote:
an alternative is to create a new WritableComparator and then set it in the JobConf object with the method setOutputKeyComparatorClass(). You can use IntWritable.Comparator as a start.

The important part of that is to define a RawComparator for your key class and call JobConf.setOutputKeyComparatorClass with it. So if you wanted to invert the default sort order of IntWritable keys, you could:

public class InvertedIntWritableComparator extends IntWritable.Comparator {
  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    return -1 * super.compare(b1, s1, l1, b2, s2, l2);
  }
}

then

job.setOutputKeyComparatorClass(InvertedIntWritableComparator.class);

-- Owen
Restrict output of mappers to reducers running on same node?
Hi,
Can I restrict the output of mappers running on a node to go to reducer(s) running on the same node? Let me explain why I want to do this.

I am converting a huge number of XML files into SequenceFiles. So theoretically I don't even need reducers; mappers would read XML files and output SequenceFiles. But the problem with this approach is that I will end up with a huge number of small output files. To avoid generating a large number of smaller files, I can run Identity reducers. But by running reducers, I am unnecessarily transferring data over the network. I ran some test cases using a small subset of my data (~90GB). With a map-only job, my cluster finished the conversion in only 6 minutes. But with a map plus Identity-reducer job, it takes around 38 minutes.

I have to process close to a terabyte of data, so I was thinking of faster alternatives:
* Writing a custom OutputFormat
* Somehow restricting the output of mappers running on a node to go to reducers running on the same node. Maybe I can write my own partitioner (simple), but I am not sure how Hadoop's framework assigns partitions to reduce tasks.

Any pointers? Or is this not possible at all?

Thanks,
Tarandeep
Re: Pipes example wordcount-nopipe.cc failed when reading from input splits
Does anybody have any updates on this? How can we have our own RecordReader in Hadoop Pipes? When I try to print the context.getInputSplit(), I get the filename along with some junk characters. As a result the file open fails. Has anybody got it working?
Viral.

11 Nov. wrote:
I traced into the C++ recordreader code:

WordCountReader(HadoopPipes::MapContext& context) {
  std::string filename;
  HadoopUtils::StringInStream stream(context.getInputSplit());
  HadoopUtils::deserializeString(filename, stream);
  struct stat statResult;
  stat(filename.c_str(), &statResult);
  bytesTotal = statResult.st_size;
  bytesRead = 0;
  cout << filename << endl;
  file = fopen(filename.c_str(), "rt");
  HADOOP_ASSERT(file != NULL, "failed to open " + filename);
}

I got nothing for the filename variable, which showed the InputSplit is empty.

2008/3/4, 11 Nov. nov.eleve...@gmail.com:
hi colleagues, I have set up a single node cluster to test the pipes examples. wordcount-simple and wordcount-part work just fine, but wordcount-nopipe can't run. Here is my command line:

bin/hadoop pipes -conf src/examples/pipes/conf/word-nopipe.xml -input input/ -output out-dir-nopipe1

and here is the error message printed on my console:

08/03/03 23:23:06 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
08/03/03 23:23:06 INFO mapred.FileInputFormat: Total input paths to process : 1
08/03/03 23:23:07 INFO mapred.JobClient: Running job: job_200803032218_0004
08/03/03 23:23:08 INFO mapred.JobClient: map 0% reduce 0%
08/03/03 23:23:11 INFO mapred.JobClient: Task Id : task_200803032218_0004_m_00_0, Status : FAILED
java.io.IOException: pipe child exception
    at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:138)
    at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:83)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1787)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readByte(DataInputStream.java:250)
    at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:313)
    at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:335)
    at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:112)
task_200803032218_0004_m_00_0:
task_200803032218_0004_m_00_0:
task_200803032218_0004_m_00_0:
task_200803032218_0004_m_00_0: Hadoop Pipes Exception: failed to open at /home/hadoop/hadoop-0.15.2-single-cluster/src/examples/pipes/impl/wordcount-nopipe.cc:67 in WordCountReader::WordCountReader(HadoopPipes::MapContext&)

Could anybody tell me how to fix this? That will be appreciated. Thanks a lot!

--
View this message in context: http://www.nabble.com/Pipes-example-wordcount-nopipe.cc-failed-when-reading-from-input-splits-tp15807856p24084734.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: Announcing CloudBase-1.3.1 release
How about the index of CloudBase?

On Wed, Jun 17, 2009 at 4:16 AM, Ru, Yanbo y...@business.com wrote:
Hi,
We have released the 1.3.1 version of CloudBase on SourceForge: https://sourceforge.net/projects/cloudbase
CloudBase is a data warehouse system for Terabyte/Petabyte scale analytics. It is built on top of the Map-Reduce architecture. It allows you to query flat log files using ANSI SQL. Please give it a try and send us your feedback.
Thanks,
Yanbo

Release notes -
New Features:
* CREATE CSV tables - One can create tables on top of data in CSV (Comma Separated Values) format and query them using SQL. The current implementation doesn't accept CSV records which span multiple lines. Data may not be processed correctly if a field contains embedded line-breaks. Please visit http://cloudbase.sourceforge.net/index.html#userDoc for the detailed specification of the CSV format.
Bug fixes:
* Aggregate function 'AVG' returns the same value as the 'SUM' function
* If a query has multiple aliases, only the last alias works
Re: NullPointerException running jps
Can you post the stack trace?

On Wed, Jun 17, 2009 at 12:21 PM, Richa Khandelwal richa@gmail.com wrote:
Hi, I am getting a NullPointerException trying to run the jps command, which is kind of weird. Does anyone have any idea about this?
Thanks,
--
Richa Khandelwal
University of California, Santa Cruz
CA

--
Regards,
Praveen
Re: Restrict output of mappers to reducers running on same node?
You could look at CombineFileInputFormat to generate a single split out of several files. Your partitioner would be able to assign keys to specific reducers, but you would not have control over which node a given reduce task runs on.

Jothi

On 6/18/09 5:10 AM, Tarandeep Singh tarand...@gmail.com wrote:
Hi, Can I restrict the output of mappers running on a node to go to reducer(s) running on the same node? Let me explain why I want to do this.
[...]
Re: Restrict output of mappers to reducers running on same node?
You can open your sequence file in the mapper's configure method, write to it in your map, and close it in the mapper's close method. Then you end up with one sequence file per map. I am making the assumption that each key/value pair given to your map somehow represents a single XML file/item.

On Wed, Jun 17, 2009 at 7:29 PM, Jothi Padmanabhan joth...@yahoo-inc.com wrote:
You could look at CombineFileInputFormat to generate a single split out of several files. Your partitioner would be able to assign keys to specific reducers, but you would not have control over which node a given reduce task runs on.
[...]

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
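A sketch of that approach with the old API; the output path, the key/value choices and the use of the mapred.task.id property are assumptions, not tested code:

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class XmlToSeqMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, NullWritable, NullWritable> {

  private SequenceFile.Writer writer;

  public void configure(JobConf job) {
    try {
      FileSystem fs = FileSystem.get(job);
      // one output file per map task, named after the task attempt
      Path out = new Path("/user/tarandeep/seqfiles/" + job.get("mapred.task.id"));
      writer = SequenceFile.createWriter(fs, job, out, Text.class, Text.class);
    } catch (IOException e) {
      throw new RuntimeException("could not open SequenceFile writer", e);
    }
  }

  public void map(LongWritable key, Text value,
      OutputCollector<NullWritable, NullWritable> output, Reporter reporter)
      throws IOException {
    // value is assumed to hold one whole XML document; write it straight out
    writer.append(new Text("doc"), value);
  }

  public void close() throws IOException {
    writer.close();
  }
}

Run it with conf.setNumReduceTasks(0) so there is no shuffle at all; the job's own (empty) output can be ignored or sent to NullOutputFormat.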
Re: JobControl for Pipes?
Job control is coming with the Hadoop workflow manager; in the meantime there is Cascading by Chris Wensel. I do not have any personal experience with either, and I do not know how Pipes interacts with either.

On Wed, Jun 17, 2009 at 12:43 PM, Roshan James roshan.james.subscript...@gmail.com wrote:
Hello, Is there any way to express dependencies between map-reduce jobs (such as in org.apache.hadoop.mapred.jobcontrol) for pipes jobs? The provided header Pipes.hh does not seem to reflect any such capabilities.
best, Roshan

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
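For reference, on the Java side the jobcontrol package mentioned in the question looks roughly like this; a sketch only, with two placeholder JobConf objects that are not actually configured, and nothing pipes-specific:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class Pipeline {
  public static void main(String[] args) throws Exception {
    JobConf convertConf = new JobConf();    // configure the first job here
    JobConf aggregateConf = new JobConf();  // configure the second job here

    Job convert = new Job(convertConf);
    Job aggregate = new Job(aggregateConf);
    aggregate.addDependingJob(convert);     // aggregate starts only after convert succeeds

    JobControl control = new JobControl("pipeline");
    control.addJob(convert);
    control.addJob(aggregate);

    Thread runner = new Thread(control);    // JobControl is a Runnable that submits jobs as they become ready
    runner.start();
    while (!control.allFinished()) {
      Thread.sleep(5000);
    }
    control.stop();
  }
}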
Hadoop Eclipse Plugin
Hi,
I have a problem configuring the Hadoop Map/Reduce plugin with Eclipse.

Setup details: I have a namenode, a jobtracker and two datanodes, all running on Ubuntu. My setup works fine with the example programs. I want to connect to this setup from Eclipse.

namenode - 10.20.104.62 - 54310 (port)
jobtracker - 10.20.104.53 - 54311 (port)

I run Eclipse on a different Windows machine. I want to configure the Map/Reduce plugin in Eclipse so that I can access HDFS from Windows.

Map/Reduce Master: Host - with the jobtracker IP, it did not work; Port - with the jobtracker port, it did not work.
DFS Master: Host - with the namenode IP, it did not work; Port - with the namenode port, it did not work.

I tried the other combination too, giving the namenode details for the Map/Reduce master and the jobtracker details for the DFS master. It did not work either.

If anyone has configured the plugin with Eclipse, please let me know. Even pointers to how to configure it would be highly appreciated.

Thanks,
Praveen
Re: Pipes example wordcount-nopipe.cc failed when reading from input splits
I tried this example and it seems that the input/output should only be in file:///... format to get correct results.
- Jianmin

From: Viral K khaju...@yahoo-inc.com
To: core-user@hadoop.apache.org
Sent: Thursday, June 18, 2009 8:57:47 AM
Subject: Re: Pipes example wordcount-nopipe.cc failed when reading from input splits

Does anybody have any updates on this? How can we have our own RecordReader in Hadoop Pipes? When I try to print the context.getInputSplit(), I get the filename along with some junk characters. As a result the file open fails. Has anybody got it working?
Viral.
[...]
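In other words, based on the earlier command line (paths here are illustrative), the invocation ends up looking like:

bin/hadoop pipes -conf src/examples/pipes/conf/word-nopipe.xml \
  -input file:///home/hadoop/input/ \
  -output file:///home/hadoop/out-dir-nopipe

Since the nopipe record reader opens the file directly with fopen(), the path it pulls out of the InputSplit has to be a local file:// path that exists on every node running a map task.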