Re: Custom FileOutputFormat / RecordWriter

2011-07-26 Thread Harsh J
Tom, What I meant to say was that doing this is well supported with existing API/libraries itself: - The class MultipleOutputs supports providing a filename for an output. See MultipleOutputs.addNamedOutput usage [1]. - The type 'NullWritable' is a special writable that doesn't do anything. So

Submitting and running hadoop jobs Programmatically

2011-07-26 Thread madhu phatak
Hi, I am working on an open source project, Nectar (https://github.com/zinnia-phatak-dev/Nectar), where I am trying to create Hadoop jobs depending on the user input. I was using the Java Process API to run the bin/hadoop shell script to submit the jobs. But this does not seem to be a good way because the process

Re: Submitting and running hadoop jobs Programmatically

2011-07-26 Thread Harsh J
A simple job.submit(…) or JobClient.runJob(jobConf) submits your job right from the Java API. Does this not work for you? If not, what error do you face? Forking out and launching from a system process is a bad idea unless there's absolutely no other way. On Tue, Jul 26, 2011 at 3:28 PM, madhu phatak

RE: Submitting and running hadoop jobs Programmatically

2011-07-26 Thread Devaraj K
Hi Madhu, You can submit the jobs programmatically from any system using the Job APIs. The job submission code can be written this way. // Create a new Job Job job = new Job(new Configuration()); job.setJarByClass(MyJob.class); // Specify various job-specific

Re: Submitting and running hadoop jobs Programmatically

2011-07-26 Thread madhu phatak
Hi, I am using the same APIs, but I am not able to run the jobs by just adding the configuration files and jars. It never creates a job in Hadoop; it just shows cleaning up the staging area and fails. On Tue, Jul 26, 2011 at 3:46 PM, Devaraj K devara...@huawei.com wrote: Hi Madhu, You can

Re: Submitting and running hadoop jobs Programmatically

2011-07-26 Thread Harsh J
Madhu, Do you get a specific error message / stack trace? Could you also paste your JT logs? On Tue, Jul 26, 2011 at 4:05 PM, madhu phatak phatak@gmail.com wrote: Hi, I am using the same APIs, but I am not able to run the jobs by just adding the configuration files and jars. It never

Re: Submitting and running hadoop jobs Programmatically

2011-07-26 Thread madhu phatak
I am using JobControl.add() to add a job, running the job control in a separate thread, and using JobControl.allFinished() to check whether all jobs have completed. Does this work the same as Job.submit()? On Tue, Jul 26, 2011 at 4:08 PM, Harsh J ha...@cloudera.com wrote: Madhu, Do you get a specific error

Re: Submitting and running hadoop jobs Programmatically

2011-07-26 Thread Harsh J
Yes. Internally, it calls the regular submit APIs. On Tue, Jul 26, 2011 at 4:32 PM, madhu phatak phatak@gmail.com wrote: I am using JobControl.add() to add a job, running the job control in a separate thread, and using JobControl.allFinished() to check whether all jobs have completed. Does this work the same

RE: Submitting and running hadoop jobs Programmatically

2011-07-26 Thread Devaraj K
Madhu, Can you check the client logs to see whether any error or exception occurs while submitting the job? Devaraj K -----Original Message----- From: Harsh J [mailto:ha...@cloudera.com] Sent: Tuesday, July 26, 2011 5:01 PM To: common-user@hadoop.apache.org Subject: Re: Submitting and running

Re: Custom FileOutputFormat / RecordWriter

2011-07-26 Thread Tom Melendez
Hi Harsh, Cool, thanks for the details. For anyone interested, with your tip and description I was able to find an example in the book Hadoop in Action (Chapter 7, p168). Another question, though: it doesn't look like MultipleOutputs will let me control the filename in a per-key (per map)

Multiple Output Formats

2011-07-26 Thread Roger Chen
Hi all, I am attempting to implement MultipleOutputFormat to write data to multiple files dependent on the output keys and values. Can somebody provide a working example with how to implement this in Hadoop 0.20.2? Thanks! -- Roger Chen UC Davis Genome Center

RE: Hadoop-streaming using binary executable c program

2011-07-26 Thread Daniel Yehdego
Good afternoon Bobby, Thanks so much, now it's working excellently, and the speed is also reasonable. Once again, thank you. Regards, Daniel T. Yehdego Computational Science Program University of Texas at El Paso, UTEP dtyehd...@miners.utep.edu From: ev...@yahoo-inc.com To:

Re: Multiple Output Formats

2011-07-26 Thread Ayon Sinha
package com.shopkick.util; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat; public class MultiFileOutput extends MultipleTextOutputFormat<Text, Text> { @Override protected String generateFileNameForKeyValue(Text key, Text value,
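
The preview above is cut off before the body of generateFileNameForKeyValue, so the following does not try to reproduce it. It is only a minimal plain-Java sketch (no Hadoop dependency) of the kind of key-to-path mapping such an override might return. The class KeyedFileName and its method forKey are hypothetical names, and writing each key's records under a directory named after the key is an assumption, not something stated in the thread.

```java
/**
 * Hypothetical helper, not part of Hadoop: builds a relative output path
 * from a record key and a default leaf file name such as "part-00000".
 */
final class KeyedFileName {
    private KeyedFileName() {}

    /**
     * Returns "<sanitized key>/<leafName>". Any character in the key outside
     * [A-Za-z0-9._-] is replaced with '_' so the key is safe as a directory name.
     */
    static String forKey(String key, String leafName) {
        String safeKey = key.replaceAll("[^A-Za-z0-9._-]", "_");
        if (safeKey.isEmpty()) {
            safeKey = "_"; // avoid producing a path that starts with "/"
        }
        return safeKey + "/" + leafName;
    }

    public static void main(String[] args) {
        // Illustrative keys only; real keys depend on the job.
        System.out.println(forKey("chr1", "part-00000"));
        System.out.println(forKey("sample A/1", "part-00000"));
    }
}
```

Inside a MultipleTextOutputFormat<Text, Text> subclass, the override could then return something like KeyedFileName.forKey(key.toString(), name), where name is the default leaf file name that the framework passes in as the last argument.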

Cygwin not working with Hadoop and Eclipse Plugin

2011-07-26 Thread A Df
Dear All: I am trying to run Hadoop on Windows 7 so as to test programs before moving to Unix/Linux. I have downloaded the Hadoop 0.20.2 and Eclipse 3.6 because I want to use the plugin. I am also using cygwin. However, I set the environment variable for JAVA_HOME and added the

Re: Multiple Output Formats

2011-07-26 Thread Harsh J
Roger, Beyond Ayon's example answer, I'd like you to note that the newer API will *not* carry a supported MultipleOutputFormat as it has been obsoleted away in favor of MultipleOutputs, whose use is much easier, is threadsafe, and also carries an example to look at, at [1]. [1] -

Re: Custom FileOutputFormat / RecordWriter

2011-07-26 Thread Harsh J
Tom, You can theoretically add any number of named outputs from a single task itself, even from within the map() calls (addNamedOutput or addMultiNamedOutput checks within itself for dupes, so you don't have to). So yes, you can keep adding outputs and using them per-key, and given your earlier

Re: Cygwin not working with Hadoop and Eclipse Plugin

2011-07-26 Thread James Seigel
Try using VirtualBox/VMware and downloading either an image that has Hadoop on it or a Linux image and installing it there. Good luck James. On 2011-07-26, at 12:33 PM, A Df wrote: Dear All: I am trying to run Hadoop on Windows 7 so as to test programs before moving to Unix/Linux. I

Re: Cygwin not working with Hadoop and Eclipse Plugin

2011-07-26 Thread Harsh J
A Df, (Inlines) On Wed, Jul 27, 2011 at 12:03 AM, A Df abbey_dragonfor...@yahoo.com wrote: Dear All: I am trying to run Hadoop on Windows 7 so as to test programs before moving to Unix/Linux. I have downloaded the Hadoop 0.20.2 and Eclipse 3.6 because I want to use the plugin. I am also

RE: Cygwin not working with Hadoop and Eclipse Plugin

2011-07-26 Thread Eric Payne
Hi A Df, I haven't set up Hadoop under cygwin, but I use cygwin a lot. One thing I would suggest is to use the bash shell in cygwin and use the following format for the $PATH additions: PATH=$PATH:/cygdrive/c/cygwin/bin:/cygdrive/c/cygwin/usr/bin My understanding is that the stable version of

Re: Cygwin not working with Hadoop and Eclipse Plugin

2011-07-26 Thread Eric Fiala
A Df, Try reinstalling Java to a friendlier location (without spaces), such as c:\java rather than c:\Program Files. From the error message, it appears to be splitting the path on the space; I've encountered this very same problem. JAVA_HOME to be the root of your Java installation which I changed to
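
One way to picture why a space in the install path causes trouble: a command line that is split naively on whitespace breaks c:\Program Files into two tokens. The plain-Java sketch below only imitates that splitting. It is not how Hadoop's scripts or cygwin's bash actually parse commands, and the class name SpaceInPathDemo and the example paths are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical demo class: shows how whitespace splitting breaks a path containing a space. */
final class SpaceInPathDemo {
    private SpaceInPathDemo() {}

    /** Splits a command line on runs of whitespace, ignoring any quoting. */
    static List<String> naiveTokens(String commandLine) {
        return Arrays.asList(commandLine.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        // The path with a space is cut in two, so the "program" token is no longer a valid path.
        System.out.println(naiveTokens("C:\\Program Files\\Java\\bin\\java -version"));
        // The path without a space survives as a single token.
        System.out.println(naiveTokens("C:\\java\\bin\\java -version"));
    }
}
```

The first command yields three tokens and the second yields two, which is the difference that installing Java under a path without spaces avoids.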

Re: Cygwin not working with Hadoop and Eclipse Plugin

2011-07-26 Thread A Df
Harsh: See (inline at the **) I hope it's easy to follow, and for the other responses, I was not sure how to respond to get everything into one. Sorry for top posting! Eric, where would I put the line below? Please explain in newbie terms, thanks:

Re: Cygwin not working with Hadoop and Eclipse Plugin

2011-07-26 Thread Harsh J
A Df, On Wed, Jul 27, 2011 at 1:42 AM, A Df abbey_dragonfor...@yahoo.com wrote: Harsh: See (inline at the **) I hope it's easy to follow, and for the other responses, I was not sure how to respond to get everything into one. Sorry for top posting! Np! I don't strongly enforce a style of

Re: Multiple Output Formats

2011-07-26 Thread Roger Chen
The problem I'm facing right now is with the configuration needed for MultipleOutputs, because JobConf is deprecated now and I am unable to do its equivalent with Configuration. I set the configuration of the job by: Job job = new Job(getConf()); but when I'm trying to use this line in my

Re: Multiple Output Formats

2011-07-26 Thread Harsh J
Gotcha, my bad then. The Hadoop distribution I use provides a backported MO, so I overlooked this particular issue while replying. Still, the warning holds as the versions roll ahead. But I believe the refactor would not be that much of a pain, so perhaps it's a no-worry. On Wed, Jul 27,

Re: Submitting and running hadoop jobs Programmatically

2011-07-26 Thread madhu phatak
Hi I am submitting the job as follows java -cp Nectar-analytics-0.0.1-SNAPSHOT.jar:/home/hadoop/hadoop-for-nectar/hadoop-0.21.0/conf/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_COMMON_HOME/* com.zinnia.nectar.regression.hadoop.primitive.jobs.SigmaJob input/book.csv kkk11fffrrw 1 I get the log in CLI

Build Hadoop 0.20.2 from source

2011-07-26 Thread Vighnesh Avadhani
Hi, I want to build Hadoop 0.20.2 from source using the Eclipse IDE. Can anyone help me with this? Regards, Vighnesh

Re: Build Hadoop 0.20.2 from source

2011-07-26 Thread Uma Maheswara Rao G 72686
Hi Vighnesh, Step 1) Download the code base from the Apache svn repository. Step 2) In the root folder you can find the build.xml file. In that folder, just execute a) ant and b) ant eclipse; this will generate the Eclipse project settings files. After this you can directly import this project in you