Re: libhdfs install dep
I'd recommend using the packages for Apache Hadoop from Apache Bigtop (https://cwiki.apache.org/confluence/display/BIGTOP). The ones upstream (here) aren't maintained as much these days. On Tue, Sep 25, 2012 at 6:27 PM, Pastrana, Rodrigo (RIS-BCT) wrote: > Leo, yes I'm working with hadoop-1.0.1-1.amd64.rpm from Apache's download > site. > The rpm installs libhdfs in /usr/lib64 so I'm not sure why I would need the > hadoop-<*>libhdfs* rpm. > > Any idea why the installed /usr/lib64/libhdfs.so is not detected by the > package managers? > > Thanks, Rodrigo. > > -Original Message- > From: Leo Leung [mailto:lle...@ddn.com] > Sent: Tuesday, September 25, 2012 2:11 AM > To: common-user@hadoop.apache.org > Subject: RE: libhdfs install dep > > Rodrigo, > Assuming you are asking for hadoop 1.x > > You are missing the hadoop-<*>libhdfs* rpm. > Build it or get it from the vendor you got your hadoop from. > > > > -Original Message- > From: Pastrana, Rodrigo (RIS-BCT) [mailto:rodrigo.pastr...@lexisnexis.com] > Sent: Monday, September 24, 2012 8:20 PM > To: 'core-u...@hadoop.apache.org' > Subject: libhdfs install dep > > Anybody know why libhdfs.so is not found by package managers on CentOS 64 and > OpenSuse64? > > I hava an rpm which declares Hadoop as a dependacy, but the package managers > (KPackageKit, zypper, etc) report libhdfs.so as a missing dependency > eventhough Hadoop has been installed via rpm package, and libhdfs.so is > installed as well. > > Thanks, Rodrigo. > > > - > The information contained in this e-mail message is intended only > for the personal and confidential use of the recipient(s) named > above. This message may be an attorney-client communication and/or > work product and as such is privileged and confidential. 
If the > reader of this message is not the intended recipient or an agent > responsible for delivering it to the intended recipient, you are > hereby notified that you have received this document in error and > that any review, dissemination, distribution, or copying of this > message is strictly prohibited. If you have received this > communication in error, please notify us immediately by e-mail, and > delete the original message. -- Harsh J
Re: Python + hdfs written thrift sequence files: lots of moving parts!
Thanks Harsh: In any case, I'm really curious about how sequence file headers are formatted, as the documentation in the SequenceFile javadocs seems very generic. To make my questions more concrete: 1) I notice that the FileSplit class has a getStart() function. It is documented as returning the place to start "processing". Does that imply that a FileSplit does, or does not, include a header? http://hadoop.apache.org/docs/r0.20.2/api/org/apache/hadoop/mapreduce/lib/input/FileSplit.html#getStart%28%29 2) Also, it's not clear to me how compression and serialization are related. These are two intricately coupled aspects of HDFS file writing, and I'm not sure what the idiom is for coordinating the compression of records with their deserialization.
Re: Python + hdfs written thrift sequence files: lots of moving parts!
Hi Jay, This may be off-topic to you, but I feel it's related: Use Avro DataFiles. There's Python support already available, as well as several other languages. On Tue, Sep 25, 2012 at 10:57 PM, Jay Vyas wrote: > Hi guys! > > I'm trying to read some hadoop outputted thrift files in plain old java > (without using SequenceFile.Reader). The reason for this is that I > > (1) want to understand the sequence file format better and > (2) would like to be able to port this code to a language which doesn't have > robust hadoop sequence file i/o / thrift support (python). My code looks > like this: > > So, before reading forward, if anyone has: > > 1) Some general hints on how to create a Sequence file with thrift encoded > key values in python would be very useful. > 2) Some tips on the generic approach for reading a sequencefile (the > comments seem to be a bit underspecified in the SequenceFile header) > > I'd appreciate it! > > Now, here is my adventure into thrift/hdfs sequence file i/o: > > I've written a simple stub which, I think, should be the start of a > sequence file reader (just tries to skip the header and get straight to the > data). > > But it doesn't handle compression. > > http://pastebin.com/vyfgjML9 > > So, this code ^^ appears to fail with cryptic errors: "don't know what > type: 15". 
> > This error comes from a case statement, which attempts to determine what > type of thrift record is being read in: > "fail 127 don't know what type: 15" > > private byte getTType(byte type) throws TProtocolException { > switch ((byte)(type & 0x0f)) { > case TType.STOP: > return TType.STOP; > case Types.BOOLEAN_FALSE: > case Types.BOOLEAN_TRUE: > return TType.BOOL; > > case Types.STRUCT: > return TType.STRUCT; > default: > throw new TProtocolException("don't know what type: " + > (byte)(type & 0x0f)); > } > > Upon further investigation, I have found that, in fact, the Configuration > object is (of course) heavily utilized by the SequenceFile reader, in > particular, to > determine the Codec. That corroborates my hypothesis that the data needs > to be decompressed or decoded before it can be deserialized by thrift, I > believe. > > So... I guess what Im assuming is missing here, is that I don't know how to > manually reproduce the Codec/GZip, etc.. logic inside of > SequenceFile.Reader in plain old java (i.e without cheating and using the > SequenceFile.Reader class that is configured in our mapreduce soruce > code). > > With my end goal being to read the file in python, I think it would be nice > to be able to read the sequencefile in java, and use this as a template > (since I know that my thrift objects and serialization are working > correctly in my current java source codebase, when read in from > SequenceFile.Reader api). > > Any suggestions on how I can distill the logic of the SequenceFile.Reader > class into a simplified version which is specific to my data, so that I can > start porting into a python script which is capable of scanning a few real > sequencefiles off of HDFS would be much appreciated !!! > > In general... what are the core steps for doing i/o with sequence files > that are compressed and or serialized in different formats? Do we > decompress first , and then deserialize? Or do them both at the same time > ? Thanks! 
> > PS I've added an issue to github here > https://github.com/matteobertozzi/Hadoop/issues/5, for a python > SequenceFile reader. If I get some helpful hints on this thread maybe I > can directly implement an example on matteobertozzi's python hadoop trunk. > > -- > Jay Vyas > MMSB/UCHC -- Harsh J
RE: libhdfs install dep
I see. The Apache RPM distro does include the libhdfs* binaries (and it is self-contained). For the Apache distro (once installed), Yum, zypper, and KPack* only know about hadoop-1.0.1-1.amd* as the provider for libhdfs (check with rpm -qlp hadoop.rpm -- this will give you all of the files it has/installs -- or yum deplist). So the package that you are trying to install, which is failing the dependency, has a requirement for something named "hadoop" (per your e-mail) as a package that contains libhdfs. (This is not what hadoop-1.0.1-1 claims.) To solve this: The brute-force way: force the install of the rpm without dependencies. rpm -i --nodeps .rpm Another way: match up all your RPM package dependencies from the vendor(s). 1) Talk to the .rpm provider, to see if they have the proper packaging for the apache RPM distro. 2) Use or find a hadoop distro from [a] vendor that has all the proper "package" naming resolution. Good luck -Original Message- From: Pastrana, Rodrigo (RIS-BCT) [mailto:rodrigo.pastr...@lexisnexis.com] Sent: Tuesday, September 25, 2012 5:58 AM To: common-user@hadoop.apache.org Subject: RE: libhdfs install dep Leo, yes I'm working with hadoop-1.0.1-1.amd64.rpm from Apache's download site. The rpm installs libhdfs in /usr/lib64 so I'm not sure why I would need the hadoop-<*>libhdfs* rpm. Any idea why the installed /usr/lib64/libhdfs.so is not detected by the package managers? Thanks, Rodrigo. -Original Message- From: Leo Leung [mailto:lle...@ddn.com] Sent: Tuesday, September 25, 2012 2:11 AM To: common-user@hadoop.apache.org Subject: RE: libhdfs install dep Rodrigo, Assuming you are asking for hadoop 1.x You are missing the hadoop-<*>libhdfs* rpm. Build it or get it from the vendor you got your hadoop from. 
-Original Message- From: Pastrana, Rodrigo (RIS-BCT) [mailto:rodrigo.pastr...@lexisnexis.com] Sent: Monday, September 24, 2012 8:20 PM To: 'core-u...@hadoop.apache.org' Subject: libhdfs install dep Anybody know why libhdfs.so is not found by package managers on CentOS 64 and OpenSuse64? I have an rpm which declares Hadoop as a dependency, but the package managers (KPackageKit, zypper, etc) report libhdfs.so as a missing dependency even though Hadoop has been installed via rpm package, and libhdfs.so is installed as well. Thanks, Rodrigo.
Python + hdfs written thrift sequence files: lots of moving parts!
Hi guys! I'm trying to read some hadoop outputted thrift files in plain old java (without using SequenceFile.Reader). The reason for this is that I (1) want to understand the sequence file format better and (2) would like to be able to port this code to a language which doesn't have robust hadoop sequence file i/o / thrift support (python). My code looks like this: So, before reading forward, if anyone has: 1) Some general hints on how to create a Sequence file with thrift encoded key values in python would be very useful. 2) Some tips on the generic approach for reading a sequencefile (the comments seem to be a bit underspecified in the SequenceFile header) I'd appreciate it! Now, here is my adventure into thrift/hdfs sequence file i/o: I've written a simple stub which, I think, should be the start of a sequence file reader (just tries to skip the header and get straight to the data). But it doesn't handle compression. http://pastebin.com/vyfgjML9 So, this code ^^ appears to fail with cryptic errors: "don't know what type: 15". This error comes from a case statement, which attempts to determine what type of thrift record is being read in: "fail 127 don't know what type: 15"

private byte getTType(byte type) throws TProtocolException {
    switch ((byte)(type & 0x0f)) {
        case TType.STOP:
            return TType.STOP;
        case Types.BOOLEAN_FALSE:
        case Types.BOOLEAN_TRUE:
            return TType.BOOL;
        case Types.STRUCT:
            return TType.STRUCT;
        default:
            throw new TProtocolException("don't know what type: " + (byte)(type & 0x0f));
    }
}

Upon further investigation, I have found that, in fact, the Configuration object is (of course) heavily utilized by the SequenceFile reader, in particular, to determine the Codec. That corroborates my hypothesis that the data needs to be decompressed or decoded before it can be deserialized by thrift, I believe. So... I guess what I'm assuming is missing here, is that I don't know how to manually reproduce the Codec/GZip, etc. 
logic inside of SequenceFile.Reader in plain old java (i.e. without cheating and using the SequenceFile.Reader class that is configured in our mapreduce source code). With my end goal being to read the file in python, I think it would be nice to be able to read the sequencefile in java, and use this as a template (since I know that my thrift objects and serialization are working correctly in my current java source codebase, when read in from the SequenceFile.Reader api). Any suggestions on how I can distill the logic of the SequenceFile.Reader class into a simplified version which is specific to my data, so that I can start porting it into a python script capable of scanning a few real sequencefiles off of HDFS, would be much appreciated! In general... what are the core steps for doing i/o with sequence files that are compressed and/or serialized in different formats? Do we decompress first, and then deserialize? Or do both at the same time? Thanks! PS I've added an issue to github here https://github.com/matteobertozzi/Hadoop/issues/5, for a python SequenceFile reader. If I get some helpful hints on this thread maybe I can directly implement an example on matteobertozzi's python hadoop trunk. -- Jay Vyas MMSB/UCHC
Re: libhdfs install dep
Hi Rodrigo, The hadoop RPMs are a bit deficient compared to those you would find from your Linux distribution. For example, look at the Apache RPM you used: [bbockelm@rcf-bockelman ~]$ rpm -qp http://mirrors.sonic.net/apache/hadoop/common/hadoop-1.0.3/hadoop-1.0.3-1.x86_64.rpm --provides hadoop hadoop = 1.0.3-1 Normally, you would expect to see something like this (using the CDH4 distribution as an example) as it contains a shared library: [bbockelm@brian-test ~]$ rpm -q --provides hadoop-libhdfs libhdfs.so.0()(64bit) hadoop-libhdfs = 2.0.0+88-1.cdh4.0.0.p0.30.osg.el5 libhdfs.so.0 hadoop-libhdfs = 2.0.0+88-1.cdh4.0.0.p0.30.osg.el5 Because the Apache RPM does not list itself as providing libhdfs.so.0()(64bit), it breaks your automatic RPM dependency detection. [Aside: I know from experience that building a high-quality (as in, follows the Fedora Packaging Guidelines) RPM for Java software is incredibly hard as the packaging approaches between the Linux distributions and Java community are incredibly divergent. Not to say that the Java approach is inherently wrong, it's just different, and does not translate naturally to RPM. Accordingly, to take Hadoop and make a rule-abiding RPM in Fedora would be hundreds of hours of work. It's one of those things that appear to be much easier than it is to accomplish.] The Hadoop community is very friendly, and I'm sure they would accept any patches to fix this oversight in future releases. Brian On Sep 25, 2012, at 7:57 AM, "Pastrana, Rodrigo (RIS-BCT)" wrote: > Leo, yes I'm working with hadoop-1.0.1-1.amd64.rpm from Apache's download > site. > The rpm installs libhdfs in /usr/lib64 so I'm not sure why I would need the > hadoop-<*>libhdfs* rpm. > > Any idea why the installed /usr/lib64/libhdfs.so is not detected by the > package managers? > > Thanks, Rodrigo. 
> > -Original Message- > From: Leo Leung [mailto:lle...@ddn.com] > Sent: Tuesday, September 25, 2012 2:11 AM > To: common-user@hadoop.apache.org > Subject: RE: libhdfs install dep > > Rodrigo, > Assuming you are asking for hadoop 1.x > > You are missing the hadoop-<*>libhdfs* rpm. > Build it or get it from the vendor you got your hadoop from. > > > > -Original Message- > From: Pastrana, Rodrigo (RIS-BCT) [mailto:rodrigo.pastr...@lexisnexis.com] > Sent: Monday, September 24, 2012 8:20 PM > To: 'core-u...@hadoop.apache.org' > Subject: libhdfs install dep > > Anybody know why libhdfs.so is not found by package managers on CentOS 64 and > OpenSuse64? > > I have an rpm which declares Hadoop as a dependency, but the package managers > (KPackageKit, zypper, etc) report libhdfs.so as a missing dependency > even though Hadoop has been installed via rpm package, and libhdfs.so is > installed as well. > > Thanks, Rodrigo.
Re: Hadoop and Cuda , JCuda (CPU+GPU architecture)
Hi Sudha, Good question. First of all, you need to describe your Hadoop environment clearly (pseudo-distributed or a real cluster). Secondly, you need to understand how hadoop ships a job's jar file to the worker nodes: it only copies the job jar itself, which does not contain the jcuda.jar file. The MapReduce program may not find it even if you specify the jcuda.jar file on your worker node classpath. I suggest you include the jcuda.jar inside your wordcount.jar. Then, when Hadoop copies the wordcount.jar file to each worker node's temporary working directory, you do not need to worry about this issue. Let me know if you have further questions. Chen On Tue, Sep 25, 2012 at 12:38 AM, sudha sadhasivam < sudhasadhasi...@yahoo.com> wrote: > Sir > We tried to integrate hadoop and JCUDA. > We tried a code from > > > http://code.google.com/p/mrcl/source/browse/trunk/hama-mrcl/src/mrcl/mrcl/?r=76 > > We were able to compile. We are not able to execute. It does not recognise > JCUBLAS.jar. We tried setting the classpath > We are herewith attaching the procedure for the same along with errors > Kindly inform us how to proceed. It is our UG project > Thanking you > Dr G sudha Sadasivam > > --- On *Mon, 9/24/12, Chen He * wrote: > > > From: Chen He > Subject: Re: Hadoop and Cuda , JCuda (CPU+GPU architecture) > To: common-user@hadoop.apache.org > Date: Monday, September 24, 2012, 9:03 PM > > > http://wiki.apache.org/hadoop/CUDA%20On%20Hadoop > > On Mon, Sep 24, 2012 at 10:30 AM, Oleg Ruchovets > wrote: > > > Hi > > > > I am going to process video analytics using hadoop > > I am very interested about CPU+GPU architecture especially using CUDA ( > > http://www.nvidia.com/object/cuda_home_new.html) and JCUDA ( > > http://jcuda.org/) > > Does using HADOOP and CPU+GPU architecture bring significant performance > > improvement, and has someone succeeded in implementing it in production > > quality? 
> > > > I didn't find any projects / examples using such technology. > > If someone could give me a link to best practices and examples using > > CUDA/JCUDA + hadoop that would be great. > > Thanks in advance > > Oleg. > > > >
Re: Passing Command-line Parameters to the Job Submit Command
You could always write your own properties file and read it as a resource. On Tue, Sep 25, 2012 at 12:10 AM, Hemanth Yamijala wrote: > By java environment variables, do you mean the ones passed as > -Dkey=value ? That's one way of passing them. I suppose another way is > to have a client side site configuration (like mapred-site.xml) that > is in the classpath of the client app. > > Thanks > Hemanth > > On Tue, Sep 25, 2012 at 12:20 AM, Varad Meru wrote: > > Thanks Hemanth, > > > > But in general, if we want to pass arguments to any job (not only > > PiEstimator from examples-jar) and submit the Job to the Job queue > > scheduler, by the looks of it, we might always need to use the java > > environment variables only. > > > > Is my above assumption correct? > > > > Thanks, > > Varad > > > > On Mon, Sep 24, 2012 at 9:48 AM, Hemanth Yamijala >wrote: > > > >> Varad, > >> > >> Looking at the code for the PiEstimator class which implements the > >> 'pi' example, the two arguments are mandatory and are used *before* > >> the job is submitted for execution - i.e. on the client side. In > >> particular, one of them (nSamples) is used not by the MapReduce job, > >> but by the client code (i.e. PiEstimator) to generate some input. > >> > >> Hence, I believe all of this additional work that is being done by the > >> PiEstimator class will be bypassed if we directly use the job -submit > >> command. In other words, I don't think these two ways of running the > >> job: > >> > >> - using the "hadoop jar examples pi" > >> - using hadoop job -submit > >> > >> are equivalent. > >> > >> As a general answer to your question though, if additional parameters > >> are used by the Mappers or reducers, then they will generally be set > >> as additional job specific configuration items. 
So, one way of using > >> them with the job -submit command will be to find out the specific > >> names of the configuration items (from code, or some other > >> documentation), and include them in the job.xml used when submitting > >> the job. > >> > >> Thanks > >> Hemanth > >> > >> On Sun, Sep 23, 2012 at 1:24 PM, Varad Meru > wrote: > >> > Hi, > >> > > >> > I want to run the PiEstimator example from using the following command > >> > > >> > $hadoop job -submit pieestimatorconf.xml > >> > > >> > which contains all the info required by hadoop to run the job. E.g. > the > >> > input file location, the output file location and other details. > >> > > >> > > >> > mapred.jarfile:Users/varadmeru/Work/Hadoop/hadoop-examples-1.0.3.jar > >> > mapred.map.tasks20 > >> > mapred.reduce.tasks2 > >> > ... > >> > mapred.job.name > >> PiEstimator > >> > > >> > mapred.output.dirfile:Users/varadmeru/Work/out > >> > > >> > Now, as we now, to run the PiEstimator, we can use the following > command > >> too > >> > > >> > $hadoop jar hadoop-examples.1.0.3 pi 5 10 > >> > > >> > where 5 and 10 are the arguments to the main class of the PiEstimator. > >> How > >> > can I pass the same arguments (5 and 10) using the job -submit command > >> > through conf. file or any other way, without changing the code of the > >> > examples to reflect the use of environment variables. > >> > > >> > Thanks in advance, > >> > Varad > >> > > >> > - > >> > Varad Meru > >> > Software Engineer, > >> > Business Intelligence and Analytics, > >> > Persistent Systems and Solutions Ltd., > >> > Pune, India. > >> >
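Hemanth's suggestion can be made concrete: a job.xml is just name/value pairs inside a <configuration> document, so it can be generated with any XML library. A minimal sketch (the property names below are simply the ones quoted in this thread, used as examples, not a fixed schema):

```python
import xml.etree.ElementTree as ET

def make_job_conf(props):
    """Build a Hadoop-style configuration document (the job.xml
    format): a <configuration> root holding <property> elements,
    each with <name> and <value> children."""
    root = ET.Element('configuration')
    for name, value in props.items():
        prop = ET.SubElement(root, 'property')
        ET.SubElement(prop, 'name').text = name
        ET.SubElement(prop, 'value').text = str(value)
    return ET.tostring(root, encoding='unicode')

# e.g. make_job_conf({'mapred.job.name': 'PiEstimator',
#                     'mapred.map.tasks': 20})
```

The caveat from the thread still applies: this only covers values read from the job Configuration; client-side arguments like PiEstimator's nSamples are consumed before submission and cannot be injected this way.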
RE: libhdfs install dep
Leo, yes I'm working with hadoop-1.0.1-1.amd64.rpm from Apache's download site. The rpm installs libhdfs in /usr/lib64 so I'm not sure why I would need the hadoop-<*>libhdfs* rpm. Any idea why the installed /usr/lib64/libhdfs.so is not detected by the package managers? Thanks, Rodrigo. -Original Message- From: Leo Leung [mailto:lle...@ddn.com] Sent: Tuesday, September 25, 2012 2:11 AM To: common-user@hadoop.apache.org Subject: RE: libhdfs install dep Rodrigo, Assuming you are asking for hadoop 1.x You are missing the hadoop-<*>libhdfs* rpm. Build it or get it from the vendor you got your hadoop from. -Original Message- From: Pastrana, Rodrigo (RIS-BCT) [mailto:rodrigo.pastr...@lexisnexis.com] Sent: Monday, September 24, 2012 8:20 PM To: 'core-u...@hadoop.apache.org' Subject: libhdfs install dep Anybody know why libhdfs.so is not found by package managers on CentOS 64 and OpenSuse64? I have an rpm which declares Hadoop as a dependency, but the package managers (KPackageKit, zypper, etc) report libhdfs.so as a missing dependency even though Hadoop has been installed via rpm package, and libhdfs.so is installed as well. Thanks, Rodrigo.
Re: Passing Command-line Parameters to the Job Submit Command
Building on Hemanth's answer: in the end your variables should be in the job.xml (the second file needed, along with the jar, to run a job). Building this job.xml can be done in various ways, but it does inherit from your local configuration and you can change it using the java API; in the end it is only an xml file, so your hands are not tied. I know there is a job file that you can provide with the shell command: http://hadoop.apache.org/docs/r1.0.3/commands_manual.html#job But I haven't used it yet, so I can't tell you more about this option. Regards Bertrand On Tue, Sep 25, 2012 at 9:10 AM, Hemanth Yamijala wrote: > By java environment variables, do you mean the ones passed as > -Dkey=value ? That's one way of passing them. I suppose another way is > to have a client side site configuration (like mapred-site.xml) that > is in the classpath of the client app. > > Thanks > Hemanth > > On Tue, Sep 25, 2012 at 12:20 AM, Varad Meru wrote: > > Thanks Hemanth, > > > > But in general, if we want to pass arguments to any job (not only > > PiEstimator from examples-jar) and submit the Job to the Job queue > > scheduler, by the looks of it, we might always need to use the java > > environment variables only. > > > > Is my above assumption correct? > > > > Thanks, > > Varad > > > > On Mon, Sep 24, 2012 at 9:48 AM, Hemanth Yamijala >wrote: > > > >> Varad, > >> > >> Looking at the code for the PiEstimator class which implements the > >> 'pi' example, the two arguments are mandatory and are used *before* > >> the job is submitted for execution - i.e. on the client side. In > >> particular, one of them (nSamples) is used not by the MapReduce job, > >> but by the client code (i.e. PiEstimator) to generate some input. > >> > >> Hence, I believe all of this additional work that is being done by the > >> PiEstimator class will be bypassed if we directly use the job -submit > >> command. 
In other words, I don't think these two ways of running the > >> job: > >> > >> - using the "hadoop jar examples pi" > >> - using hadoop job -submit > >> > >> are equivalent. > >> > >> As a general answer to your question though, if additional parameters > >> are used by the Mappers or reducers, then they will generally be set > >> as additional job specific configuration items. So, one way of using > >> them with the job -submit command will be to find out the specific > >> names of the configuration items (from code, or some other > >> documentation), and include them in the job.xml used when submitting > >> the job. > >> > >> Thanks > >> Hemanth > >> > >> On Sun, Sep 23, 2012 at 1:24 PM, Varad Meru > wrote: > >> > Hi, > >> > > >> > I want to run the PiEstimator example from using the following command > >> > > >> > $hadoop job -submit pieestimatorconf.xml > >> > > >> > which contains all the info required by hadoop to run the job. E.g. > the > >> > input file location, the output file location and other details. > >> > > >> > > >> > mapred.jarfile:Users/varadmeru/Work/Hadoop/hadoop-examples-1.0.3.jar > >> > mapred.map.tasks20 > >> > mapred.reduce.tasks2 > >> > ... > >> > mapred.job.name > >> PiEstimator > >> > > >> > mapred.output.dirfile:Users/varadmeru/Work/out > >> > > >> > Now, as we now, to run the PiEstimator, we can use the following > command > >> too > >> > > >> > $hadoop jar hadoop-examples.1.0.3 pi 5 10 > >> > > >> > where 5 and 10 are the arguments to the main class of the PiEstimator. > >> How > >> > can I pass the same arguments (5 and 10) using the job -submit command > >> > through conf. file or any other way, without changing the code of the > >> > examples to reflect the use of environment variables. > >> > > >> > Thanks in advance, > >> > Varad > >> > > >> > - > >> > Varad Meru > >> > Software Engineer, > >> > Business Intelligence and Analytics, > >> > Persistent Systems and Solutions Ltd., > >> > Pune, India. > >> > -- Bertrand Dechoux
Re: Passing Command-line Parameters to the Job Submit Command
By java environment variables, do you mean the ones passed as -Dkey=value ? That's one way of passing them. I suppose another way is to have a client side site configuration (like mapred-site.xml) that is in the classpath of the client app. Thanks Hemanth On Tue, Sep 25, 2012 at 12:20 AM, Varad Meru wrote: > Thanks Hemanth, > > But in general, if we want to pass arguments to any job (not only > PiEstimator from examples-jar) and submit the Job to the Job queue > scheduler, by the looks of it, we might always need to use the java > environment variables only. > > Is my above assumption correct? > > Thanks, > Varad > > On Mon, Sep 24, 2012 at 9:48 AM, Hemanth Yamijala wrote: > >> Varad, >> >> Looking at the code for the PiEstimator class which implements the >> 'pi' example, the two arguments are mandatory and are used *before* >> the job is submitted for execution - i.e on the client side. In >> particular, one of them (nSamples) is used not by the MapReduce job, >> but by the client code (i.e. PiEstimator) to generate some input. >> >> Hence, I believe all of this additional work that is being done by the >> PiEstimator class will be bypassed if we directly use the job -submit >> command. In other words, I don't think these two ways of running the >> job: >> >> - using the "hadoop jar examples pi" >> - using hadoop job -submit >> >> are equivalent. >> >> As a general answer to your question though, if additional parameters >> are used by the Mappers or reducers, then they will generally be set >> as additional job specific configuration items. So, one way of using >> them with the job -submit command will be to find out the specific >> names of the configuration items (from code, or some other >> documentation), and include them in the job.xml used when submitting >> the job. 
>> >> Thanks >> Hemanth >> >> On Sun, Sep 23, 2012 at 1:24 PM, Varad Meru wrote: >> > Hi, >> > >> > I want to run the PiEstimator example using the following command >> > >> > $hadoop job -submit pieestimatorconf.xml >> > >> > which contains all the info required by hadoop to run the job. E.g. the >> > input file location, the output file location and other details. >> > >> > <property><name>mapred.jar</name><value>file:Users/varadmeru/Work/Hadoop/hadoop-examples-1.0.3.jar</value></property> >> > <property><name>mapred.map.tasks</name><value>20</value></property> >> > <property><name>mapred.reduce.tasks</name><value>2</value></property> >> > ... >> > <property><name>mapred.job.name</name><value>PiEstimator</value></property> >> > <property><name>mapred.output.dir</name><value>file:Users/varadmeru/Work/out</value></property> >> > >> > Now, as we know, to run the PiEstimator, we can use the following command >> too >> > >> > $hadoop jar hadoop-examples.1.0.3 pi 5 10 >> > >> > where 5 and 10 are the arguments to the main class of the PiEstimator. >> How >> > can I pass the same arguments (5 and 10) using the job -submit command >> > through conf. file or any other way, without changing the code of the >> > examples to reflect the use of environment variables. >> > >> > Thanks in advance, >> > Varad >> > >> > - >> > Varad Meru >> > Software Engineer, >> > Business Intelligence and Analytics, >> > Persistent Systems and Solutions Ltd., >> > Pune, India. >>