Re: libhdfs install dep

2012-09-25 Thread Harsh J
I'd recommend using the packages for Apache Hadoop from Apache Bigtop
(https://cwiki.apache.org/confluence/display/BIGTOP). The ones
upstream (here) aren't maintained as much these days.

On Tue, Sep 25, 2012 at 6:27 PM, Pastrana, Rodrigo (RIS-BCT)
 wrote:
> Leo, yes I'm working with hadoop-1.0.1-1.amd64.rpm from Apache's download 
> site.
> The rpm installs libhdfs in /usr/lib64 so I'm not sure why I would need the 
> hadoop-<*>libhdfs* rpm.
>
> Any idea why the installed /usr/lib64/libhdfs.so is not detected by the 
> package managers?
>
> Thanks, Rodrigo.
>
> -Original Message-
> From: Leo Leung [mailto:lle...@ddn.com]
> Sent: Tuesday, September 25, 2012 2:11 AM
> To: common-user@hadoop.apache.org
> Subject: RE: libhdfs install dep
>
> Rodrigo,
>   Assuming you are asking for hadoop 1.x
>
>   You are missing the hadoop-<*>libhdfs* rpm.
>   Build it or get it from the vendor you got your hadoop from.
>
>
>
> -Original Message-
> From: Pastrana, Rodrigo (RIS-BCT) [mailto:rodrigo.pastr...@lexisnexis.com]
> Sent: Monday, September 24, 2012 8:20 PM
> To: 'core-u...@hadoop.apache.org'
> Subject: libhdfs install dep
>
> Anybody know why libhdfs.so is not found by package managers on CentOS 64 and 
> OpenSuse64?
>
> I have an rpm which declares Hadoop as a dependency, but the package managers
> (KPackageKit, zypper, etc.) report libhdfs.so as a missing dependency
> even though Hadoop has been installed via rpm package, and libhdfs.so is
> installed as well.
>
> Thanks, Rodrigo.
>
>



-- 
Harsh J


Re: Python + hdfs written thrift sequence files: lots of moving parts!

2012-09-25 Thread Jay Vyas
Thanks, Harsh. In any case, I'm really curious about how sequence
file headers are formatted, as the documentation in the SequenceFile
javadocs seems to be very generic.

To make my questions more concrete:

1) I notice that the FileSplit class has a getStart() function.  It is
documented as returning the place to start "processing".  Does that imply
that a FileSplit does, or does not, include a header?

http://hadoop.apache.org/docs/r0.20.2/api/org/apache/hadoop/mapreduce/lib/input/FileSplit.html#getStart%28%29

2) Also, it's not clear to me how compression and serialization are
related.  These are two intricately coupled aspects of HDFS file writing,
and I'm not sure what the idiom is for coordinating the compression of records
with their deserialization.
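For what it's worth, here is a minimal sketch (assuming a Hadoop 1.x classpath
and a file path in args[0]) that dumps the header fields through the stock
reader. The key/value class names, compression flags, and codec printed below
all come from the single header block at the very start of the file, so a
FileSplit with a non-zero start does not carry its own copy of it:

  // Minimal sketch: print the SequenceFile header fields via the stock reader.
  // Assumes a Hadoop 1.x classpath; args[0] is a local or HDFS path.
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.SequenceFile;

  public class SeqFileHeaderDump {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Path path = new Path(args[0]);
      FileSystem fs = path.getFileSystem(conf);
      SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
      try {
        // Everything below is read from the header at the start of the file.
        System.out.println("key class        : " + reader.getKeyClassName());
        System.out.println("value class      : " + reader.getValueClassName());
        System.out.println("compressed       : " + reader.isCompressed());
        System.out.println("block compressed : " + reader.isBlockCompressed());
        if (reader.isCompressed()) {
          System.out.println("codec            : "
              + reader.getCompressionCodec().getClass().getName());
        }
      } finally {
        reader.close();
      }
    }
  }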


Re: Python + hdfs written thrift sequence files: lots of moving parts!

2012-09-25 Thread Harsh J
Hi Jay,

This may be off-topic to you, but I feel it's related: use Avro
DataFiles. There's Python support already available, as well as support
for several other languages.
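A rough sketch of the writing side (assuming Avro's Java library is on the
classpath; the schema, field names, and file name below are made up for
illustration). The resulting container file is self-describing and can then be
read from Python with the avro package:

  // Sketch only: write an Avro data (container) file with a made-up schema.
  import java.io.File;
  import org.apache.avro.Schema;
  import org.apache.avro.file.DataFileWriter;
  import org.apache.avro.generic.GenericData;
  import org.apache.avro.generic.GenericDatumWriter;
  import org.apache.avro.generic.GenericRecord;

  public class AvroDataFileSketch {
    public static void main(String[] args) throws Exception {
      Schema schema = new Schema.Parser().parse(
          "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"long\"},"
          + "{\"name\":\"body\",\"type\":\"string\"}]}");
      GenericRecord rec = new GenericData.Record(schema);
      rec.put("id", 1L);
      rec.put("body", "hello");
      DataFileWriter<GenericRecord> writer =
          new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
      writer.create(schema, new File("events.avro"));  // schema is stored in the file header
      writer.append(rec);
      writer.close();
    }
  }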

On Tue, Sep 25, 2012 at 10:57 PM, Jay Vyas  wrote:
> Hi guys!
>
> I'm trying to read some Hadoop-written Thrift files in plain old Java
> (without using SequenceFile.Reader).  The reasons for this are that I
>
> (1) want to understand the sequence file format better and
> (2) would like to be able to port this code to a language which doesn't have
> robust Hadoop sequence file I/O / Thrift support (Python). My code is linked
> below.
>
> So, before reading forward, if anyone has:
>
> 1) some general hints on how to create a sequence file with Thrift-encoded
> key/values in Python, or
> 2) some tips on the generic approach for reading a sequence file (the
> comments in the SequenceFile header seem to be a bit underspecified),
>
> I'd appreciate it!
>
> Now, here is my adventure into Thrift/HDFS sequence file I/O:
>
> I've written a simple stub which, I think, should be the start of a
> sequence file reader (it just tries to skip the header and get straight to the
> data).
>
> But it doesn't handle compression.
>
> http://pastebin.com/vyfgjML9
>
> So, this code ^^ appears to fail with a cryptic error: "don't know what
> type: 15".
>
> This error comes from a case statement, which attempts to determine what
> type of thrift record is being read in:
> "fail 127 don't know what type: 15"
>
>   private byte getTType(byte type) throws TProtocolException {
>     switch ((byte)(type & 0x0f)) {
>       case TType.STOP:
>         return TType.STOP;
>       case Types.BOOLEAN_FALSE:
>       case Types.BOOLEAN_TRUE:
>         return TType.BOOL;
>       // ... other type cases elided ...
>       case Types.STRUCT:
>         return TType.STRUCT;
>       default:
>         throw new TProtocolException("don't know what type: " +
>             (byte)(type & 0x0f));
>     }
>   }
>
> Upon further investigation, I have found that the Configuration
> object is (of course) heavily utilized by the SequenceFile reader, in
> particular to determine the codec.  That corroborates my hypothesis that
> the data needs to be decompressed or decoded before it can be deserialized
> by Thrift.
>
> So... I guess what's missing here is that I don't know how to
> manually reproduce the codec/gzip, etc. logic inside
> SequenceFile.Reader in plain old Java (i.e., without cheating and using the
> SequenceFile.Reader class that is configured in our MapReduce source
> code).
>
> With my end goal being to read the file in python, I think it would be nice
> to be able to read the sequencefile in java, and use this as a template
> (since I know that my thrift objects and serialization are working
> correctly in my current java source codebase, when read in from
> SequenceFile.Reader api).
>
> Any suggestions on how I can distill the logic of the SequenceFile.Reader
> class into a simplified version which is specific to my data, so that I can
> start porting into a python script which is capable of scanning a few real
> sequencefiles off of HDFS would be much appreciated !!!
>
> In general... what are the core steps for doing I/O with sequence files
> that are compressed and/or serialized in different formats?  Do we
> decompress first and then deserialize?  Or do both at the same time?
> Thanks!
>
> PS I've added an issue to github here
> https://github.com/matteobertozzi/Hadoop/issues/5, for a python
> SequenceFile reader.  If I get some helpful hints on this thread maybe I
> can directly implement an example on matteobertozzi's python hadoop trunk.
>
> --
> Jay Vyas
> MMSB/UCHC



-- 
Harsh J


RE: libhdfs install dep

2012-09-25 Thread Leo Leung
I see.
   The Apache RPM distro does include the libhdfs* binaries. (And it is
self-contained.)

   For the Apache distro (once installed), yum, zypper, and KPack* only know about
hadoop-1.0.1-1.amd* as the provider of libhdfs.
   (Check with rpm -qlp hadoop.rpm -- this will list all of the
files it installs -- or use yum deplist.)

   So the package that you are trying to install, which is failing the
dependency check, requires
   something named "hadoop" (per your e-mail) as the package that contains
libhdfs. (This is not what hadoop-1.0.1-1 claims to provide.)

  To solve this: the brute-force way.
  Force the install of the rpm without dependencies:
  rpm -i --nodeps <package>.rpm

  Another way:
  Match up all your RPM package dependencies from the vendor(s).
  1) Talk to the provider of your .rpm to see if they have the proper
packaging for the Apache RPM distro.
  2) Use or find a Hadoop distro from a vendor that has all the proper
"package" naming resolution.

Good luck

-Original Message-
From: Pastrana, Rodrigo (RIS-BCT) [mailto:rodrigo.pastr...@lexisnexis.com] 
Sent: Tuesday, September 25, 2012 5:58 AM
To: common-user@hadoop.apache.org
Subject: RE: libhdfs install dep

Leo, yes I'm working with hadoop-1.0.1-1.amd64.rpm from Apache's download site.
The rpm installs libhdfs in /usr/lib64 so I'm not sure why I would need the 
hadoop-<*>libhdfs* rpm.

Any idea why the installed /usr/lib64/libhdfs.so is not detected by the package 
managers?

Thanks, Rodrigo.

-Original Message-
From: Leo Leung [mailto:lle...@ddn.com]
Sent: Tuesday, September 25, 2012 2:11 AM
To: common-user@hadoop.apache.org
Subject: RE: libhdfs install dep

Rodrigo,
  Assuming you are asking for hadoop 1.x

  You are missing the hadoop-<*>libhdfs* rpm.
  Build it or get it from the vendor you got your hadoop from.

 

-Original Message-
From: Pastrana, Rodrigo (RIS-BCT) [mailto:rodrigo.pastr...@lexisnexis.com]
Sent: Monday, September 24, 2012 8:20 PM
To: 'core-u...@hadoop.apache.org'
Subject: libhdfs install dep

Anybody know why libhdfs.so is not found by package managers on CentOS 64 and 
OpenSuse64? 

I have an rpm which declares Hadoop as a dependency, but the package managers
(KPackageKit, zypper, etc.) report libhdfs.so as a missing dependency even though
Hadoop has been installed via rpm package, and libhdfs.so is installed as well.

Thanks, Rodrigo.




Python + hdfs written thrift sequence files: lots of moving parts!

2012-09-25 Thread Jay Vyas
Hi guys!

I'm trying to read some Hadoop-written Thrift files in plain old Java
(without using SequenceFile.Reader).  The reasons for this are that I

(1) want to understand the sequence file format better and
(2) would like to be able to port this code to a language which doesn't have
robust Hadoop sequence file I/O / Thrift support (Python). My code is linked
below.

So, before reading forward, if anyone has:

1) some general hints on how to create a sequence file with Thrift-encoded
key/values in Python, or
2) some tips on the generic approach for reading a sequence file (the
comments in the SequenceFile header seem to be a bit underspecified),

I'd appreciate it!

Now, here is my adventure into Thrift/HDFS sequence file I/O:

I've written a simple stub which, I think, should be the start of a
sequence file reader (it just tries to skip the header and get straight to the
data).

But it doesn't handle compression.

http://pastebin.com/vyfgjML9

So, this code ^^ appears to fail with a cryptic error: "don't know what
type: 15".

This error comes from a case statement, which attempts to determine what
type of thrift record is being read in:
"fail 127 don't know what type: 15"

  private byte getTType(byte type) throws TProtocolException {
    switch ((byte)(type & 0x0f)) {
      case TType.STOP:
        return TType.STOP;
      case Types.BOOLEAN_FALSE:
      case Types.BOOLEAN_TRUE:
        return TType.BOOL;
      // ... other type cases elided ...
      case Types.STRUCT:
        return TType.STRUCT;
      default:
        throw new TProtocolException("don't know what type: " +
            (byte)(type & 0x0f));
    }
  }

Upon further investigation, I have found that the Configuration
object is (of course) heavily utilized by the SequenceFile reader, in
particular to determine the codec.  That corroborates my hypothesis that
the data needs to be decompressed or decoded before it can be deserialized
by Thrift.

So... I guess what's missing here is that I don't know how to
manually reproduce the codec/gzip, etc. logic inside
SequenceFile.Reader in plain old Java (i.e., without cheating and using the
SequenceFile.Reader class that is configured in our MapReduce source
code).

With my end goal being to read the file in python, I think it would be nice
to be able to read the sequencefile in java, and use this as a template
(since I know that my thrift objects and serialization are working
correctly in my current java source codebase, when read in from
SequenceFile.Reader api).

Any suggestions on how I can distill the logic of the SequenceFile.Reader
class into a simplified version which is specific to my data, so that I can
start porting into a python script which is capable of scanning a few real
sequencefiles off of HDFS would be much appreciated !!!

In general... what are the core steps for doing I/O with sequence files
that are compressed and/or serialized in different formats?  Do we
decompress first and then deserialize?  Or do both at the same time?
Thanks!
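For what it's worth, here is a hedged Java reference for that ordering (a
sketch built on SequenceFile.Reader rather than the from-scratch parser
described above): the reader applies whatever codec the header names before it
hands value bytes back, so Thrift deserialization only ever sees decompressed
bytes. It assumes the values were written as BytesWritable wrapping
TBinaryProtocol-serialized structs, and MyThriftRecord stands in for a
generated Thrift class:

  // Sketch: decompression happens inside SequenceFile.Reader (per the codec in
  // the header); Thrift then deserializes the already-decompressed value bytes.
  // Assumptions: values are BytesWritable holding TBinaryProtocol bytes;
  // MyThriftRecord is a placeholder for a generated Thrift class.
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Writable;
  import org.apache.hadoop.util.ReflectionUtils;
  import org.apache.thrift.TDeserializer;
  import org.apache.thrift.protocol.TBinaryProtocol;

  public class ThriftSeqFileDump {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Path path = new Path(args[0]);
      SequenceFile.Reader reader =
          new SequenceFile.Reader(path.getFileSystem(conf), path, conf);
      Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      BytesWritable value = new BytesWritable();
      TDeserializer deserializer = new TDeserializer(new TBinaryProtocol.Factory());
      while (reader.next(key, value)) {
        byte[] bytes = new byte[value.getLength()];           // copy only the valid bytes
        System.arraycopy(value.getBytes(), 0, bytes, 0, value.getLength());
        MyThriftRecord record = new MyThriftRecord();         // hypothetical generated class
        deserializer.deserialize(record, bytes);              // bytes are already decompressed
        System.out.println(key + " -> " + record);
      }
      reader.close();
    }
  }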

PS I've added an issue to github here
https://github.com/matteobertozzi/Hadoop/issues/5, for a python
SequenceFile reader.  If I get some helpful hints on this thread maybe I
can directly implement an example on matteobertozzi's python hadoop trunk.

-- 
Jay Vyas
MMSB/UCHC


Re: libhdfs install dep

2012-09-25 Thread Brian Bockelman
Hi Rodrigo,

The hadoop RPMs are a bit deficient compared to those you would find from your 
Linux distribution.

For example, look at the Apache RPM you used:

[bbockelm@rcf-bockelman ~]$ rpm -qp 
http://mirrors.sonic.net/apache/hadoop/common/hadoop-1.0.3/hadoop-1.0.3-1.x86_64.rpm
 --provides
hadoop  
hadoop = 1.0.3-1

Normally, you would expect to see something like this (using the CDH4 
distribution as an example) as it contains a shared library:

[bbockelm@brian-test ~]$ rpm -q --provides hadoop-libhdfs
libhdfs.so.0()(64bit)  
hadoop-libhdfs = 2.0.0+88-1.cdh4.0.0.p0.30.osg.el5
libhdfs.so.0  
hadoop-libhdfs = 2.0.0+88-1.cdh4.0.0.p0.30.osg.el5

Because the Apache RPM does not list itself as providing libhdfs.so.0()(64bit), 
it breaks your automatic RPM dependency detection.

[Aside: I know from experience that building a high-quality (as in, follows the 
Fedora Packaging Guidelines) RPM for Java software is incredibly hard as the 
packaging approaches between the Linux distributions and Java community are 
incredibly divergent.  Not to say that the Java approach is inherently wrong, 
it's just different, and does not translate naturally to RPM.  Accordingly, to 
take Hadoop and make a rule-abiding RPM in Fedora would be hundreds of hours of 
work.  It's one of those things that appears much easier than it is to
accomplish.]

The Hadoop community is very friendly, and I'm sure they would accept any 
patches to fix this oversight in future releases.

Brian

On Sep 25, 2012, at 7:57 AM, "Pastrana, Rodrigo (RIS-BCT)" 
 wrote:

> Leo, yes I'm working with hadoop-1.0.1-1.amd64.rpm from Apache's download 
> site.
> The rpm installs libhdfs in /usr/lib64 so I'm not sure why I would need the 
> hadoop-<*>libhdfs* rpm.
> 
> Any idea why the installed /usr/lib64/libhdfs.so is not detected by the 
> package managers?
> 
> Thanks, Rodrigo.
> 
> -Original Message-
> From: Leo Leung [mailto:lle...@ddn.com] 
> Sent: Tuesday, September 25, 2012 2:11 AM
> To: common-user@hadoop.apache.org
> Subject: RE: libhdfs install dep
> 
> Rodrigo,
>  Assuming you are asking for hadoop 1.x
> 
>  You are missing the hadoop-<*>libhdfs* rpm.
>  Build it or get it from the vendor you got your hadoop from.
> 
> 
> 
> -Original Message-
> From: Pastrana, Rodrigo (RIS-BCT) [mailto:rodrigo.pastr...@lexisnexis.com] 
> Sent: Monday, September 24, 2012 8:20 PM
> To: 'core-u...@hadoop.apache.org'
> Subject: libhdfs install dep
> 
> Anybody know why libhdfs.so is not found by package managers on CentOS 64 and 
> OpenSuse64? 
> 
> I have an rpm which declares Hadoop as a dependency, but the package managers
> (KPackageKit, zypper, etc.) report libhdfs.so as a missing dependency
> even though Hadoop has been installed via rpm package, and libhdfs.so is
> installed as well.
> 
> Thanks, Rodrigo.
> 
> 



Re: Hadoop and Cuda , JCuda (CPU+GPU architecture)

2012-09-25 Thread Chen He
Hi Sudha

Good question.

First of all, you need to specify clearly what your Hadoop environment is
(pseudo-distributed or a real cluster).

Secondly, you need to clearly understand how Hadoop ships a job's jar file to
all worker nodes: it only copies the job jar itself. That jar does not
contain the jcuda.jar file, so the MapReduce program may not know where jcuda
is even if you specify the jcuda.jar file on your worker nodes' classpath.

I suggest you include the jcuda.jar in your wordcount.jar. Then, when
Hadoop copies the wordcount.jar file to each worker node's temporary working
directory, you do not need to worry about this issue.
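If repackaging is awkward, one alternative (just a sketch, not the repackaging
approach above) is to let the standard -libjars mechanism ship jcuda.jar: when
the driver runs through ToolRunner, GenericOptionsParser picks the flag up and
the listed jars are added to every task's classpath. The class and job names
here are placeholders:

  // Sketch of the -libjars alternative. GenericOptionsParser (via ToolRunner)
  // strips -libjars and ships the listed jars (e.g. jcuda.jar) to each task.
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.conf.Configured;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.util.Tool;
  import org.apache.hadoop.util.ToolRunner;

  public class WordCountDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
      Job job = new Job(getConf(), "wordcount-with-jcuda");  // getConf() already reflects -libjars
      job.setJarByClass(WordCountDriver.class);
      // ... set mapper, reducer, input and output paths as usual ...
      return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
      // e.g.: hadoop jar wordcount.jar WordCountDriver -libjars /path/to/jcuda.jar in out
      System.exit(ToolRunner.run(new Configuration(), new WordCountDriver(), args));
    }
  }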

Let me know if you have further questions.

Chen

On Tue, Sep 25, 2012 at 12:38 AM, sudha sadhasivam <
sudhasadhasi...@yahoo.com> wrote:

> Sir
> We tried to integrate hadoop and JCUDA.
> We tried a code from
>
>
> http://code.google.com/p/mrcl/source/browse/trunk/hama-mrcl/src/mrcl/mrcl/?r=76
>
> We were able to compile, but we are not able to execute: it does not recognise
> JCUBLAS.jar. We tried setting the classpath.
> We are herewith attaching the procedure for the same, along with the errors.
> Kindly inform us how to proceed. It is our UG project.
> Thanking you,
> Dr G sudha Sadasivam
>
> --- On Mon, 9/24/12, Chen He wrote:
>
>
> From: Chen He 
> Subject: Re: Hadoop and Cuda , JCuda (CPU+GPU architecture)
> To: common-user@hadoop.apache.org
> Date: Monday, September 24, 2012, 9:03 PM
>
>
> http://wiki.apache.org/hadoop/CUDA%20On%20Hadoop
>
> On Mon, Sep 24, 2012 at 10:30 AM, Oleg Ruchovets wrote:
>
> > Hi
> >
> > I am going to process video analytics using Hadoop.
> > I am very interested in the CPU+GPU architecture, especially using CUDA (
> > http://www.nvidia.com/object/cuda_home_new.html) and JCUDA (
> > http://jcuda.org/).
> > Does using Hadoop with a CPU+GPU architecture bring significant performance
> > improvement, and has someone succeeded in implementing it in production
> > quality?
> >
> > I didn't find any projects / examples that use such technology.
> > If someone could give me a link to best practices and examples using
> > CUDA/JCUDA + Hadoop, that would be great.
> > Thanks in advance
> > Oleg.
> >
>
>


Re: Passing Command-line Parameters to the Job Submit Command

2012-09-25 Thread Mohit Anchlia
You could always write your own properties file and read it as a resource.
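A minimal sketch of that approach (the file name job-params.properties and its
keys are made up): bundle the properties file at the root of the job jar and
load it from the classpath, in the driver or in a task's setup():

  // Minimal sketch of the suggestion above. Assumes a file named
  // job-params.properties is bundled at the root of the job jar.
  import java.io.InputStream;
  import java.util.Properties;

  public class JobParams {
    public static Properties load() throws Exception {
      Properties props = new Properties();
      InputStream in = JobParams.class.getResourceAsStream("/job-params.properties");
      try {
        props.load(in);   // e.g. nSamples=10, nMaps=5
      } finally {
        in.close();
      }
      return props;
    }
  }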

On Tue, Sep 25, 2012 at 12:10 AM, Hemanth Yamijala wrote:

> By java environment variables, do you mean the ones passed as
> -Dkey=value ? That's one way of passing them. I suppose another way is
> to have a client side site configuration (like mapred-site.xml) that
> is in the classpath of the client app.
>
> Thanks
> Hemanth
>
> On Tue, Sep 25, 2012 at 12:20 AM, Varad Meru  wrote:
> > Thanks Hemanth,
> >
> > But in general, if we want to pass arguments to any job (not only
> > PiEstimator from examples-jar) and submit the Job to the Job queue
> > scheduler, by the looks of it, we might always need to use the java
> > environment variables only.
> >
> > Is my above assumption correct?
> >
> > Thanks,
> > Varad
> >
> > On Mon, Sep 24, 2012 at 9:48 AM, Hemanth Yamijala  >wrote:
> >
> >> Varad,
> >>
> >> Looking at the code for the PiEstimator class which implements the
> >> 'pi' example, the two arguments are mandatory and are used *before*
> >> the job is submitted for execution - i.e on the client side. In
> >> particular, one of them (nSamples) is used not by the MapReduce job,
> >> but by the client code (i.e. PiEstimator) to generate some input.
> >>
> >> Hence, I believe all of this additional work that is being done by the
> >> PiEstimator class will be bypassed if we directly use the job -submit
> >> command. In other words, I don't think these two ways of running the
> >> job:
> >>
> >> - using the "hadoop jar examples pi"
> >> - using hadoop job -submit
> >>
> >> are equivalent.
> >>
> >> As a general answer to your question though, if additional parameters
> >> are used by the Mappers or reducers, then they will generally be set
> >> as additional job specific configuration items. So, one way of using
> >> them with the job -submit command will be to find out the specific
> >> names of the configuration items (from code, or some other
> >> documentation), and include them in the job.xml used when submitting
> >> the job.
> >>
> >> Thanks
> >> Hemanth
> >>
> >> On Sun, Sep 23, 2012 at 1:24 PM, Varad Meru 
> wrote:
> >> > Hi,
> >> >
> >> > I want to run the PiEstimator example from using the following command
> >> >
> >> > $hadoop job -submit pieestimatorconf.xml
> >> >
> >> > which contains all the info required by hadoop to run the job. E.g.
> the
> >> > input file location, the output file location and other details.
> >> >
> >> >
> >> > mapred.jar = file:Users/varadmeru/Work/Hadoop/hadoop-examples-1.0.3.jar
> >> > mapred.map.tasks = 20
> >> > mapred.reduce.tasks = 2
> >> > ...
> >> > mapred.job.name = PiEstimator
> >> > mapred.output.dir = file:Users/varadmeru/Work/out
> >> >
> >> > Now, as we know, to run the PiEstimator, we can use the following command
> >> > too:
> >> >
> >> > $hadoop jar hadoop-examples.1.0.3 pi 5 10
> >> >
> >> > where 5 and 10 are the arguments to the main class of the PiEstimator.
> >> How
> >> > can I pass the same arguments (5 and 10) using the job -submit command
> >> > through conf. file or any other way, without changing the code of the
> >> > examples to reflect the use of environment variables.
> >> >
> >> > Thanks in advance,
> >> > Varad
> >> >
> >> > -
> >> > Varad Meru
> >> > Software Engineer,
> >> > Business Intelligence and Analytics,
> >> > Persistent Systems and Solutions Ltd.,
> >> > Pune, India.
> >>
>


RE: libhdfs install dep

2012-09-25 Thread Pastrana, Rodrigo (RIS-BCT)
Leo, yes I'm working with hadoop-1.0.1-1.amd64.rpm from Apache's download site.
The rpm installs libhdfs in /usr/lib64 so I'm not sure why I would need the 
hadoop-<*>libhdfs* rpm.

Any idea why the installed /usr/lib64/libhdfs.so is not detected by the package 
managers?

Thanks, Rodrigo.

-Original Message-
From: Leo Leung [mailto:lle...@ddn.com] 
Sent: Tuesday, September 25, 2012 2:11 AM
To: common-user@hadoop.apache.org
Subject: RE: libhdfs install dep

Rodrigo,
  Assuming you are asking for hadoop 1.x

  You are missing the hadoop-<*>libhdfs* rpm.
  Build it or get it from the vendor you got your hadoop from.

 

-Original Message-
From: Pastrana, Rodrigo (RIS-BCT) [mailto:rodrigo.pastr...@lexisnexis.com] 
Sent: Monday, September 24, 2012 8:20 PM
To: 'core-u...@hadoop.apache.org'
Subject: libhdfs install dep

Anybody know why libhdfs.so is not found by package managers on CentOS 64 and 
OpenSuse64? 

I have an rpm which declares Hadoop as a dependency, but the package managers
(KPackageKit, zypper, etc.) report libhdfs.so as a missing dependency even though
Hadoop has been installed via rpm package, and libhdfs.so is installed as well.

Thanks, Rodrigo.




Re: Passing Command-line Parameters to the Job Submit Command

2012-09-25 Thread Bertrand Dechoux
Building on Hemanth's answer: in the end your variables should be in the
job.xml (the second file, besides the jar, needed to run a job). Building this
job.xml can be done in various ways, but it does inherit from your local
configuration, and you can change it using the Java API; in the end it is
only an XML file, so your hands are not tied.
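As a sketch of that last point (the property names are taken from Varad's
snippet, and the output file name mirrors his example), a Configuration built
in Java inherits the *-site.xml files on the classpath and can be dumped
straight to a file in the same key/value XML format as job.xml:

  // Sketch: build a Configuration in Java and write it out as plain XML,
  // editable by hand afterwards. Keys below come from Varad's snippet.
  import java.io.FileOutputStream;
  import org.apache.hadoop.conf.Configuration;

  public class WriteJobXml {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();   // inherits *-site.xml from the classpath
      conf.set("mapred.job.name", "PiEstimator");
      conf.setInt("mapred.map.tasks", 20);
      conf.setInt("mapred.reduce.tasks", 2);
      FileOutputStream out = new FileOutputStream("pieestimatorconf.xml");
      conf.writeXml(out);                          // same XML property format as job.xml
      out.close();
    }
  }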

I know there is a job file that you can provide with the shell command:
http://hadoop.apache.org/docs/r1.0.3/commands_manual.html#job

But I haven't used it yet, so I can't tell you more about this option.

Regards

Bertrand

On Tue, Sep 25, 2012 at 9:10 AM, Hemanth Yamijala wrote:

> By java environment variables, do you mean the ones passed as
> -Dkey=value ? That's one way of passing them. I suppose another way is
> to have a client side site configuration (like mapred-site.xml) that
> is in the classpath of the client app.
>
> Thanks
> Hemanth
>
> On Tue, Sep 25, 2012 at 12:20 AM, Varad Meru  wrote:
> > Thanks Hemanth,
> >
> > But in general, if we want to pass arguments to any job (not only
> > PiEstimator from examples-jar) and submit the Job to the Job queue
> > scheduler, by the looks of it, we might always need to use the java
> > environment variables only.
> >
> > Is my above assumption correct?
> >
> > Thanks,
> > Varad
> >
> > On Mon, Sep 24, 2012 at 9:48 AM, Hemanth Yamijala  >wrote:
> >
> >> Varad,
> >>
> >> Looking at the code for the PiEstimator class which implements the
> >> 'pi' example, the two arguments are mandatory and are used *before*
> >> the job is submitted for execution - i.e on the client side. In
> >> particular, one of them (nSamples) is used not by the MapReduce job,
> >> but by the client code (i.e. PiEstimator) to generate some input.
> >>
> >> Hence, I believe all of this additional work that is being done by the
> >> PiEstimator class will be bypassed if we directly use the job -submit
> >> command. In other words, I don't think these two ways of running the
> >> job:
> >>
> >> - using the "hadoop jar examples pi"
> >> - using hadoop job -submit
> >>
> >> are equivalent.
> >>
> >> As a general answer to your question though, if additional parameters
> >> are used by the Mappers or reducers, then they will generally be set
> >> as additional job specific configuration items. So, one way of using
> >> them with the job -submit command will be to find out the specific
> >> names of the configuration items (from code, or some other
> >> documentation), and include them in the job.xml used when submitting
> >> the job.
> >>
> >> Thanks
> >> Hemanth
> >>
> >> On Sun, Sep 23, 2012 at 1:24 PM, Varad Meru 
> wrote:
> >> > Hi,
> >> >
> >> > I want to run the PiEstimator example from using the following command
> >> >
> >> > $hadoop job -submit pieestimatorconf.xml
> >> >
> >> > which contains all the info required by hadoop to run the job. E.g.
> the
> >> > input file location, the output file location and other details.
> >> >
> >> >
> >> > mapred.jar = file:Users/varadmeru/Work/Hadoop/hadoop-examples-1.0.3.jar
> >> > mapred.map.tasks = 20
> >> > mapred.reduce.tasks = 2
> >> > ...
> >> > mapred.job.name = PiEstimator
> >> > mapred.output.dir = file:Users/varadmeru/Work/out
> >> >
> >> > Now, as we know, to run the PiEstimator, we can use the following command
> >> > too:
> >> >
> >> > $hadoop jar hadoop-examples.1.0.3 pi 5 10
> >> >
> >> > where 5 and 10 are the arguments to the main class of the PiEstimator.
> >> How
> >> > can I pass the same arguments (5 and 10) using the job -submit command
> >> > through conf. file or any other way, without changing the code of the
> >> > examples to reflect the use of environment variables.
> >> >
> >> > Thanks in advance,
> >> > Varad
> >> >
> >> > -
> >> > Varad Meru
> >> > Software Engineer,
> >> > Business Intelligence and Analytics,
> >> > Persistent Systems and Solutions Ltd.,
> >> > Pune, India.
> >>
>



-- 
Bertrand Dechoux


Re: Passing Command-line Parameters to the Job Submit Command

2012-09-25 Thread Hemanth Yamijala
By java environment variables, do you mean the ones passed as
-Dkey=value ? That's one way of passing them. I suppose another way is
to have a client side site configuration (like mapred-site.xml) that
is in the classpath of the client app.

Thanks
Hemanth
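To make the configuration route concrete, a small sketch (the key
"my.job.nsamples" is made up): the driver sets the value on the job
configuration, or it arrives via -Dmy.job.nsamples=10 when the driver goes
through GenericOptionsParser/ToolRunner, and the task reads it back in setup():

  // Sketch of reading a job-specific configuration item in a task.
  // Driver side would do: job.getConfiguration().setInt("my.job.nsamples", 10);
  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  public class ParamAwareMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private int nSamples;

    @Override
    protected void setup(Context context) {
      nSamples = context.getConfiguration().getInt("my.job.nsamples", 1);  // read back in the task
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // ... use nSamples here ...
    }
  }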

On Tue, Sep 25, 2012 at 12:20 AM, Varad Meru  wrote:
> Thanks Hemanth,
>
> But in general, if we want to pass arguments to any job (not only
> PiEstimator from examples-jar) and submit the Job to the Job queue
> scheduler, by the looks of it, we might always need to use the java
> environment variables only.
>
> Is my above assumption correct?
>
> Thanks,
> Varad
>
> On Mon, Sep 24, 2012 at 9:48 AM, Hemanth Yamijala wrote:
>
>> Varad,
>>
>> Looking at the code for the PiEstimator class which implements the
>> 'pi' example, the two arguments are mandatory and are used *before*
>> the job is submitted for execution - i.e on the client side. In
>> particular, one of them (nSamples) is used not by the MapReduce job,
>> but by the client code (i.e. PiEstimator) to generate some input.
>>
>> Hence, I believe all of this additional work that is being done by the
>> PiEstimator class will be bypassed if we directly use the job -submit
>> command. In other words, I don't think these two ways of running the
>> job:
>>
>> - using the "hadoop jar examples pi"
>> - using hadoop job -submit
>>
>> are equivalent.
>>
>> As a general answer to your question though, if additional parameters
>> are used by the Mappers or reducers, then they will generally be set
>> as additional job specific configuration items. So, one way of using
>> them with the job -submit command will be to find out the specific
>> names of the configuration items (from code, or some other
>> documentation), and include them in the job.xml used when submitting
>> the job.
>>
>> Thanks
>> Hemanth
>>
>> On Sun, Sep 23, 2012 at 1:24 PM, Varad Meru  wrote:
>> > Hi,
>> >
>> > I want to run the PiEstimator example from using the following command
>> >
>> > $hadoop job -submit pieestimatorconf.xml
>> >
>> > which contains all the info required by hadoop to run the job. E.g. the
>> > input file location, the output file location and other details.
>> >
>> >
>> > mapred.jar = file:Users/varadmeru/Work/Hadoop/hadoop-examples-1.0.3.jar
>> > mapred.map.tasks = 20
>> > mapred.reduce.tasks = 2
>> > ...
>> > mapred.job.name = PiEstimator
>> > mapred.output.dir = file:Users/varadmeru/Work/out
>> >
>> > Now, as we know, to run the PiEstimator, we can use the following command
>> > too:
>> >
>> > $hadoop jar hadoop-examples.1.0.3 pi 5 10
>> >
>> > where 5 and 10 are the arguments to the main class of the PiEstimator.
>> How
>> > can I pass the same arguments (5 and 10) using the job -submit command
>> > through conf. file or any other way, without changing the code of the
>> > examples to reflect the use of environment variables.
>> >
>> > Thanks in advance,
>> > Varad
>> >
>> > -
>> > Varad Meru
>> > Software Engineer,
>> > Business Intelligence and Analytics,
>> > Persistent Systems and Solutions Ltd.,
>> > Pune, India.
>>