On Fri, Jan 28, 2011 at 12:06 PM,  <praveen.pe...@nokia.com> wrote:
> Thanks Tom. I think I got it working with my own driver so I will go with it 
> for now (unless that proves to be a bad option).
>
> BTW, could you tell me how to stick with one Hadoop version when launching a
> cluster? I have hadoop-0.20.2 on my classpath, but it looks like Whirr gets
> the latest Hadoop from the repository. Since the latest version may change
> over time, I would like to pin a single version so that a Hadoop version
> mismatch won't happen.

You do need to make sure that the versions are the same. See the
Hadoop integration tests, which specify the version of Hadoop to use
in their POM.
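
For example, in a Maven build you could pin the client-side Hadoop version to
match the cluster, roughly like this (the artifact and version below are just an
illustration; use whatever the integration-test POM specifies for your Whirr
release):

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>0.20.2</version>
  </dependency>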

>
> Also, what jar files are necessary for launching a cluster using Java? Currently
> I have the CLI jar, but that's way too large since it has everything in it.

You need Whirr's core and Hadoop jars, as well as their dependencies.
If you look at the POMs in the source code they will tell you the
dependencies.
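
As a rough sketch (the artifact ids are from memory, so double-check them
against the POMs), the Maven dependencies would look something like:

  <dependency>
    <groupId>org.apache.whirr</groupId>
    <artifactId>whirr-core</artifactId>
    <version>${whirr.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.whirr</groupId>
    <artifactId>whirr-hadoop</artifactId>
    <version>${whirr.version}</version>
  </dependency>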

Cheers
Tom

>
> Thanks
> Praveen
>
> -----Original Message-----
> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
> Sent: Friday, January 28, 2011 2:12 PM
> To: whirr-user@incubator.apache.org
> Subject: Re: Running Mapred jobs after launching cluster
>
> On Fri, Jan 28, 2011 at 6:28 AM,  <praveen.pe...@nokia.com> wrote:
>> Thanks Tom. Could you elaborate a little more on the second option?
>>
>> What is the HADOOP_CONF_DIR here, after launching the cluster?
>
> ~/.whirr/<cluster-name>
>
>> When you said to run it in a new process, did you mean using the command-line Whirr tool?
>
> I meant that you could launch Whirr using the CLI, or Java. Then run the job 
> in another process, with HADOOP_CONF_DIR set.
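>
> For example (the cluster name and job jar below are only placeholders for
> whatever you are actually using):
>
>   export HADOOP_CONF_DIR=~/.whirr/myhadoopcluster
>   hadoop jar my-job.jar com.example.MyJob <input-path> <output-path>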
>
> I assume the MR jobs you are running can be run against an arbitrary cluster,
> so you should be able to point them at a cluster started by Whirr.
>
> Tom
>
>>
>> I may end up writing my own driver for running external MapReduce jobs so
>> that I have more control, but I was just curious whether option #2 is better
>> than writing my own driver.
>>
>> Praveen
>>
>> -----Original Message-----
>> From: ext Tom White [mailto:t...@cloudera.com]
>> Sent: Thursday, January 27, 2011 4:01 PM
>> To: whirr-user@incubator.apache.org
>> Subject: Re: Running Mapred jobs after launching cluster
>>
>> If they implement the Tool interface then you can set configuration on them. 
>> Failing that you could set HADOOP_CONF_DIR and run them in a new process.
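>>
>> For example, something along these lines (an untested sketch; the host names,
>> ports, and SomeMahoutDriver are placeholders for your actual cluster and job):
>>
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.util.ToolRunner;
>>
>> Configuration conf = new Configuration();
>> // point the client at the Whirr-launched cluster instead of the local defaults
>> conf.set("fs.default.name", "hdfs://namenode-host:8020");
>> conf.set("mapred.job.tracker", "jobtracker-host:8021");
>> // ToolRunner hands this Configuration to the Tool before calling run()
>> int exitCode = ToolRunner.run(conf, new SomeMahoutDriver(), args);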
>>
>> Cheers,
>> Tom
>>
>> On Thu, Jan 27, 2011 at 12:52 PM,  <praveen.pe...@nokia.com> wrote:
>>> Hmm...
>>> I am running some MapReduce jobs that I wrote myself, but some of them are
>>> in external libraries (e.g. Mahout) which I don't have control over. Since I
>>> can't modify the code in external libraries, is there any other way to make
>>> this work?
>>>
>>> Praveen
>>>
>>> -----Original Message-----
>>> From: ext Tom White [mailto:tom.e.wh...@gmail.com]
>>> Sent: Thursday, January 27, 2011 3:42 PM
>>> To: whirr-user@incubator.apache.org
>>> Subject: Re: Running Mapred jobs after launching cluster
>>>
>>> You don't need to add anything to the classpath, but you need to use the 
>>> configuration in the org.apache.whirr.service.Cluster object to populate 
>>> your Hadoop Configuration object so that your code knows which cluster to 
>>> connect to. See the getConfiguration() method in HadoopServiceController 
>>> for how to do this.
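>>>
>>> For example, a rough sketch (the host names and ports here are placeholders;
>>> in practice you would read them from the Cluster object, the same way
>>> getConfiguration() does):
>>>
>>> import org.apache.hadoop.conf.Configuration;
>>> import org.apache.hadoop.fs.FileSystem;
>>> import org.apache.hadoop.fs.Path;
>>>
>>> Configuration conf = new Configuration();
>>> // without these properties FileSystem.get() falls back to the local file system
>>> conf.set("fs.default.name", "hdfs://namenode-host:8020");
>>> conf.set("mapred.job.tracker", "jobtracker-host:8021");
>>>
>>> FileSystem fs = FileSystem.get(conf);
>>> fs.copyFromLocalFile(false, true,
>>>     new Path(localFilePath), new Path(hdfsFileDirectory));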
>>>
>>> Cheers,
>>> Tom
>>>
>>> On Thu, Jan 27, 2011 at 12:21 PM,  <praveen.pe...@nokia.com> wrote:
>>>> Hello all,
>>>> I wrote a Java class, HadoopLauncher, that is very similar to
>>>> HadoopServiceController. I was successfully able to launch a cluster
>>>> programmatically from my application using Whirr. Now I want to copy
>>>> files to HDFS and also run a job programmatically.
>>>>
>>>> When I copy a file to HDFS, it ends up on the local file system instead of
>>>> HDFS. Here is the code I used:
>>>>
>>>> Configuration conf = new Configuration();
>>>> FileSystem hdfs = FileSystem.get(conf);
>>>> hdfs.copyFromLocalFile(false, true,
>>>>     new Path(localFilePath), new Path(hdfsFileDirectory));
>>>>
>>>> Do I need to add anything else to the classpath so the Hadoop libraries
>>>> know they need to talk to the dynamically launched cluster? When running
>>>> Whirr from the command line I know it uses HADOOP_CONF_DIR to find the
>>>> Hadoop config files, but I am wondering how to solve this when doing the
>>>> same from Java.
>>>>
>>>> Praveen
>>>>
>>>>
>>>
>>
>
