Re: [gridengine users] Hadoop Integration HOWTO (was: Hadoop Integration - how's it going)

Ron Chen Mon, 04 Jun 2012 09:51:26 -0700

Hi Prakashan,


I am trying to understand your integration, and it looks similar to Ravi Chandra
Nallan's Hadoop integration.

One of the improvements in Daniel Templeton's Hadoop integration is that he models
HDFS data as resources, so jobs can be scheduled to the nodes that hold their data.
Is scheduling jobs to data a planned feature of your "On-Demand Hadoop Cluster" integration?

For those who don't know Ravi Chandra Nallan, he was with Sun Microsystems when he
developed the integration. Last I checked, he was with Oracle.

 -Ron




----- Original Message -----
From: Rayson Ho <[email protected]>
To: Prakashan Korambath <[email protected]>
Cc: "[email protected]" <[email protected]>
Sent: Friday, June 1, 2012 3:04 PM
Subject: Re: [gridengine users] Hadoop Integration HOWTO (was: Hadoop Integration - how's it going)

Thanks again Prakashan for the contribution!

Rayson



On Fri, Jun 1, 2012 at 1:25 PM, Prakashan Korambath <[email protected]> wrote:
> Thank you Rayson! I appreciate you taking the time to upload the tar files and
> write the HOWTO.
>
> Regards,
>
> Prakashan
>
>
>
> On 06/01/2012 10:19 AM, Rayson Ho wrote:
>>
>> I've reviewed the integration, and wrote a short Grid Engine Hadoop HOWTO:
>>
>> http://gridscheduler.sourceforge.net/howto/GridEngineHadoop.html
>>
>> The difference between the two methods (original SGE 6.2u5 vs.
>> Prakashan's) is that with Prakashan's approach, Grid Engine is used
>> for resource allocation, and the Hadoop job scheduler/JobTracker is
>> used to handle all the MapReduce operations. A Hadoop cluster is
>> created on demand with Prakashan's approach, whereas in the original SGE
>> 6.2u5 method Grid Engine replaces the Hadoop job scheduler.
>>
>> As standard Grid Engine PEs are used in this new approach, one can
>> call "qrsh -inherit" and use Grid Engine's own method to start Hadoop
>> services on remote nodes, and thus get full job control, job
>> accounting, and clean-up-on-termination benefits, like any other
>> tightly integrated PE job!
>>
>> Rayson
>>
>>
>>
>> On Tue, May 29, 2012 at 10:36 AM, Prakashan Korambath <[email protected]>
>>  wrote:
>>>
>>> I put my scripts in a tar file and sent it to Rayson yesterday so that he
>>> can put it in a common place for download.
>>>
>>> Prakashan
>>>
>>>
>>>
>>> On 05/29/2012 07:18 AM, Jesse Becker wrote:
>>>>
>>>>
>>>> On Mon, May 28, 2012 at 12:00:24PM -0400, Prakashan Korambath wrote:
>>>>>
>>>>>
>>>>>
>>>>> This is how we run Hadoop using Grid Engine (or, for that matter,
>>>>> any scheduler, with appropriate alterations):
>>>>>
>>>>> http://www.ats.ucla.edu/clusters/hoffman2/hadoop/default.htm
>>>>>
>>>>> Basically, either run a prolog or call a script inside the
>>>>> submission command file itself to parse the file named by
>>>>> $PE_HOSTFILE and create Hadoop's *-site.xml, masters and slaves
>>>>> files at run time. This methodology is suitable for any
>>>>> scheduler, as it does not depend on scheduler-specific features.
>>>>> If there is interest, I can post the prolog script. Thanks.
>>>>
>>>>
>>>>
>>>> Please do.
>>>>
>>>
>
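The run-time generation step Prakashan describes (parse $PE_HOSTFILE, write the Hadoop masters, slaves and *-site.xml files) combined with the "qrsh -inherit" launch Rayson mentions can be sketched roughly as below. This is not the UCLA script; it is a minimal Python illustration under stated assumptions: Hadoop 1.x property names (fs.default.name, mapred.job.tracker), the first allocated host acting as master, ports 9000/9001, a /opt/hadoop default for HADOOP_HOME, and slave daemons started in the foreground ("hadoop datanode", "hadoop tasktracker") so Grid Engine can track and kill them. Each $PE_HOSTFILE line is read as "host slots queue-instance processor-range". Since Grid Engine generally limits the number of "qrsh -inherit" tasks on a host to the slots granted there, the sketch assumes at least two slots per host.

```python
"""Hypothetical sketch: build an on-demand Hadoop 1.x cluster layout from
Grid Engine's $PE_HOSTFILE. Ports, paths and the master-selection rule are
illustrative assumptions, not the original integration scripts."""
import os
import sys
from xml.sax.saxutils import escape


def parse_pe_hostfile(text):
    """Return [(host, slots), ...] from the contents of $PE_HOSTFILE.

    Each non-empty line is: <host> <slots> <queue instance> <processor range>.
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        hosts.append((fields[0], int(fields[1])))
    return hosts


def site_xml(props):
    """Render a Hadoop *-site.xml document from a dict of properties."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(str(value))}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


def build_cluster_files(hosts, nn_port=9000, jt_port=9001):
    """Map file names to contents. Assumption: the first host is the master
    (NameNode + JobTracker) and every allocated host also runs slave daemons."""
    master = hosts[0][0]
    slaves = [host for host, _ in hosts]
    return {
        "masters": master + "\n",
        "slaves": "\n".join(slaves) + "\n",
        "core-site.xml": site_xml({"fs.default.name": f"hdfs://{master}:{nn_port}"}),
        "mapred-site.xml": site_xml({"mapred.job.tracker": f"{master}:{jt_port}"}),
    }


def slave_start_commands(hosts, hadoop_home):
    """Build 'qrsh -inherit' argv lists that run slave daemons in the
    foreground, so Grid Engine controls, accounts for and cleans them up."""
    hadoop = os.path.join(hadoop_home, "bin", "hadoop")
    cmds = []
    for host, _ in hosts:
        for daemon in ("datanode", "tasktracker"):
            cmds.append(["qrsh", "-inherit", host, hadoop, daemon])
    return cmds


def main(conf_dir):
    with open(os.environ["PE_HOSTFILE"]) as f:
        hosts = parse_pe_hostfile(f.read())
    for name, content in build_cluster_files(hosts).items():
        with open(os.path.join(conf_dir, name), "w") as out:
            out.write(content)
    # Only print the launch commands; the job script would start them itself.
    hadoop_home = os.environ.get("HADOOP_HOME", "/opt/hadoop")  # hypothetical default
    for cmd in slave_start_commands(hosts, hadoop_home):
        print(" ".join(cmd))


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else ".")
```

The commands are printed rather than executed because "qrsh -inherit" is meant to be called from within the running job (the job script), not from a prolog; a job script would start each one (for example with subprocess.Popen), start the master daemons locally, run the MapReduce work, and then exit so Grid Engine tears the cluster down.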

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
