Chansup,

Did you need to change anything in GE2011.11 to integrate it with
Hadoop? I am finishing up GE2011.11 patch 1 (i.e. GE2011.11 update-0
patch-1), so if the changes are small and isolated, I can quickly
integrate them into the patch 1 release; otherwise I will push them
into patch 2 & GE2011.11 u1.


Todd,

The SGE-Hadoop integration uses Grid Engine as the job scheduler for
Hadoop jobs, and the integration has the Herd JSV & load sensor that
talk to HDFS to request & report data locality. There was a big API
change in Hadoop 0.20.x for the Hadoop 1.0 release. I recall someone
contributed a small patch that fixed things related to Hadoop, and
that part is in GE 2011.11 already, but I don't recall changing any of
the Java code in the GE2011.11 release for Hadoop.
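
To illustrate the data-locality idea only, here is a rough sketch. It is
not the Herd code: the function name and the input format are made up for
this example, and the real integration gets its block information from
HDFS. Given which hosts hold a replica of each input block, it ranks hosts
by how many of the job's blocks they store locally.

```python
from collections import Counter


def rank_hosts_by_locality(block_locations):
    """Rank hosts by how many of a job's input blocks they hold locally.

    block_locations: dict mapping a block id to the list of hosts that
    hold a replica of it (hypothetical input format for this sketch).
    Returns (host, local_block_count) pairs, highest count first.
    """
    counts = Counter()
    for hosts in block_locations.values():
        # set() so that a host is counted at most once per block
        counts.update(set(hosts))
    # Sort by count descending, then by host name for a stable order
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up example data: three blocks, each with three replicas
blocks = {
    "blk_1": ["node01", "node02", "node03"],
    "blk_2": ["node02", "node03", "node04"],
    "blk_3": ["node02", "node05", "node06"],
}
ranking = rank_hosts_by_locality(blocks)
print(ranking[0])
```

A scheduler that prefers the hosts at the top of such a ranking sends the
job to where most of its data already is.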

However, to be honest, using the SGE-Hadoop integration means that you
need to give up the Hadoop job scheduler, and thus to get the full
functionality of a normal Hadoop cluster, Grid Engine needs to
implement all the features of the scheduler in Hadoop. For example,
the Hadoop scheduler supports "Speculative Execution", and Grid Engine
does not have it.

Rayson



On Tue, Mar 6, 2012 at 12:53 PM, CB <[email protected]> wrote:
> Hi Todd,
>
> I have implemented a Hadoop (version 0.20.2) integration with the OGE2011.11
> release, based on Dan T's work as described in the link below. We are
> experimenting with the development cluster for internal projects.
>
> Dan T's Hadoop module was built with the Hadoop 0.20.x release, so it will
> require some changes in order to work with the latest Hadoop 1.x release.
> This is on my to-do list. :-)
>
> Regards,
> - Chansup
>
>
> On Tue, Mar 6, 2012 at 12:21 PM, Heywood, Todd <[email protected]> wrote:
>>
>> Yes. There also used to be something similar called Hadoop-on-Demand.
>>
>> But the idea is to schedule jobs to a persistent HDFS, sending jobs to
>> where the data is, as opposed to setting up and tearing down HDFS for
>> every job.
>>
>> I probably should have given this as background:
>>
>> https://blogs.oracle.com/templedf/entry/beta_testing_the_sun_grid
>>
>>
>>
>>
>> -----Original Message-----
>> From: "Hung-Sheng Tsao (LaoTsao) Ph.D" <[email protected]>
>> Date: Tue, 6 Mar 2012 12:12:06 -0500
>> To: Todd Heywood <[email protected]>
>> Cc: "[email protected]" <[email protected]>
>> Subject: Re: [gridengine users] Hadoop integration
>>
>> >did you see this blog?
>> >https://blogs.oracle.com/ravee/entry/creating_hadoop_pe_under_sge
>> >
>> >Sent from my iPad
>> >
>> >On Mar 6, 2012, at 11:45, "Heywood, Todd" <[email protected]> wrote:
>> >
>> >> Way back when SGE was still at Sun, Dan Templeton wrote a SGE-Hadoop
>> >>integration for 6.2u5 (Sun's distribution as a value-added feature).
>> >>
>> >> I have been told that because of changes made to the Hadoop
>> >>API since Oracle purchased Sun, this integration no longer works - at
>> >>least not in the open source versions following 6.2u5.
>> >>
>> >> Does anyone know if this is true? Has anyone worked with this recently?
>> >>I do see a hadoop.tar.gz at the SoGE site
>> >>http://arc.liv.ac.uk/downloads/SGE/releases/8.0.0d/ but it looks to me
>> >>like it is probably the 2-3 year old code from Sun (with no
>> >>documentation since it was a value-added feature for Sun).
>> >>
>> >> Thanks,
>> >>
>> >> Todd Heywood
>> >>
>> >>
>> >> _______________________________________________
>> >> users mailing list
>> >> [email protected]
>> >> https://gridengine.org/mailman/listinfo/users
