Chansup,

Did you need to change anything in GE2011.11 to integrate it with Hadoop? I am finishing up GE2011.11 patch 1 (i.e., GE2011.11 update-0 patch-1), so if the changes are small and isolated, I can quickly integrate them into the patch 1 release; otherwise I will push them into patch 2 & GE2011.11 u1.

Todd,

The SGE-Hadoop integration uses Grid Engine as the job scheduler for Hadoop jobs, and the integration has the Herd JSV & load sensor that talk to HDFS to request & report data locality. There was a big API change in Hadoop 0.20.x for the Hadoop 1.0 release. I recall someone contributed a small patch that fixed things related to Hadoop, and that part is in GE 2011.11 already, but I don't recall changing any of the Java code in the GE2011.11 release for Hadoop.

However, to be honest, using the SGE-Hadoop integration means that you need to give up the Hadoop job scheduler, and thus to get the full functionality of a normal Hadoop cluster, Grid Engine needs to implement all the features of the scheduler in Hadoop. For example, the Hadoop scheduler supports "Speculative Execution" and Grid Engine does not.

Rayson

On Tue, Mar 6, 2012 at 12:53 PM, CB <[email protected]> wrote:
> Hi Todd,
>
> I have implemented a hadoop (0.20.2 version) integration with OGE2011.11
> release based on Dan T's work as described in the link below. We are
> experimenting with the development cluster for internal projects.
>
> Dan T's hadoop module was built with hadoop 0.20.x release. So it will
> require some changes in order to work with the latest hadoop 1.x release.
> This is on my ToDo list. :-)
>
> Regards,
> - Chansup
>
>
> On Tue, Mar 6, 2012 at 12:21 PM, Heywood, Todd <[email protected]> wrote:
>>
>> Yes. There also used to be something similar called Hadoop-on-Demand.
>>
>> But the idea is to schedule jobs to a persistent HDFS, sending jobs to
>> where the data is, as opposed to setting up and tearing down HDFS for
>> every job.
>>
>> I probably should have given this as background:
>>
>> https://blogs.oracle.com/templedf/entry/beta_testing_the_sun_grid
>>
>>
>> -----Original Message-----
>> From: "Hung-Sheng Tsao (LaoTsao) Ph.D" <[email protected]>
>> Date: Tue, 6 Mar 2012 12:12:06 -0500
>> To: Todd Heywood <[email protected]>
>> Cc: "[email protected]" <[email protected]>
>> Subject: Re: [gridengine users] Hadoop integration
>>
>> >did you see this blog?
>> >https://blogs.oracle.com/ravee/entry/creating_hadoop_pe_under_sge
>> >
>> >Sent from my iPad
>> >
>> >On Mar 6, 2012, at 11:45, "Heywood, Todd" <[email protected]> wrote:
>> >
>> >> Way back when SGE was still at Sun, Dan Templeton wrote a SGE-Hadoop
>> >> integration for 6.2u5 (Sun's distribution as a value-added feature).
>> >>
>> >> I have been told that because changes have been made to the Hadoop
>> >> API since Oracle purchased Sun this integration no longer works - at
>> >> least not in the open source versions following 6.2u5.
>> >>
>> >> Does anyone know if this is true? Has anyone worked with this recently?
>> >> I do see a hadoop.tar.gz at the SoGE site
>> >>
>> >> http://arc.liv.ac.uk/downloads/SGE/releases/8.0.0d/
>> >>
>> >> but it looks to me like it is probably the 2-3 year old code from Sun
>> >> (with no documentation since it was a value-added feature for Sun).
>> >>
>> >> Thanks,
>> >>
>> >> Todd Heywood
>> >>
>> >> _______________________________________________
>> >> users mailing list
>> >> [email protected]
>> >> https://gridengine.org/mailman/listinfo/users
