Re: [gridengine users] Hadoop integration

CB Tue, 06 Mar 2012 11:09:34 -0800

One reason we are experimenting with Dan's Hadoop integration with GE
is its ability to control how many nodes can be used for a given Hadoop
MapReduce job. This is an important feature in an environment where many
users are vying for resources to run their Hadoop MapReduce jobs.


Regards,
- Chansup

On Tue, Mar 6, 2012 at 1:50 PM, Rayson Ho <[email protected]> wrote:

> On Tue, Mar 6, 2012 at 1:18 PM, Heywood, Todd <[email protected]> wrote:
> > Rayson, the point of your 2nd paragraph is good. I'd like to see dynamic
> > partitioning of a cluster between GridEngine and Hadoop nodes, where idle
> > nodes in each partition could be requisitioned on-the-fly for the other
> > partition. For the SGE side, you could just stop and start sge_execd's,
> > but for the Hadoop side you would need to gracefully add and remove nodes
> > from Hadoop.
>
> I looked at Hadoop in Sept & Oct last year (and went to a Hadoop
> workshop with a friend in Nov - heck! It's almost 4 months ago!). I
> recall downloading an old version of Hadoop so that I could compile
> the examples in the books too.
>
> I agree that on the Hadoop side you can't just remove an HDFS node
> (unless you always remove fewer than 3 nodes at a time and give HDFS enough
> time to rebuild the replicated data - and during that period
> make sure that hard disks and nodes don't fail :-D ).
>
> Even in the original SGE-Hadoop integration HDFS nodes are always
> running (or else the load sensors wouldn't be able to report the
> locality of data in HDFS). So SGE does not get a "full" node. HDFS
> does balancing in the background and in later versions there is also
> the HDFS snapshot feature that can use up processor cycles & I/O...
>
> So, partitioning is the cleaner approach, and I remember on the Hadoop
> dev mailing list there were discussions on partitioning a cluster such
> that Hadoop & other batch systems do not overcommit the execution
> nodes.
>
> Rayson
>
>
>
>
> >
> > Todd
> >
> > -----Original Message-----
> > From: Rayson Ho <[email protected]>
> > Date: Tue, 6 Mar 2012 13:09:29 -0500
> > To: CB <[email protected]>
> > Cc: Todd Heywood <[email protected]>, "[email protected]"
> > <[email protected]>
> > Subject: Re: [gridengine users] Hadoop integration
> >
> >>Chansup,
> >>
> >>Did you need to change anything in GE2011.11 to integrate it with
> >>Hadoop? I am finishing up GE2011.11 patch 1 (i.e. GE2011.11 update-0
> >>patch-1), so if the changes are small and isolated, then I can quickly
> >>integrate them into the patch 1 release, or else I will just push them
> >>into patch 2 & GE2011.11 u1.
> >>
> >>
> >>Todd,
> >>
> >>The SGE-Hadoop integration uses Grid Engine as the job scheduler for
> >>Hadoop jobs, and the integration has the Herd JSV & load sensor that
> >>talk to HDFS to request & report data locality. There was a big API
> >>change in Hadoop 0.20.x for the Hadoop 1.0 release. I recall someone
> >>contributed a small patch that fixed things related to Hadoop, and
> >>that part is in GE 2011.11 already, but I don't recall changing any of
> >>the Java code in the GE2011.11 release for Hadoop.
> >>
> >>However, to be honest, using the SGE-Hadoop integration means that you
> >>need to give up the Hadoop job scheduler, and thus to get the full
> >>functionality of a normal Hadoop cluster, Grid Engine needs to
> >>implement all the features of the scheduler in Hadoop. For example,
> >>the Hadoop scheduler supports "Speculative Execution" and Grid Engine
> >>does not have it.
> >>
> >>Rayson
> >>
> >>
> >>
> >>On Tue, Mar 6, 2012 at 12:53 PM, CB <[email protected]> wrote:
> >>> Hi Todd,
> >>>
> >>> I have implemented a Hadoop (version 0.20.2) integration with the
> >>> OGE2011.11 release based on Dan T's work as described in the link below.
> >>> We are experimenting with it on the development cluster for internal
> >>> projects.
> >>>
> >>> Dan T's Hadoop module was built with the Hadoop 0.20.x release, so it will
> >>> require some changes in order to work with the latest Hadoop 1.x release.
> >>> This is on my to-do list. :-)
> >>>
> >>> Regards,
> >>> - Chansup
> >>>
> >>>
> >>> On Tue, Mar 6, 2012 at 12:21 PM, Heywood, Todd <[email protected]> wrote:
> >>>>
> >>>> Yes. There also used to be something similar called Hadoop-on-Demand.
> >>>>
> >>>> But the idea is to schedule jobs to a persistent HDFS, sending jobs to
> >>>> where the data is, as opposed to setting up and tearing down HDFS for
> >>>> every job.
> >>>>
> >>>> I probably should have given this as background:
> >>>>
> >>>> https://blogs.oracle.com/templedf/entry/beta_testing_the_sun_grid
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> -----Original Message-----
> >>>> From: "Hung-Sheng Tsao (LaoTsao) Ph.D" <[email protected]>
> >>>> Date: Tue, 6 Mar 2012 12:12:06 -0500
> >>>> To: Todd Heywood <[email protected]>
> >>>> Cc: "[email protected]" <[email protected]>
> >>>> Subject: Re: [gridengine users] Hadoop integration
> >>>>
> >>>> >did you see this blog?
> >>>> >https://blogs.oracle.com/ravee/entry/creating_hadoop_pe_under_sge
> >>>> >
> >>>> >Sent from my iPad
> >>>> >
> >>>> >On Mar 6, 2012, at 11:45, "Heywood, Todd" <[email protected]> wrote:
> >>>> >
> >>>> >> Way back when SGE was still at Sun, Dan Templeton wrote an SGE-Hadoop
> >>>> >> integration for 6.2u5 (a value-added feature of Sun's distribution).
> >>>> >>
> >>>> >> I have been told that because of changes made to the Hadoop API
> >>>> >> since Oracle purchased Sun, this integration no longer works - at
> >>>> >> least not in the open source versions following 6.2u5.
> >>>> >>
> >>>> >> Does anyone know if this is true? Has anyone worked with this recently?
> >>>> >> I do see a hadoop.tar.gz at the SoGE site
> >>>> >> http://arc.liv.ac.uk/downloads/SGE/releases/8.0.0d/ but it looks to me
> >>>> >> like it is probably the 2-3 year old code from Sun (with no
> >>>> >> documentation since it was a value-added feature for Sun).
> >>>> >>
> >>>> >> Thanks,
> >>>> >>
> >>>> >> Todd Heywood
> >>>> >>
> >>>> >>
> >>>> >> _______________________________________________
> >>>> >> users mailing list
> >>>> >> [email protected]
> >>>> >> https://gridengine.org/mailman/listinfo/users
> >
>

