Hi Rayson

As I understand it, both of the referenced codes act as independent tools, i.e., you use them to get an allocation, which you then use to run your job.

One of the things we would also like is the ability for mr+ to request an allocation itself. In other words, a user would run "mr+ .... ./my_mapreduce", and mr+ would request the required allocation from SGE and then launch the job. This would require some kind of API that allowed us to specify the desired resources, or at least the files we are trying to access, so that SGE could then internally request the corresponding locations from HDFS or whatever file system was given. Is that doable? I can help define the API, and possibly help with code, if that is of use.

Ralph

On May 24, 2012, at 8:34 AM, Rayson Ho wrote:

> Just want to update everyone - I followed up with Ralph @ EMC, and I
> looked at his code, which is very similar to DanT's code in SGE 6.2u5
> - i.e. they both pull information from HDFS and use the locality info
> to affect scheduling.
>
> However, the APIs used are different, and we will pay attention to the
> Hadoop 2.x API changes and test DanT's integration again when 2.x
> comes out.
>
> CB, can you let me know about the multi-user issue? As mentioned
> before we have HBase, Pig, Hive, etc. tested with our Hadoop setup, but
> we don't have real users on it and thus it really would help if you
> can let us know the issues you've encountered.
>
> Rayson
>
>
>
> On Fri, Mar 30, 2012 at 3:18 PM, CB <[email protected]> wrote:
>> I'm very much interested in SGE + Hadoop enhancement.
>>
>> I'm currently testing Dan T's Hadoop + SGE integration for a multi-user
>> environment on an internal dev cluster and it's working nicely.
>> But it is not easy to set up. It requires changing file permissions in various
>> places to make it work in a multi-user environment.
>>
>> - Chansup
>>
>> On Fri, Mar 30, 2012 at 1:42 PM, Chris Dagdigian <[email protected]> wrote:
>>>
>>>
>>> I'm registering my interest here.
>>>
>>> Reuti -- if you could pass my email along to Ralph I'd appreciate it.
>>>
>>> I have several consulting customers using EMC Isilon storage on Grid
>>> Engine HPC clusters and we've been getting pinged by EMC/Greenplum sales
>>> reps pushing to show off the combination of native HDFS support in Isilon +
>>> the Greenplum Hadoop appliance integration.
>>>
>>> Basically I have a few largish sites that could test & provide feedback if
>>> things work out. Some are commercial, some are .gov & all are interested in
>>> SGE + Hadoop enhancements.
>>>
>>> -dag
>>>
>>>
>>> Reuti wrote:
>>>>
>>>> on behalf of Ralph Castain who you may know from the Open MPI mailing
>>>> list I want to forward this eMail to your attention.
>>>>
>>>> -- Reuti
>>>>
>>>>>> I have a question for the Gridengine community, but thought I'd run
>>>>>> it through you as I believe you work in that area?
>>>>>>
>>>>>> As you may know, I am now employed by Greenplum/EMC to work on
>>>>>> resource management for Hadoop as well as MPI. The main concern frankly
>>>>>> is that the current Hadoop RM (yarn) scales poorly in terms of launch and
>>>>>> provides no support for MPI wireup, thus causing MPI jobs to exhibit
>>>>>> quadratic scaling of startup times.
>>>>>>
>>>>>> The only reason for using yarn is that it has the HDFS interface
>>>>>> required to determine file locality, thus allowing users to place
>>>>>> processes network-near to the files they will use. I have initiated an
>>>>>> effort here at GP to create a C-library for accessing HDFS to obtain
>>>>>> that locality info, and expect to have it completed in the next few weeks.
>>>>>>
>>>>>> Armed with that capability, it would be possible to extend more
>>>>>> capable RMs such as Gridengine so that users could obtain HDFS-based
>>>>>> allocations for their MapReduce applications. This would allow
>>>>>> Gridengine to support Hadoop operations, and make Hadoop clusters that
>>>>>> used Gridengine as their RM be "multi-use".
>>>>>>
>>>>>> Would this be of interest to the community?
>>>>>> I can contribute the C-lib code for their use under a BSD-like
>>>>>> license structure, if that would help.
>>>>>>
>>>>>> Regards,
>>>>>> Ralph
>>>
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
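
Ralph's proposal at the top of the thread, where mr+ passes the files a job will read and the resource manager turns their HDFS block locations into an allocation, can be illustrated with a minimal sketch. It is written in Python purely for brevity and assumes a great deal: `AllocationRequest`, `rank_hosts`, and `choose_hosts` are hypothetical names, not part of mr+, SGE, or any HDFS client, and the `block_locations` map only stands in for whatever an HDFS locality query (such as the C library Ralph mentions) would actually return. Hosts are ranked by how many of the job's input blocks they hold a replica of, and the requested slots are filled greedily from the most data-local hosts before falling back to any other host with free slots.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field

# Hypothetical types and functions for illustration only; they are not
# part of mr+, SGE, or any HDFS client library.


@dataclass
class AllocationRequest:
    """What a launcher such as mr+ might hand to the resource manager."""
    slots: int                                            # processes to place
    input_files: list[str] = field(default_factory=list)  # files the job reads


def rank_hosts(request: AllocationRequest,
               block_locations: dict[str, list[list[str]]]) -> list[str]:
    """Order hosts by how many of the job's input blocks they hold locally.

    block_locations maps a file path to its blocks; each block is the list
    of hosts holding a replica (an assumed shape for an HDFS locality query).
    """
    counts = Counter()
    for path in request.input_files:
        for replicas in block_locations.get(path, []):
            counts.update(set(replicas))  # one count per host per block
    # Most local blocks first; ties broken by host name for determinism.
    return sorted(counts, key=lambda host: (-counts[host], host))


def choose_hosts(request: AllocationRequest,
                 block_locations: dict[str, list[list[str]]],
                 free_slots: dict[str, int]) -> dict[str, int]:
    """Greedily fill the requested slots, preferring data-local hosts."""
    ranked = rank_hosts(request, block_locations)
    # Data-local hosts first, then any remaining host with free slots.
    candidates = ranked + sorted(h for h in free_slots if h not in ranked)
    placement: dict[str, int] = {}
    remaining = request.slots
    for host in candidates:
        if remaining == 0:
            break
        take = min(free_slots.get(host, 0), remaining)
        if take > 0:
            placement[host] = take
            remaining -= take
    if remaining:
        raise RuntimeError(
            f"only {request.slots - remaining} of {request.slots} slots free")
    return placement
```

For example, if `n2` holds a replica of all three blocks of `/data/in.txt` and `n1`, `n3`, `n4` each hold one, a 5-slot request against free slots `{n1: 2, n2: 2, n3: 4, n5: 8}` places 2 slots on `n2`, 2 on `n1`, and 1 on `n3`. A real integration would presumably also weigh rack topology and queue policy, which this sketch ignores.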
