On May 24, 2012, at 1:01 PM, Rayson Ho wrote:

> On Thu, May 24, 2012 at 1:58 PM, Ralph Castain <[email protected]> wrote:
>> In C - nobody wants to use Java on their clusters as many don't have it
>> (as you note below) for both security and memory footprint.
>>
>> I'm hoping to get everyone to the same API, but can work with it either way.
>
> It's really good that it is now in C - but wasn't the C API not fully
> working when you emailed me last time? :-P
Yes, it has been unstable. However, it is being stabilized as Hadoop
readies the 2.0 release, so we should be okay supporting that release,
which is expected sometime this summer.

> And I did not expect this dynamic allocation feature getting so big -
> last time I looked at your code, it was similar in size (and
> functionality!) to the HDFS locality load sensor written by DanT (when
> he was at Sun).

I'm not sure it is particularly large, but it does replace the HDFS
locality load sensor (as well as the recent FileFinder code I sent you).
The problem with running the allocator tools separately is that it
introduces a potential time gap between the allocation request and the
execution of the application. When you are dealing with file systems
that (a) have sparse locality, and (b) can move files around to balance
loads and other factors, you really want the allocation to occur as
close as possible to execution time.

In addition, if you are going to support the Hadoop MR APIs, then you
have to provide dynamic allocation, because the files to be processed
are defined *only* in the application itself. In other words, it is the
application that identifies the files of interest, and so it is the
application that needs to do the locality-aware allocation.

> So if it is just the HDFSFileFinder.java & hdfsalloc.pl, then I
> believe we don't need to change DanT's code. But if we are planning
> for the dynamic allocation integration, then it really would help if
> we can look at the APIs used by others.

As I said, these APIs are still being defined at this time. I'll send
some info on the current thinking, but I would welcome interaction, as
your thoughts and needs are also important in arriving at a conclusion.

Ralph

> Rayson
>
>>> If the API binding is written in Java, I am interested to know if
>>> SLURM and Moab are calling the APIs in the job scheduler.
>>> While Grid Engine already has an embedded JVM for JMX, many sites
>>> don't enable the JVM, to save a bit of memory footprint.
>>>
>>> Rayson
>>>
>>>>> I am wondering how the Hadoop job scheduler handles dynamic
>>>>> allocation - i.e., the file request done inside the mapreduce job
>>>>> with async. callback.
>>>>
>>>> Basically, you submit a request for a number of slots and then wait
>>>> for the response in a blocking "poll". The response generally
>>>> provides a partial allocation - i.e., you don't get everything you
>>>> asked for, as the Hadoop RM doles slots out as each node contacts
>>>> it to indicate its availability (which is why the launch takes so
>>>> long). You then keep looping over the requests, updating the
>>>> requested number of slots to reflect what you have already been
>>>> given.
>>>>
>>>> For MR, they launch your mapper against each slot as it is
>>>> allocated, so you get a "rolling start". For MPI, we can't do that,
>>>> so we have to wait until all resources have been allocated before
>>>> we launch.
>>>>
>>>>> Rayson
>>>>
>>>>>> Ralph
>>>>>>
>>>>>> On May 24, 2012, at 8:34 AM, Rayson Ho wrote:
>>>>>>
>>>>>>> Just want to update everyone - I followed up with Ralph @ EMC,
>>>>>>> and I looked at his code, which is very similar to DanT's code
>>>>>>> in SGE 6.2u5 - i.e., they both pull information from HDFS and
>>>>>>> use the locality info to affect scheduling.
>>>>>>>
>>>>>>> However, the APIs used are different, and we will pay attention
>>>>>>> to the Hadoop 2.x API changes and test DanT's integration again
>>>>>>> when 2.x comes out.
>>>>>>>
>>>>>>> CB, can you let me know about the multi-user issue? As mentioned
>>>>>>> before, we have HBase, Pig, Hive, etc. tested with our Hadoop
>>>>>>> setup, but we don't have real users on it, so it really would
>>>>>>> help if you can let us know the issues you've encountered.
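The request/poll/update allocation loop described in the quoted exchange above can be sketched in miniature. This is a toy simulation, not the actual YARN API; the names here (ToyRM, poll, allocate) are invented for illustration only:

```python
# Toy sketch of the slot-allocation protocol described above: submit a
# request, block in a poll, receive partial grants as nodes report in,
# and re-submit the outstanding remainder until fully allocated.
import random

class ToyRM:
    """Stand-in resource manager that grants slots a few at a time."""
    def __init__(self, cluster_slots):
        self.free = cluster_slots

    def poll(self, requested):
        # The real protocol blocks here; we just hand back a partial
        # grant of somewhere between 0 and `requested` slots.
        grant = min(self.free, random.randint(0, requested))
        self.free -= grant
        return grant

def allocate(rm, needed, rolling_launch=None):
    """Loop until fully allocated, shrinking the outstanding request.

    For MapReduce, rolling_launch fires per grant (the "rolling start");
    for MPI we must wait until everything is allocated before launching.
    """
    outstanding = needed
    granted = 0
    while outstanding > 0:
        got = rm.poll(outstanding)
        granted += got
        outstanding -= got  # update request to reflect prior grants
        if rolling_launch and got:
            rolling_launch(got)
    return granted

rm = ToyRM(cluster_slots=64)
print(allocate(rm, 16))  # MPI-style: returns only once all 16 are granted
```

The loop also shows why MPI jobs pay the full launch latency: nothing starts until the last grant arrives, whereas the rolling_launch callback lets mappers start per grant.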
>>>>>>> Rayson
>>>>>>>
>>>>>>> On Fri, Mar 30, 2012 at 3:18 PM, CB <[email protected]> wrote:
>>>>>>>> I'm very much interested in the SGE + Hadoop enhancement.
>>>>>>>>
>>>>>>>> I'm currently testing Dan T's Hadoop + SGE integration for a
>>>>>>>> multi-user environment on an internal dev cluster, and it's
>>>>>>>> working nicely. But it is not easy to set up: it requires
>>>>>>>> changing file permissions in various places to make it work in
>>>>>>>> a multi-user environment.
>>>>>>>>
>>>>>>>> - Chansup
>>>>>>>>
>>>>>>>> On Fri, Mar 30, 2012 at 1:42 PM, Chris Dagdigian <[email protected]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> I'm registering my interest here.
>>>>>>>>>
>>>>>>>>> Reuti -- if you could pass my email along to Ralph, I'd
>>>>>>>>> appreciate it.
>>>>>>>>>
>>>>>>>>> I have several consulting customers using EMC Isilon storage
>>>>>>>>> on Grid Engine HPC clusters, and we've been getting pinged by
>>>>>>>>> EMC/Greenplum sales reps pushing to show off the combination
>>>>>>>>> of native HDFS support in Isilon + the Greenplum Hadoop
>>>>>>>>> appliance integration.
>>>>>>>>>
>>>>>>>>> Basically, I have a few largish sites that could test &
>>>>>>>>> provide feedback if things work out. Some are commercial, some
>>>>>>>>> are .gov, & all are interested in SGE + Hadoop enhancements.
>>>>>>>>>
>>>>>>>>> -dag
>>>>>>>>>
>>>>>>>>> Reuti wrote:
>>>>>>>>>>
>>>>>>>>>> On behalf of Ralph Castain, who you may know from the Open
>>>>>>>>>> MPI mailing list, I want to forward this eMail to your
>>>>>>>>>> attention.
>>>>>>>>>>
>>>>>>>>>> -- Reuti
>>>>>>>>>>
>>>>>>>>>>>> I have a question for the Gridengine community, but thought
>>>>>>>>>>>> I'd run it through you, as I believe you work in that area?
>>>>>>>>>>>>
>>>>>>>>>>>> As you may know, I am now employed by Greenplum/EMC to work
>>>>>>>>>>>> on resource management for Hadoop as well as MPI.
>>>>>>>>>>>> The main concern, frankly, is that the current Hadoop RM
>>>>>>>>>>>> (YARN) scales poorly in terms of launch and provides no
>>>>>>>>>>>> support for MPI wireup, thus causing MPI jobs to exhibit
>>>>>>>>>>>> quadratic scaling of startup times.
>>>>>>>>>>>>
>>>>>>>>>>>> The only reason for using YARN is that it has the HDFS
>>>>>>>>>>>> interface required to determine file locality, thus
>>>>>>>>>>>> allowing users to place processes network-near to the files
>>>>>>>>>>>> they will use. I have initiated an effort here at GP to
>>>>>>>>>>>> create a C library for accessing HDFS to obtain that
>>>>>>>>>>>> locality info, and expect to have it completed in the next
>>>>>>>>>>>> few weeks.
>>>>>>>>>>>>
>>>>>>>>>>>> Armed with that capability, it would be possible to extend
>>>>>>>>>>>> more capable RMs such as Gridengine so that users could
>>>>>>>>>>>> obtain HDFS-based allocations for their MapReduce
>>>>>>>>>>>> applications. This would allow Gridengine to support Hadoop
>>>>>>>>>>>> operations, and make Hadoop clusters that used Gridengine
>>>>>>>>>>>> as their RM "multi-use".
>>>>>>>>>>>>
>>>>>>>>>>>> Would this be of interest to the community? I can
>>>>>>>>>>>> contribute the C-lib code for their use under a BSD-like
>>>>>>>>>>>> license, if that would help.
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Ralph

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
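The locality-aware placement that Ralph's proposed C library would enable can be sketched as follows. The block-to-host map and the ranking function are invented for illustration; a real implementation would query HDFS for each input file's block locations (e.g., via something like libhdfs's hdfsGetHosts()) and feed the ranked hosts into the RM's allocation request:

```python
# Illustrative sketch only: rank candidate hosts for an allocation by
# how many blocks of the job's input files they hold locally, so the
# scheduler can place processes network-near to their data.
from collections import Counter

def rank_hosts_by_locality(block_hosts):
    """block_hosts: one host-list per block of the job's input files
    (as HDFS would report them). Returns hosts ordered by the number
    of locally held blocks, descending."""
    tally = Counter(h for hosts in block_hosts for h in hosts)
    return [host for host, _ in tally.most_common()]

# Three blocks, each replicated on three nodes (typical HDFS replication).
blocks = [
    ["node1", "node2", "node3"],
    ["node1", "node4", "node5"],
    ["node1", "node2", "node6"],
]
print(rank_hosts_by_locality(blocks)[0])  # node1 holds all three blocks
```

Because HDFS can rebalance blocks over time, this ranking is only trustworthy if computed near execution time, which is the argument Ralph makes above for dynamic (rather than up-front) allocation.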
