Sent from the wrong account, so it bounced. Sorry for the resend.
On Jun 13, 2012, at 3:54 PM, Ralph Castain wrote:
> Here is my initial cut at the API.
>
> typedef enum {
>     GRIDENGINE_RESOURCE_FILE,
>     GRIDENGINE_RESOURCE_PPN,
>     GRIDENGINE_RESOURCE_NUM_NODES
> } gridengine_resource_type_t;
>
> typedef struct {
>     gridengine_resource_type_t type;
>     char *constraint;
> } gridengine_resources_t;
>
> typedef void (*gridengine_allocation_cbfunc_t)(char *nodes);
>
> int gridengine_allocate(gridengine_resources_t *array, int num_elements,
>                         gridengine_allocation_cbfunc_t cbfunc);
>
> The function call is non-blocking, returning 0 if the request was
> successfully registered and < 0 if there was an error. The input params
> include an array of structs that contain the specification, an integer
> declaring how many elements are in the array, and the callback function to be
> called once the allocation is available.
>
> The callback function delivers a comma-delimited string of node names (the
> receiver is responsible for releasing the string). The string is NULL if the
> allocation could not be done - e.g., if one of the files cannot be located,
> or we lack permission to access it.
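>
> As a concrete sketch of the intended usage - the callback name, constraint
> values, and file path below are made up, and error handling is minimal:
>
> #include <stdio.h>
> #include <stdlib.h>
>
> /* Invoked once the allocation is available; the receiver owns the
>  * comma-delimited node string and must free it. */
> static void my_allocation_cb(char *nodes)
> {
>     if (NULL == nodes) {
>         fprintf(stderr, "allocation could not be done\n");
>         return;
>     }
>     printf("allocated nodes: %s\n", nodes);
>     free(nodes);
> }
>
> void request_allocation(void)
> {
>     gridengine_resources_t resources[2];
>
>     resources[0].type = GRIDENGINE_RESOURCE_PPN;
>     resources[0].constraint = "2";                     /* 2 slots per node */
>     resources[1].type = GRIDENGINE_RESOURCE_FILE;
>     resources[1].constraint = "hdfs://data/input.txt"; /* hypothetical URI */
>
>     /* Non-blocking: 0 = request registered, < 0 = error. */
>     if (gridengine_allocate(resources, 2, my_allocation_cb) < 0) {
>         fprintf(stderr, "failed to register allocation request\n");
>     }
> }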
>
> We can extend supported constraints over time, but I could see three for now:
>
> 1. ppn, indicating how many "slots" are required on each node. Default is 1.
>
> 2. file, specifying a filename to be accessible by the application. The
> filename will be given in URI form - I can see a few variants we need to
> support for now:
>
> (a) "hdfs://path-to-file" - need to get file shard locations from hdfs. There
> are two ways to do this today - I've given Rayson copies of both methods. One
> uses Java and works with all branches of HDFS. The other uses C and only
> works with Hadoop 2.0 (branch-2 of the Apache Hadoop code base).
>
> (b) "nfs://path" - network file system, should be available anywhere.
>
> (c) "file://path" - a file on the local file system, ORTE will take
> responsibility for copying it to the remote nodes.
>
> I'm sure we'll think of others - note that (b) and (c) are equivalent to "no
> constraint" on the allocator in terms of files. In such cases, you can
> essentially ignore that constraint.
>
> 3. num_nodes - specifying the total number of nodes to be allocated. If ppn
> is given but num_nodes is not, then we want slots allocated on every node
> that meets the rest of the constraints.
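>
> And for (2a), a rough sketch of the C route - this uses the libhdfs
> interface that ships with Hadoop 2.0, which may or may not be the exact
> method I sent Rayson; the connection parameters and path are placeholders:
>
> #include <stdio.h>
> #include "hdfs.h"   /* libhdfs, from Hadoop 2.0 (branch-2) */
>
> /* Print the hosts holding each block of an HDFS file. */
> void print_shard_locations(const char *path)
> {
>     hdfsFS fs = hdfsConnect("default", 0);   /* fs.defaultFS from config */
>     if (NULL == fs) {
>         fprintf(stderr, "cannot connect to HDFS\n");
>         return;
>     }
>     hdfsFileInfo *info = hdfsGetPathInfo(fs, path);
>     if (NULL != info) {
>         /* One entry per block; each entry is a NULL-terminated host list. */
>         char ***hosts = hdfsGetHosts(fs, path, 0, info->mSize);
>         if (NULL != hosts) {
>             for (int b = 0; NULL != hosts[b]; b++) {
>                 for (int h = 0; NULL != hosts[b][h]; h++) {
>                     printf("block %d: %s\n", b, hosts[b][h]);
>                 }
>             }
>             hdfsFreeHosts(hosts);
>         }
>         hdfsFreeFileInfo(info, 1);
>     }
>     hdfsDisconnect(fs);
> }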
>
> Comments welcome - this is just a first cut!
> Ralph
>
> On Jun 12, 2012, at 12:29 PM, Ralph Castain wrote:
>
>> I'll be happy to do so - I'm running behind as I've been hit with a high
>> priority issue. I hope to get back to this in the not-too-distant future
>> (probably August).
>>
>> Version 2.0 is in alpha now - no announced schedule for release, but I would
>> guess the August time frame again.
>>
>> MR+ doesn't replace HDFS, but replaces the "Job" class underneath Hadoop MR
>> so that it uses the OMPI infrastructure instead of YARN - this allows it to
>> run on any HPC system, and to use MPI. So people can write their mappers and
>> reducers with MPI calls in them (remember, OMPI now includes Java bindings!).
>>
>> The current version of MR+ (in the OMPI trunk) lets you write and execute
>> your own mappers and reducers, but does not support the Hadoop MR classes.
>> So if you want to write something that uses your own algorithms, you can do
>> so. There is a "word-count" example in the "examples" directory of the OMPI
>> code base.
>>
>> Ralph
>>
>>
>> On Jun 5, 2012, at 12:10 AM, Ron Chen wrote:
>>
>>> Ralph, when you have your C HDFS API ready, can you please let us know?
>>> And do you know when Apache Hadoop is going to release version 2.0?
>>>
>>>
>>> -Ron
>>>
>>> ----- Original Message -----
>>> From: Rayson Ho <[email protected]>
>>> To: Prakashan Korambath <[email protected]>
>>> Cc: Ralph Castain <[email protected]>; Ron Chen <[email protected]>;
>>> "[email protected]" <[email protected]>
>>> Sent: Monday, June 4, 2012 1:53 PM
>>> Subject: Re: [gridengine users] Hadoop Integration HOWTO (was: Hadoop
>>> Integration - how's it going)
>>>
>>> Prakashan,
>>>
>>> Ralph mentioned to me before that the C API bindings will be available
>>> in Apache Hadoop 2.0, which adds Google protocol buffers as one of the
>>> new features and thus supports non-Java HDFS bindings.
>>>
>>> AFAIK, EMC MapR replaces HDFS with something that has more HA features
>>> & performance. I don't know all the specific details but I do believe
>>> that most of the API interfaces are going to be the same as or very
>>> similar to the existing HDFS APIs.
>>>
>>> Rayson
>>>
>>> On Mon, Jun 4, 2012 at 1:24 PM, Prakashan Korambath <[email protected]>
>>> wrote:
>>>> Hi Rayson,
>>>>
>>>> Let me know when you have the C API bindings from Ralph ready. I can
>>>> help you guys with testing them out.
>>>>
>>>> Prakashan
>>>>
>>>>
>>>> On 06/04/2012 10:17 AM, Rayson Ho wrote:
>>>>>
>>>>> Hi Prakashan & Ron,
>>>>>
>>>>> I thought about this issue while I was writing & testing the HOWTO...
>>>>>
>>>>> but I didn't spend much more time on it as I needed to work on
>>>>> something else, and it requires an upcoming C API binding for HDFS
>>>>> from Ralph. Plus... I didn't want to pre-announce too many upcoming
>>>>> new features. :-)
>>>>>
>>>>> With the architecture of Prakashan's On-demand Hadoop Cluster, we can
>>>>> take advantage of Ralph's C HDFS API, and we can then easily write a
>>>>> scheduler plugin that queries HDFS block information. This scheduler
>>>>> plugin then affects scheduling decisions so that Open Grid
>>>>> Scheduler/Grid Engine can send jobs to the data, which IMO is the core
>>>>> idea behind Hadoop - scheduling jobs & tasks to the data.
>>>>>
>>>>>
>>>>> Note that we will also need to productionize the "Parallel Environment
>>>>> Queue Sort (PQS) Scheduler API", which was under technology preview in
>>>>> GE 2011.11:
>>>>>
>>>>> http://gridscheduler.sourceforge.net/Releases/ReleaseNotesGE2011.11.pdf
>>>>>
>>>>> Rayson
>>>>>
>>>>> On Mon, Jun 4, 2012 at 12:55 PM, Prakashan Korambath <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> Hi Ron,
>>>>>>
>>>>>> I don't have anything planned beyond what I released right now. The
>>>>>> idea is to leave what Hadoop does best to Hadoop, and what SGE or any
>>>>>> other scheduler does best to the scheduler. I believe somebody from
>>>>>> SDSC also released a similar strategy for PBS/Torque. I worked only on
>>>>>> the SGE integration because I mostly use SGE.
>>>>>>
>>>>>> Prakashan
>>>>>>
>>>>>> On 06/04/2012 09:45 AM, Ron Chen wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Prakashan,
>>>>>>>
>>>>>>>
>>>>>>> I am trying to understand your integration, and it looks like Ravi
>>>>>>> Chandra Nallan's Hadoop Integration.
>>>>>>>
>>>>>>> One of the improvements in Daniel Templeton's Hadoop Integration is
>>>>>>> that he models HDFS data as resources, and thus can schedule jobs to
>>>>>>> data. Is scheduling jobs to data a planned feature of your "On-Demand
>>>>>>> Hadoop Cluster" integration?
>>>>>>>
>>>>>>> For those who don't know Ravi Chandra Nallan: he was with Sun
>>>>>>> Microsystems when he developed the integration. Last I checked, he
>>>>>>> was with Oracle.
>>>>>>>
>>>>>>> -Ron
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>> From: Rayson Ho <[email protected]>
>>>>>>> To: Prakashan Korambath <[email protected]>
>>>>>>> Cc: "[email protected]" <[email protected]>
>>>>>>> Sent: Friday, June 1, 2012 3:04 PM
>>>>>>> Subject: Re: [gridengine users] Hadoop Integration HOWTO (was: Hadoop
>>>>>>> Integration - how's it going)
>>>>>>>
>>>>>>> Thanks again Prakashan for the contribution!
>>>>>>>
>>>>>>> Rayson
>>>>>>>
>>>>>>> On Fri, Jun 1, 2012 at 1:25 PM, Prakashan Korambath <[email protected]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Thank you, Rayson! I appreciate you taking the time to upload the
>>>>>>>> tar files and write the HOWTO.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Prakashan
>>>>>>>>
>>>>>>>> On 06/01/2012 10:19 AM, Rayson Ho wrote:
>>>>>>>>>
>>>>>>>>> I've reviewed the integration, and wrote a short Grid Engine Hadoop
>>>>>>>>> HOWTO:
>>>>>>>>>
>>>>>>>>> http://gridscheduler.sourceforge.net/howto/GridEngineHadoop.html
>>>>>>>>>
>>>>>>>>> The difference between the 2 methods (original SGE 6.2u5 vs
>>>>>>>>> Prakashan's) is that with Prakashan's approach, Grid Engine is used
>>>>>>>>> for resource allocation, and the Hadoop job scheduler/Job Tracker is
>>>>>>>>> used to handle all the MapReduce operations. A Hadoop cluster is
>>>>>>>>> created on demand with Prakashan's approach, but in the original SGE
>>>>>>>>> 6.2u5 method Grid Engine replaces the Hadoop job scheduler.
>>>>>>>>>
>>>>>>>>> As standard Grid Engine PEs are used in this new approach, one can
>>>>>>>>> call "qrsh -inherit" and use Grid Engine's method to start Hadoop
>>>>>>>>> services on remote nodes, and thus get full job control, job
>>>>>>>>> accounting, and cleanup-at-termination benefits like any other
>>>>>>>>> tight PE job!
>>>>>>>>>
>>>>>>>>> Rayson
>>>>>>>>>
>>>>>>>>> On Tue, May 29, 2012 at 10:36 AM, Prakashan
>>>>>>>>> Korambath <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> I put my scripts in a tar file and sent it to Rayson yesterday so
>>>>>>>>>> that he can put it in a common place for download.
>>>>>>>>>>
>>>>>>>>>> Prakashan
>>>>>>>>>>
>>>>>>>>>> On 05/29/2012 07:18 AM, Jesse Becker wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Mon, May 28, 2012 at 12:00:24PM -0400, Prakashan
>>>>>>>>>>> Korambath wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> This is how we run Hadoop using Grid Engine (or, for that
>>>>>>>>>>>> matter, any scheduler with appropriate alterations):
>>>>>>>>>>>>
>>>>>>>>>>>> http://www.ats.ucla.edu/clusters/hoffman2/hadoop/default.htm
>>>>>>>>>>>>
>>>>>>>>>>>> Basically, run either a prolog or a script called from inside
>>>>>>>>>>>> the submission command file itself to parse the contents of
>>>>>>>>>>>> PE_HOSTFILE and create the Hadoop *.site.xml, masters, and
>>>>>>>>>>>> slaves files at run time. This methodology is suitable for any
>>>>>>>>>>>> scheduler, as it is not dependent on any particular one. If
>>>>>>>>>>>> there is interest, I can post the prologue script. Thanks.
>>>>>>>>>>>
>>>>>>>>>>> Please do.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users