Prakashan, Ralph mentioned to me before that the C API bindings will be available in Apache Hadoop 2.0, which adds Google Protocol Buffers as one of its new features and thus supports non-Java HDFS bindings.
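For a taste of what a C binding to HDFS looks like, here is a minimal sketch against the stock libhdfs API that ships with Apache Hadoop (Ralph's binding isn't public yet, so libhdfs is just a stand-in; the namenode host, port, and file path below are placeholders):

/*
 * Minimal sketch: reading a file from HDFS in C.  Uses the stock libhdfs
 * API shipped with Apache Hadoop (a stand-in for Ralph's binding); the
 * namenode host/port and the file path are placeholders.  Build roughly as:
 *   gcc read_hdfs.c -I$HADOOP_HOME/include -L$HADOOP_HOME/lib -lhdfs
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include "hdfs.h"

int main(void)
{
    /* Connect to the namenode (placeholder host and port). */
    hdfsFS fs = hdfsConnect("namenode.example.com", 8020);
    if (!fs) {
        fprintf(stderr, "hdfsConnect failed\n");
        return EXIT_FAILURE;
    }

    /* Open with default buffer size, replication, and block size (the 0s). */
    hdfsFile in = hdfsOpenFile(fs, "/user/demo/input.txt", O_RDONLY, 0, 0, 0);
    if (!in) {
        fprintf(stderr, "hdfsOpenFile failed\n");
        hdfsDisconnect(fs);
        return EXIT_FAILURE;
    }

    /* Stream the file to stdout. */
    char buf[4096];
    tSize n;
    while ((n = hdfsRead(fs, in, buf, sizeof(buf))) > 0)
        fwrite(buf, 1, (size_t)n, stdout);

    hdfsCloseFile(fs, in);
    hdfsDisconnect(fs);
    return EXIT_SUCCESS;
}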
AFAIK, EMC MapR replaces HDFS with a storage layer that offers more HA features and better performance. I don't know all the specific details, but I believe most of the API interfaces are the same as, or very similar to, the existing HDFS APIs.

Rayson

On Mon, Jun 4, 2012 at 1:24 PM, Prakashan Korambath <[email protected]> wrote:
> Hi Rayson,
>
> Let me know when you have the C API bindings from Ralph ready. I can
> help you guys with testing it out.
>
> Prakashan
>
>
> On 06/04/2012 10:17 AM, Rayson Ho wrote:
>>
>> Hi Prakashan & Ron,
>>
>> I thought about this issue while I was writing & testing the HOWTO,
>> but I didn't spend much more time on it as I needed to work on
>> something else, and it requires an upcoming C API binding for HDFS
>> from Ralph. Plus... I didn't want to pre-announce too many upcoming
>> new features. :-)
>>
>> With the architecture of Prakashan's On-demand Hadoop Cluster, we can
>> take advantage of Ralph's C HDFS API, and we can then easily write a
>> scheduler plugin that queries HDFS block information. This scheduler
>> plugin then affects scheduling decisions such that Open Grid
>> Scheduler/Grid Engine can send jobs to the data, which IMO is the
>> core idea behind Hadoop - scheduling jobs & tasks to the data.
>>
>> Note that we will also need to productionize the "Parallel Environment
>> Queue Sort (PQS) Scheduler API", which was under technology preview in
>> GE 2011.11:
>>
>> http://gridscheduler.sourceforge.net/Releases/ReleaseNotesGE2011.11.pdf
>>
>> Rayson
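To make the block-locality idea above concrete: the scheduler plugin would ask the namenode which datanodes hold each block of a file, then prefer those hosts when placing jobs. A minimal sketch of that query using the stock libhdfs hdfsGetHosts() call (again a stand-in for Ralph's binding; the namenode and path are placeholders, and this is not actual Open Grid Scheduler plugin code):

/*
 * Sketch of the block-locality query behind the scheduler-plugin idea:
 * ask HDFS which datanodes hold each block of a file, so a scheduler
 * could prefer those hosts.  Stock libhdfs; the namenode and path are
 * placeholders, and this is not actual Grid Engine plugin code.
 */
#include <stdio.h>
#include "hdfs.h"

static void print_block_hosts(hdfsFS fs, const char *path)
{
    /* The file length is needed so the query covers every block. */
    hdfsFileInfo *info = hdfsGetPathInfo(fs, path);
    if (!info) {
        fprintf(stderr, "hdfsGetPathInfo(%s) failed\n", path);
        return;
    }

    /* hosts[b][r] is the r-th replica host of block b; both levels of
     * the array are NULL-terminated. */
    char ***hosts = hdfsGetHosts(fs, path, 0, info->mSize);
    if (hosts) {
        for (int b = 0; hosts[b]; b++)
            for (int r = 0; hosts[b][r]; r++)
                printf("block %d, replica %d: %s\n", b, r, hosts[b][r]);
        hdfsFreeHosts(hosts);
    }
    hdfsFreeFileInfo(info, 1);
}

int main(void)
{
    hdfsFS fs = hdfsConnect("namenode.example.com", 8020);
    if (!fs)
        return 1;
    print_block_hosts(fs, "/user/demo/input.txt");
    hdfsDisconnect(fs);
    return 0;
}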
>> On Mon, Jun 4, 2012 at 12:55 PM, Prakashan Korambath <[email protected]> wrote:
>>>
>>> Hi Ron,
>>>
>>> I don't have anything planned beyond what I have released right now.
>>> The idea is to leave what Hadoop does best to Hadoop, and what SGE
>>> (or any scheduler) does best to the scheduler. I believe somebody
>>> from SDSC also released a similar strategy for PBS/Torque. I worked
>>> only on SGE because I mostly use SGE.
>>>
>>> Prakashan
>>>
>>>
>>> On 06/04/2012 09:45 AM, Ron Chen wrote:
>>>>
>>>> Hi Prakashan,
>>>>
>>>> I am trying to understand your integration, and it looks like Ravi
>>>> Chandra Nallan's Hadoop integration.
>>>>
>>>> One of the improvements in Daniel Templeton's Hadoop integration is
>>>> that he models HDFS data as resources, and thus can schedule jobs
>>>> to the data. Is scheduling jobs to data a planned feature of your
>>>> "On-Demand Hadoop Cluster" integration?
>>>>
>>>> For those who don't know Ravi Chandra Nallan: he was with Sun
>>>> Microsystems when he developed the integration. Last I checked, he
>>>> was with Oracle.
>>>>
>>>> -Ron
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: Rayson Ho <[email protected]>
>>>> To: Prakashan Korambath <[email protected]>
>>>> Cc: "[email protected]" <[email protected]>
>>>> Sent: Friday, June 1, 2012 3:04 PM
>>>> Subject: Re: [gridengine users] Hadoop Integration HOWTO (was: Hadoop
>>>> Integration - how's it going)
>>>>
>>>> Thanks again Prakashan for the contribution!
>>>>
>>>> Rayson
>>>>
>>>>
>>>> On Fri, Jun 1, 2012 at 1:25 PM, Prakashan Korambath <[email protected]> wrote:
>>>>>
>>>>> Thank you Rayson! Appreciate you taking the time to upload the tar
>>>>> files and write the HOWTO.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Prakashan
>>>>>
>>>>>
>>>>> On 06/01/2012 10:19 AM, Rayson Ho wrote:
>>>>>>
>>>>>> I've reviewed the integration, and wrote a short Grid Engine Hadoop
>>>>>> HOWTO:
>>>>>>
>>>>>> http://gridscheduler.sourceforge.net/howto/GridEngineHadoop.html
>>>>>>
>>>>>> The difference between the two methods (the original SGE 6.2u5 one
>>>>>> vs. Prakashan's) is that with Prakashan's approach, Grid Engine is
>>>>>> used for resource allocation, and the Hadoop job scheduler/JobTracker
>>>>>> handles all the MapReduce operations. A Hadoop cluster is created
>>>>>> on demand with Prakashan's approach, whereas in the original SGE
>>>>>> 6.2u5 method Grid Engine replaces the Hadoop job scheduler.
>>>>>>
>>>>>> As standard Grid Engine PEs are used in this new approach, one can
>>>>>> call "qrsh -inherit" and use Grid Engine's method to start Hadoop
>>>>>> services on remote nodes, and thus get the full job control, job
>>>>>> accounting, and cleanup-at-termination benefits, like any other
>>>>>> tightly integrated PE job!
>>>>>>
>>>>>> Rayson
>>>>>>
>>>>>>
>>>>>> On Tue, May 29, 2012 at 10:36 AM, Prakashan Korambath <[email protected]> wrote:
>>>>>>>
>>>>>>> I put my scripts in a tar file and sent it to Rayson yesterday so
>>>>>>> that he can put it in a common place to download.
>>>>>>>
>>>>>>> Prakashan
>>>>>>>
>>>>>>>
>>>>>>> On 05/29/2012 07:18 AM, Jesse Becker wrote:
>>>>>>>>
>>>>>>>> On Mon, May 28, 2012 at 12:00:24PM -0400, Prakashan Korambath wrote:
>>>>>>>>>
>>>>>>>>> This is how we run Hadoop using Grid Engine (or, for that matter,
>>>>>>>>> any scheduler with appropriate alterations):
>>>>>>>>>
>>>>>>>>> http://www.ats.ucla.edu/clusters/hoffman2/hadoop/default.htm
>>>>>>>>>
>>>>>>>>> Basically, run either a prolog or call a script inside the
>>>>>>>>> submission command file itself to parse the contents of
>>>>>>>>> PE_HOSTFILE and create the Hadoop *.site.xml, masters, and
>>>>>>>>> slaves files at run time. This methodology is suitable for any
>>>>>>>>> scheduler as it is not dependent on any of them. If there is
>>>>>>>>> interest I can post the prologue script. Thanks.
>>>>>>>>
>>>>>>>> Please do.
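Since there was interest in the prologue script: Prakashan's actual script isn't reposted here, but the core PE_HOSTFILE step looks roughly like the sketch below. It parses the Grid Engine host file, whose lines have the form "hostname slots queue processor-range", and writes Hadoop masters/slaves files; the output file names and the first-host-is-master convention are assumptions, and the real integration also generates the *.site.xml files:

/*
 * Sketch of the PE_HOSTFILE step described above (not Prakashan's actual
 * prologue script).  Each line of the Grid Engine $PE_HOSTFILE has the
 * form "hostname slots queue processor-range".  This writes Hadoop
 * "masters" and "slaves" files, taking the first host as the master --
 * an assumption; the real integration also generates the *.site.xml files.
 */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *pe_hostfile = getenv("PE_HOSTFILE");
    if (!pe_hostfile) {
        fprintf(stderr, "PE_HOSTFILE not set; run inside a GE parallel job\n");
        return EXIT_FAILURE;
    }

    FILE *in = fopen(pe_hostfile, "r");
    FILE *masters = fopen("masters", "w");
    FILE *slaves = fopen("slaves", "w");
    if (!in || !masters || !slaves) {
        perror("fopen");
        return EXIT_FAILURE;
    }

    char line[1024], host[256];
    int first = 1;
    while (fgets(line, sizeof(line), in)) {
        /* The first whitespace-separated token on each line is the hostname. */
        if (sscanf(line, "%255s", host) != 1)
            continue;
        if (first) {
            fprintf(masters, "%s\n", host); /* master runs the JobTracker/namenode */
            first = 0;
        }
        fprintf(slaves, "%s\n", host);      /* every host runs a TaskTracker/datanode */
    }

    fclose(in);
    fclose(masters);
    fclose(slaves);
    return EXIT_SUCCESS;
}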
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users