Re: Giraph-13: Porting Giraph to YARN
Hi Vinod,

Thank you for your thoughts. It would be great if your comments were posted on GIRAPH-13 so they aren't lost. You and Jakob should sync up to see how to proceed on this.

Avery

On 9/18/11 7:37 AM, Vinod Kumar Vavilapalli wrote:
> Hi all,
>
> I finished an excursion into Giraph's code, and now I kinda know what it takes to port Giraph to run on top of YARN.
>
> When the base Hadoop clusters are replaced by YARN clusters, Giraph will have two options:
>
> - *Giraph still works over the MapReduce APIs*: Even after the move to YARN clusters, Giraph can still run over MapReduceV2+YARN, without any code changes at all.
> - *Giraph works natively on YARN*: This can be done in such a way that, in the medium term, Giraph continues to work on both a Hadoop MapReduce cluster and a YARN cluster.
>
> Two visible effects I can think of once this effort gets underway:
>
> -- There will be some moving around of classes/interfaces to separate APIs from implementation details, and a bit of reorganisation of code to help support both GiraphV1 and GiraphV2.
> -- The other thing the port will probably cause is a fork in the community's attention (depending on how much of the community's eyeballs the new world grabs, as opposed to the stabilization/feature work on GiraphV1).
>
> Now here's the thing. Avery indicated on the other thread (about Giraph over HAMA) that most Giraph users will need to run on top of a Hadoop MapReduce cluster for quite some time, which I completely agree with, having long been a maintainer/supporting dev of Hadoop clusters myself.
>
> Given that concern, before embarking on the port, I thought I'd get opinions from the community.
>
> Thanks,
> +Vinod
Re: Giraph-13: Porting Giraph to YARN
> - *Giraph still works over the MapReduce APIs*: Even after the move to YARN clusters, Giraph can still run over MapReduceV2+YARN, without any code changes at all.

Giraph will continue to work with the MR1 APIs.

> - *Giraph works natively on YARN*: This can be done in such a way that, in the medium term, Giraph continues to work on both a Hadoop MapReduce cluster and a YARN cluster. Two visible effects I can think of once this effort gets underway:

As described in the JIRA, this is the approach I am taking as I do the work now.

> -- There will be some moving around of classes/interfaces to separate APIs from implementation details, and a bit of reorganisation of code to help support both GiraphV1 and GiraphV2.

Yes.

> -- The other thing the port will probably cause is a fork in the community's attention (depending on how much of the community's eyeballs the new world grabs, as opposed to the stabilization/feature work on GiraphV1).

Not really. Assuming the refactoring is done cleanly, it will be relatively painless to support both.

> Now here's the thing. Avery indicated on the other thread (about Giraph over HAMA) that most Giraph users will need to run on top of a Hadoop MapReduce cluster for quite some time, which I completely agree with, having long been a maintainer/supporting dev of Hadoop clusters myself.
>
> Given that concern, before embarking on the port, I thought I'd get opinions from the community.

I am also a Hadoop committer/dev, and rest assured, Vinod, we'll ensure that Giraph plays nice with MR1 for the foreseeable future. The issue is assigned to me, and I'll be working on it over the next few weeks.
Giraph-13: Porting Giraph to YARN
Hi all,

I finished an excursion into Giraph's code, and now I kinda know what it takes to port Giraph to run on top of YARN.

When the base Hadoop clusters are replaced by YARN clusters, Giraph will have two options:

- *Giraph still works over the MapReduce APIs*: Even after the move to YARN clusters, Giraph can still run over MapReduceV2+YARN, without any code changes at all.
- *Giraph works natively on YARN*: This can be done in such a way that, in the medium term, Giraph continues to work on both a Hadoop MapReduce cluster and a YARN cluster.

Two visible effects I can think of once this effort gets underway:

-- There will be some moving around of classes/interfaces to separate APIs from implementation details, and a bit of reorganisation of code to help support both GiraphV1 and GiraphV2.
-- The other thing the port will probably cause is a fork in the community's attention (depending on how much of the community's eyeballs the new world grabs, as opposed to the stabilization/feature work on GiraphV1).

Now here's the thing. Avery indicated on the other thread (about Giraph over HAMA) that most Giraph users will need to run on top of a Hadoop MapReduce cluster for quite some time, which I completely agree with, having long been a maintainer/supporting dev of Hadoop clusters myself.

Given that concern, before embarking on the port, I thought I'd get opinions from the community.

Thanks,
+Vinod