Absolutely, I'd agree on that. From what I can tell it's the best "Pregel"-style clone going; it's heading towards MRv2 and seems to have some decent momentum behind it.
On Thu, Oct 20, 2011 at 1:48 PM, Sebastian Schelter <[email protected]> wrote:
> On 20.10.2011 19:45, Ted Dunning wrote:
>> I think that giraph has a lot to offer here as well.
>
> +1 on that.
>
>> Sent from my iPhone
>>
>> On Oct 20, 2011, at 8:30, Josh Patterson <[email protected]> wrote:
>>
>>> I've run some tests with Spark in general; it's a pretty interesting setup.
>>>
>>> I think the most interesting aspect (relevant to what you are asking
>>> about) is that Matei already has Spark running on top of MRv2:
>>>
>>> https://github.com/mesos/spark-yarn
>>>
>>> (You don't have to run Mesos, but the YARN code needs to be able to see
>>> the jar in order to do its scheduling work.)
>>>
>>> I've been playing around with writing a genetic algorithm in
>>> Scala/Spark to run on MRv2, and in the process got introduced to the
>>> book "Parallel Iterative Algorithms: From Sequential to Grid Computing",
>>> which talks about strategies for parallelizing highly iterative
>>> algorithms and the inherent issues involved (sync/async iterations,
>>> sync/async communications, etc.). Since you can use Spark as a
>>> "BSP-style" framework (ignoring the RDDs if you like) and just shoot
>>> out slices of an array of items to be processed (relatively fast
>>> compared to MR), it has some interesting properties/tradeoffs to take a
>>> look at.
>>>
>>> Toward the end of my ATL HUG talk I mentioned the possibility of
>>> using MRv2 with other frameworks, like Spark, that are better suited
>>> to other kinds of algorithms (in this case, highly iterative ones):
>>>
>>> http://www.slideshare.net/jpatanooga/machine-learning-and-hadoop
>>>
>>> I think it would be interesting to have Mahout sitting on top of MRv2,
>>> like Ted is referring to, and then have each algorithm matched to a
>>> framework on YARN and a workflow that mixed and matched these
>>> combinations.
>>>
>>> Lots of possibilities here.
>>>
>>> JP
>>>
>>> On Wed, Oct 19, 2011 at 10:42 PM, Ted Dunning <[email protected]> wrote:
>>>> Spark is very cool but very incompatible with Hadoop code. Many Mahout
>>>> algorithms would run much faster on Spark, but you will have to do the
>>>> porting yourself.
>>>>
>>>> Let us know how it turns out!
>>>>
>>>> 2011/10/19 WangRamon <[email protected]>
>>>>
>>>>> Hi All,
>>>>>
>>>>> I was told today that Spark is a much better platform for cluster
>>>>> computing, better than Hadoop, at least for recommendation computation.
>>>>> I'm still very new to this area, so if anyone has done some investigation
>>>>> on Spark, could you please share your thoughts here? Thank you very much.
>>>>>
>>>>> Thanks,
>>>>> Ramon
>>>
>>> --
>>> Twitter: @jpatanooga
>>> Solution Architect @ Cloudera
>>> hadoop: http://www.cloudera.com
>

--
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com
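P.S. For anyone curious what the "BSP-style" use of Spark that JP describes above might look like, here is a rough sketch: parallelize slices of an array across the workers for each "superstep", map over them in parallel, then collect() back to the driver as the sync point. This is just an illustration, not code from this thread; it uses the current SparkConf/SparkContext API, and the local master, slice count, and evaluate() step are made-up placeholders.

import org.apache.spark.{SparkConf, SparkContext}
import scala.util.Random

object BspStyleSketch {
  // Placeholder per-item work; a real GA would score/mutate candidates here.
  def evaluate(x: Double): Double = x * 0.99

  def main(args: Array[String]): Unit = {
    // Local master just for illustration; on a cluster this would point at YARN or Mesos.
    val sc = new SparkContext(new SparkConf().setAppName("bsp-sketch").setMaster("local[*]"))

    // A population of candidate solutions, stood in for by plain doubles here.
    var population: Array[Double] = Array.fill(1000)(Random.nextDouble())

    for (step <- 1 to 10) {
      // One "superstep": ship slices of the array to the workers, process them in
      // parallel, then collect() back to the driver; that collect() is the barrier.
      population = sc.parallelize(population, 8).map(evaluate).collect()
    }

    sc.stop()
  }
}

The only per-iteration overhead beyond the map itself is the parallelize/collect round trip, which is part of why JP notes this is relatively fast compared to chaining MR jobs for highly iterative algorithms.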
