On 20.10.2011 19:45, Ted Dunning wrote:
> I think that giraph has a lot to offer here as well.

+1 on that.
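And re: Josh's point below about using Spark "BSP-style" and just shooting out slices of an array to the workers: here is a rough sketch of the kind of loop I picture for his GA experiment. It's purely illustrative; it only assumes SparkContext, parallelize, map and collect from the plain spark.* Scala API of that era, and the fitness/mutate functions, population size, and slice count are made-up placeholders, not anything from his actual code.

import spark.SparkContext   // package was plain "spark" in the pre-Apache releases

object BspStyleGA {

  // Placeholder fitness function -- a real GA would plug its own in here.
  def fitness(candidate: Array[Double]): Double = candidate.sum

  // Placeholder mutation step, also just for illustration.
  def mutate(candidate: Array[Double]): Array[Double] =
    candidate.map(_ + (scala.util.Random.nextDouble - 0.5) * 0.1)

  def main(args: Array[String]) {
    val sc = new SparkContext("local[4]", "bsp-style-ga")

    // The population lives as a plain array on the driver.
    var population: Array[Array[Double]] =
      Array.fill(1000)(Array.fill(10)(scala.util.Random.nextDouble))

    for (generation <- 1 to 20) {
      // "Superstep": ship slices of the array out to the workers...
      val scored = sc.parallelize(population, 8)
                     .map(c => (fitness(c), c))
                     .collect()                 // ...and sync back at the driver

      // Sequential step between supersteps: select the top half, then mutate.
      val survivors = scored.sortBy(-_._1).take(population.length / 2).map(_._2)
      population = survivors ++ survivors.map(mutate)

      println("generation " + generation + " best fitness: " + scored.map(_._1).max)
    }
  }
}

The collect() at the end of each generation acts as the synchronization barrier, i.e. the synchronous-iteration case from the book he mentions; the asynchronous variants it discusses would need something other than a driver-side loop.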
> Sent from my iPhone
>
> On Oct 20, 2011, at 8:30, Josh Patterson <[email protected]> wrote:
>
>> I've run some tests with Spark in general; it's a pretty interesting setup.
>>
>> I think the most interesting aspect (relevant to what you are asking
>> about) is that Matei already has Spark running on top of MRv2:
>>
>> https://github.com/mesos/spark-yarn
>>
>> (you don't have to run Mesos, but the YARN code needs to be able to see
>> the jar in order to do its scheduling)
>>
>> I've been playing around with writing a genetic algorithm in
>> Scala/Spark to run on MRv2, and in the process got introduced to the
>> book:
>>
>> "Parallel Iterative Algorithms: From Sequential to Grid Computing"
>>
>> which talks about strategies for parallelizing highly iterative
>> algorithms and the inherent issues involved (sync/async iterations,
>> sync/async communications, etc.). Since you can use Spark as a
>> "BSP-style" framework (ignoring the RDDs if you like) and just shoot
>> out slices of an array of items to be processed (relatively fast
>> compared to MR), it has some interesting properties/tradeoffs to take a
>> look at.
>>
>> Toward the end of my ATL HUG talk I mentioned the possibility of how
>> MRv2 could be used with other frameworks, like Spark, that are better
>> suited to other algorithms (in this case, highly iterative ones):
>>
>> http://www.slideshare.net/jpatanooga/machine-learning-and-hadoop
>>
>> I think it would be interesting to have Mahout sitting on top of MRv2,
>> like Ted is referring to, and then have each algorithm matched to a
>> framework on YARN and a workflow that mixed and matched these
>> combinations.
>>
>> Lots of possibilities here.
>>
>> JP
>>
>>
>> On Wed, Oct 19, 2011 at 10:42 PM, Ted Dunning <[email protected]> wrote:
>>> Spark is very cool but very incompatible with Hadoop code. Many Mahout
>>> algorithms would run much faster on Spark, but you will have to do the
>>> porting yourself.
>>>
>>> Let us know how it turns out!
>>>
>>> 2011/10/19 WangRamon <[email protected]>
>>>
>>>> Hi all,
>>>>
>>>> I was told today that Spark is a much better platform for cluster
>>>> computing than Hadoop, at least for recommendation computation. I'm
>>>> still very new to this area, so if anyone has done any investigation
>>>> of Spark, could you please share your thoughts here? Thank you very much.
>>>>
>>>> Thanks,
>>>> Ramon
>>>
>>
>> --
>> Twitter: @jpatanooga
>> Solution Architect @ Cloudera
>> hadoop: http://www.cloudera.com
