This is nice! The only problem is that one would have to learn a new paradigm. People have a habit of sticking to what they are familiar with.

-P
On Mon, Oct 31, 2011 at 4:39 PM, Nick Pentreath <[email protected]> wrote:

> I have this crazy idea to combine Scalala (which aims to be a library
> for linear algebra in Scala, based on netlib-java, that provides
> Matlab / numpy like syntax and plotting), scalanlp (same developer as
> Scalala, focused on NLP/ML algorithms), Spark and Mahout in some way,
> to create a Matlab-like environment (or better an IPython-like
> super-shell, that could also be integrated into a GUI) that allows you
> to write code that seamlessly operates locally and across a Hadoop
> cluster using Spark's framework.
>
> Ideally it would wrap / port Mahout's distributed matrix operations
> (multiplication, SVD, other decompositions etc), as well as SGD and
> some others etc, and integrate scalanlp's algorithms. It would be
> seamless in the sense that calling, say, A * B, or SVD on a matrix in
> local mode or cluster mode is exactly the same, save for setting
> Spark's context to be local vs cluster (and specifying the HDFS
> location of the data for cluster mode etc) - this is based on
> Scalala's idea of optimised code paths depending on the matrix type.
> This would allow rapid prototyping on a local machine / test cluster,
> and deploying the exact same code across huge clusters...
>
> I don't have enough experience yet with Mahout, let alone Scala and
> Scalala, to think about tackling this, but I wonder if this is
> something people would like to see?!
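To make the "same call, local or cluster" idea above concrete, here is a minimal Python sketch. It is not taken from Scalala, Spark or Mahout, and every name in it (Matrix, LocalMatrix, BlockedMatrix, matrix, product) is hypothetical. The "cluster" backend only simulates distribution by looping over row blocks in a single process. The point is just that user code writes a * b once, and the code path is picked by the matrix type behind a single mode switch.

```python
from abc import ABC, abstractmethod


class Matrix(ABC):
    """Common interface: user code only ever sees this type."""

    @abstractmethod
    def rows(self):
        """Return the matrix as a list of row lists."""

    @abstractmethod
    def __mul__(self, other):
        """Matrix product; each backend picks its own code path."""


def _multiply_rows(left_rows, right_rows):
    # Plain row-by-column product of two row-major matrices.
    cols = list(zip(*right_rows))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols]
            for row in left_rows]


class LocalMatrix(Matrix):
    """Hypothetical in-memory backend (stands in for a local matrix type)."""

    def __init__(self, data):
        self._data = [list(r) for r in data]

    def rows(self):
        return self._data

    def __mul__(self, other):
        return LocalMatrix(_multiply_rows(self._data, other.rows()))


class BlockedMatrix(Matrix):
    """Hypothetical 'cluster' backend: rows are split into blocks.

    The blocks are processed in a loop in one process here; in a real
    system each block would be handed to a worker instead.
    """

    def __init__(self, data, block_size=2):
        data = [list(r) for r in data]
        self._block_size = block_size
        self._blocks = [data[i:i + block_size]
                        for i in range(0, len(data), block_size)]

    def rows(self):
        return [row for block in self._blocks for row in block]

    def __mul__(self, other):
        right = other.rows()  # the right operand is shared with every block
        out = []
        for block in self._blocks:
            out.extend(_multiply_rows(block, right))
        return BlockedMatrix(out, self._block_size)


def matrix(data, mode="local"):
    """Factory: the mode switch is the only place the two setups differ."""
    if mode == "local":
        return LocalMatrix(data)
    if mode == "cluster":
        return BlockedMatrix(data)
    raise ValueError("unknown mode: %r" % mode)


def product(mode):
    # Identical user code for either mode.
    a = matrix([[1, 2], [3, 4], [5, 6]], mode)
    b = matrix([[7, 8], [9, 10]], mode)
    return (a * b).rows()
```

With these definitions, product("local") and product("cluster") go through different code paths but return the same rows. In a real system the per-block loop is where the work would be handed off to a cluster framework; that part is an assumption of the sketch, not something shown here.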
>
> n
>
> On 20 Oct 2011, at 16:30, Josh Patterson <[email protected]> wrote:
>
> > I've run some tests with Spark in general; it's a pretty interesting
> > setup.
> >
> > I think the most interesting aspect (relevant to what you are asking
> > about) is that Matei already has Spark running on top of MRv2:
> >
> > https://github.com/mesos/spark-yarn
> >
> > (you don't have to run mesos, but the YARN code needs to be able to see
> > the jar in order to do its scheduling stuff)
> >
> > I've been playing around with writing a genetic algorithm in
> > Scala/Spark to run on MRv2, and in the process got introduced to the
> > book:
> >
> > "Parallel Iterative Algorithms, From Sequential to Grid Computing"
> >
> > which talks about strategies for parallelizing highly iterative
> > algorithms and the inherent issues involved (sync/async iterations,
> > sync/async communications, etc). Since you can use Spark as a
> > "BSP-style" framework (ignoring the RDDs if you like) and just shoot
> > out slices of an array of items to be processed (relatively fast
> > compared to MR), it has some interesting properties/tradeoffs to take
> > a look at.
> >
> > Toward the end of my ATL HUG talk I mentioned the possibility of how
> > MRv2 could be used with other frameworks, like Spark, to be better
> > suited for other algorithms (in this case, highly iterative):
> >
> > http://www.slideshare.net/jpatanooga/machine-learning-and-hadoop
> >
> > I think it would be interesting to have Mahout sitting on top of MRv2,
> > like Ted is referring to, and then have an algorithm matched to a
> > framework on YARN and a workflow that mixed and matched these
> > combinations.
> >
> > Lots of possibilities here.
> >
> > JP
> >
> > On Wed, Oct 19, 2011 at 10:42 PM, Ted Dunning <[email protected]> wrote:
> >> Spark is very cool but very incompatible with Hadoop code. Many Mahout
> >> algorithms would run much faster on Spark, but you will have to do the
> >> porting yourself.
> >>
> >> Let us know how it turns out!
> >>
> >> 2011/10/19 WangRamon <[email protected]>
> >>
> >>> Hi all, I was told today that Spark is a much better platform for
> >>> cluster computing than Hadoop, at least for recommendation
> >>> computing. I'm still very new to this area; if anyone has done some
> >>> investigation on Spark, could you please share your thoughts here?
> >>> Thank you very much. Thanks, Ramon
> >
> > --
> > Twitter: @jpatanooga
> > Solution Architect @ Cloudera
> > hadoop: http://www.cloudera.com
