I'm also convinced that Spark is a superior platform for executing distributed ML algorithms. We had a discussion some time ago about switching from Hadoop to another platform, but at that point it was not clear which of the emerging dataflow processing systems (Spark, Hyracks, Stratosphere) would establish itself among users. To me it seems pretty obvious that Spark has won the race.

I concur with Ted: it would be great to have the communities work together. I know that at least four Mahout committers (including me) are already following Spark's mailing list and actively participating in the discussions.

What ideas are there for what a fruitful cooperation could look like?

Best,
Sebastian

PS:

I ported LLR-based cooccurrence analysis (aka item-based recommendation) to Spark some time ago, but I haven't had time to test my code on a large dataset yet. I'd be happy to see someone help with that.





On 02/19/2014 08:04 AM, Nick Pentreath wrote:
I know the Spark/MLlib devs can occasionally be quite set in their ways of doing
certain things, but we'd welcome as many Mahout devs as possible to work
together.


It may be too late, but perhaps a GSoC project could look at porting things
like the co-occurrence recommender and streaming k-means?




N

On Wed, Feb 19, 2014 at 3:02 AM, Ted Dunning <ted.dunn...@gmail.com>
wrote:

On Tue, Feb 18, 2014 at 1:58 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
My (admittedly heavily biased) view is Spark is a superior platform overall
for ML. If the two communities can work together to leverage the strengths
of Spark, and the large amount of good stuff in Mahout (as well as the
fantastic depth of experience of Mahout devs) I think a lot can be
achieved!

It makes a lot of sense that Spark would be better than Hadoop for ML
purposes given that Hadoop was intended to do web-crawl kinds of things and
Spark was intentionally built to support machine learning.
Given that Spark support has been announced by a majority of the Hadoop-based
distribution vendors, it makes sense that maybe Mahout should jump in.
I really would prefer it if the two communities (MLlib/MLI and Mahout) could
work more closely together. There is a lot of good to be had on both sides.
