Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-02-07 Thread Trevor Grant
e, and distributable data analysis functions. This topic presents a > curated list ... > From: Trevor Grant > Sent: Tuesday, February 7, 2017 8:47 AM > To: user@mahout.apache.org; isa...@apache.org > Subject: Re: Mahout ML

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-02-07 Thread Saikat Kanjilal
vides a set of over one hundred portable, scalable, and distributable data analysis functions. This topic presents a curated list ... From: Trevor Grant Sent: Tuesday, February 7, 2017 8:47 AM To: user@mahout.apache.org; isa...@apache.org Subject: Re: Mahout M

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-02-07 Thread Trevor Grant
The idea that Andy briefly touched on, is that the Algorithm Framework (hopefully) paves the way for R/CRAN like user contribution. Increased contribution was a goal I had certainly hoped for. I have begun promoting the idea at Meetups. There hasn't been a concerted effort to push the idea, howe

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-02-07 Thread Isabel Drost
On Wed, Feb 01, 2017 at 03:32:24PM -0800, Dmitriy Lyubimov wrote: > Isabel, if I understand it correctly, you are asking whether it makes sense > to add end2end scenarios based on Samsara to current codebase? Sorry for being fuzzy. The meta question that I'm trying to find an answer for is if there's

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-02-07 Thread isa...@apache.org
Hi, On Wed, Feb 01, 2017 at 08:29:49PM +, Andrew Palumbo wrote: > I think that https://issues.apache.org/jira/browse/MAHOUT-1856 , a solid > framework for new algorithms, will go a long way towards helping new users > understand how easy it is to add algorithms. There has been significan

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-02-01 Thread Dmitriy Lyubimov
Isabel, if I understand it correctly, you are asking whether it makes sense to add end-to-end scenarios based on Samsara to the current codebase? The answer is, absolutely. Yes, it does, both for rather isolated issues (like computing clusters) and for end-to-end scenarios. The only problem with end-to-end scenari

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-02-01 Thread Andrew Palumbo
From: Isabel Drost Sent: Wednesday, February 1, 2017 4:55 AM To: Dmitriy Lyubimov Cc: user@mahout.apache.org Subject: Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation On Tue, Jan 31, 2017 at 04:06:36PM -0800, Dmitriy Lyubimov wrote: > Except fo

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-02-01 Thread Isabel Drost
On Tue, Jan 31, 2017 at 04:06:36PM -0800, Dmitriy Lyubimov wrote: > Except for several applied > off-the-shelves, Mahout has not (hopefully just yet) developed a > comprehensive set of things to use. Do you think there would be value in having that? Funding aside, would now be a good time to dev

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-01-31 Thread Dmitriy Lyubimov
On Tue, Jan 31, 2017 at 3:01 AM, Isabel Drost-Fromm wrote: > > Hi, > > > To give some advice to downstream users in the field - what would be your > advice > for people tasked with concrete use cases (stuff like fraud detection, > anomaly > detection, learning search ranking functions, building a

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-01-31 Thread Florent Empis
From my point of view, Mahout as a whole has shifted from what it was in 2009-2012: At the time, Mahout (and Mahout in Action is a great testimony of that era) was a sum of bricks, full of relatively high-level mathematics concepts but useable by what I'd call (myself included) wanna-be datascient

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-01-31 Thread Keith Aumiller
I was just watching it. ;) https://trevorgrant.org/ Thanks Trevor! On Tue, Jan 31, 2017 at 3:41 PM, scott cote wrote: > Trevor gave a great presentation at our user group. It was live streamed > on Periscope. Trevor - maybe you could share the url? I don’t have it > handy at the moment. > >

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-01-31 Thread scott cote
Trevor gave a great presentation at our user group. It was live streamed on Periscope. Trevor - maybe you could share the url? I don’t have it handy at the moment. SCott > On Jan 31, 2017, at 8:50 AM, Trevor Grant wrote: > > Hello Isabel and Florent, > > I'm currently working on a side-by-

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-01-31 Thread Ted Dunning
From my perspective, the state of the art of machine learning is with systems like Tensorflow and dl4j. If you can deal with the limits of a non-clustered GPU system, then Theano and Caffe are very useful. Keras papers over the difference between different back-ends nicely. Tensorflow and Theano c

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-01-31 Thread Pat Ferrel
My perspective comes from the data side. I work in recommenders and that means log analysis for huge amounts of data. Even a small shop doing this will immediately run out of capacity in Python or R on a single node. MLlib is a set of prepackaged algorithms that will work (mostly) with big d

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-01-31 Thread Trevor Grant
Hello Isabel and Florent, I'm currently working on a side-by-side demo of R / Python / SparkML (MLlib) / Mahout, but in very broad strokes here is how I would compare them: R: Most statistical functionality. Most flexibility. Implement your own algorithms in a mathematically expressive language. Wo

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-01-31 Thread Florent Empis
Hi, I am in the same spot as Isabel. I used to use and understand most of the «old» standalone Mahout, and now do some data transformation with Spark, but I am not sure where Samsara fits in the ecosystem. We also do quite a bit of computation in R. Basically we are willing to learn and support the proje

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-01-31 Thread Isabel Drost-Fromm
Hi, On Fri, Sep 16, 2016 at 11:36:03PM -0700, Andrew Musselman wrote: > and we're thinking about just how many pre-built algorithms we > should include in the library versus working on performance behind the > scenes. To pick this question up: I've been watching Mahout from a distance for quite

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2016-09-16 Thread Andrew Musselman
Mahout has changed a lot in the past couple of years, becoming more focused on serving the needs of data workers and scientists who need to experiment with large matrix math problems. To that end we've broadened the execution engines that perform the distribution of computation to include Spark and Fl