I was just watching it. ;) https://trevorgrant.org/
Thanks Trevor!

On Tue, Jan 31, 2017 at 3:41 PM, scott cote <scottcc...@gmail.com> wrote:
> Trevor gave a great presentation at our user group. It was live streamed
> on Periscope. Trevor, maybe you could share the URL? I don't have it
> handy at the moment.
>
> Scott
>
> On Jan 31, 2017, at 8:50 AM, Trevor Grant <trevor.d.gr...@gmail.com>
> wrote:
> >
> > Hello Isabel and Florent,
> >
> > I'm currently working on a side-by-side demo of R / Python /
> > SparkML (MLlib) / Mahout, but in very broad strokes here is how I would
> > compare them:
> >
> > R - Most statistical functionality. Most flexibility. Implement your own
> > algorithms in a mathematically expressive language. Worst performance:
> > handles only "small" data sets. Language is 'math centric'. Easy to
> > extend / create new algos.
> >
> > Python (sklearn/scikit) - Some mathematical / statistical functionality,
> > more focused on machine learning. The machine learning library is very
> > sophisticated, though. Much better performance than R, but still only
> > single node: "small to medium" data sets. Language is 'programmer
> > centric'. Somewhat difficult to extend / create new algos.
> >
> > SparkML / MLlib - Very limited mathematical functionality (usually
> > collects to the driver to do anything of substance). Machine learning is
> > rudimentary compared to sklearn, but still non-trivial; one of the best
> > available. Excellent performance, well suited to "big" data sets.
> > Language is 'programmer centric'. Very difficult to extend / create new
> > algos.
> >
> > (FlinkML - Fits in the same spot as SparkML, but significantly less
> > developed.)
> >
> > Mahout - Good mathematical functionality. Good performance relative to
> > the underlying engine (possibly superior with MAHOUT-1885). Language is
> > 'math centric'. Well suited to "medium and big" data sets. Fairly easy
> > to extend / create new algos (MAHOUT-1856).
> >
> > I hope that provides a high-level comparison.
> >
> > Re use cases: the tool to use depends on the job at hand.
> > Highly advanced mathematical model, small dataset or sampling from full
> > dataset OK -> Use R
> > Machine learning on small to medium data set or sampling from full
> > dataset OK -> Use Python / sklearn
> > Less sophisticated machine learning on large dataset -> SparkML
> > Custom mathematical/statistical model on medium to large data -> Mahout
> >
> > ^^ All of this is just my opinion.
> >
> > Re integration:
> >
> > We're working on that too. Recently MAHOUT-1896 added convenience
> > methods for interacting with MLlib-type RDDs and DataFrames:
> > https://issues.apache.org/jira/browse/MAHOUT-1896
> >
> > (No support yet for SparkML-type DataFrames, or for spitting DRMs back
> > out into RDDs/DataFrames.)
> >
> > Finally, docs: there has been talk for some time of migrating the
> > website from the CMS to Jekyll, and it's something I strongly support.
> > The CMS makes it difficult to keep up with documentation, and Jekyll
> > would open up documentation / website maintenance to contributors.
> >
> > Trevor Grant
> > Data Scientist
> > https://github.com/rawkintrevo
> > http://stackexchange.com/users/3002022/rawkintrevo
> > http://trevorgrant.org
> >
> > *"Fortunate is he, who is able to know the causes of things." -Virgil*
> >
> >
> > On Tue, Jan 31, 2017 at 5:31 AM, Florent Empis <florent.em...@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> I am in the same spot as Isabel.
> >> I used to use/understand most of the "old" standalone Mahout; now I am
> >> doing some data transformation with Spark, but I am not sure where
> >> Samsara fits in the ecosystem.
> >> We also do quite a bit of computation in R.
> >> Basically we are willing to learn and support the project, for instance
> >> by buying the books Rob mentioned, but a short doc with the outline
> >> Isabel describes would be great!
> >>
> >> Many thanks,
> >>
> >> Florent
> >>
> >>
> >> On 31 Jan 2017 12:01, "Isabel Drost-Fromm" <isa...@apache.org> wrote:
> >>
> >>
> >> Hi,
> >>
> >> On Fri, Sep 16, 2016 at 11:36:03PM -0700, Andrew Musselman wrote:
> >>> and we're thinking about just how many pre-built algorithms we
> >>> should include in the library versus working on performance behind the
> >>> scenes.
> >>
> >> To pick this question up: I've been watching Mahout from a distance for
> >> quite some time. From what limited background I have of Samsara, I
> >> really like its approach of being able to run on more than one
> >> execution engine.
> >>
> >> To give some advice to downstream users in the field: what would be
> >> your advice for people tasked with concrete use cases (stuff like fraud
> >> detection, anomaly detection, learning search ranking functions,
> >> building a recommender system)? Is that something that can still be
> >> done with Mahout? What would it take to get from raw data to a finished
> >> system? Is there something we can do to help users get that
> >> accomplished? Is there even interest from users in such a
> >> use-case-based perspective? If so, would there be interest among the
> >> Mahout committers in helping users publicly create docs/examples/modules
> >> to support these use cases?
> >>
> >>
> >> Isabel
> >>
> >

--
Thanks,
Keith Aumiller
MBA - IT Professional
Lafayette Hill PA
314-369-0811