Hi Team,

Thanks for your replies. Even granting Mahout's strong implementations of recommendations and SVD, I would still point out that Spark 1.1.0 already supports collaborative filtering (alternating least squares, ALS) and, under dimensionality reduction, SVD and PCA. Given the fast pace of contributions, I believe Spark may NOT be far from having new and stable algorithms added to it (such as ANN and HMM, along with support for scientific libraries).

Ted,

Even though the Mahout (1.0) development code base supports Scala and Spark bindings externally, Spark has built-in support for Scala (as it is itself developed in Scala). As for NumPy, it is a Python-based scientific library that the Python API of MLlib in Spark depends on. The benefit is that Spark is also available to Python users.

The major uniqueness of Mahout is that, having originated from Lucene, it has built-in support for text processing. Of course, I do NOT believe this is a strong point, as I assume that developers who know Lucene can easily use it with Spark through the Java interface.

Mahout has currently stopped support for Hadoop (i.e., for further libraries). Spark, on the other hand, can easily reuse the data present in Hadoop/HBase (though perhaps NOT the MapReduce functionality, as Spark has its own computation layer).

*As a long-time user of Mahout, I strongly support it (despite its poor visualization capabilities). At the same time, I am trying to understand: if Spark keeps evolving its MLlib package, with in-memory computation, rich scientific libraries through Scala, and support for languages like Java/Scala/Python, will the survival of Mahout be in question?*

Best!
Mahesh Balija.

On Wed, Oct 22, 2014 at 1:26 PM, Martin, Nick <[email protected]> wrote:

> I know we lost the maintainer for fpgrowth somewhere along the line but
> it's definitely something I'd love to see carried forward, too.
>
> Sent from my iPhone
>
> > On Oct 22, 2014, at 8:09 AM, "Brian Dolan" <[email protected]> wrote:
> >
> > Sing it, brother! I miss FP Growth as well. Once the Scala bindings
> > are in, I'm hoping to work up some time series methods.
> >
> >> On Oct 21, 2014, at 8:00 PM, Lee S <[email protected]> wrote:
> >>
> >> As a developer facing the choice of library between Mahout and MLlib,
> >> I have some ideas below.
> >> Mahout has no decision tree algorithm.
> >> But MLlib has the components for
> >> constructing a decision tree algorithm, such as Gini index and information
> >> gain. I also think Mahout could add algorithms for frequent pattern
> >> mining, which is very important in feature selection and statistical analysis.
> >> MLlib has no frequent pattern mining algorithms.
> >> P.S. Why was the fpgrowth algorithm removed in version 0.9?
> >>
> >> 2014-10-22 9:12 GMT+08:00 Vibhanshu Prasad <[email protected]>:
> >>
> >>> Actually, Spark is available in Python also, so users of Spark have an
> >>> upper hand over traditional users of Mahout. This applies to
> >>> all the Python libraries (including NumPy).
> >>>
> >>> On Wed, Oct 22, 2014 at 3:54 AM, Ted Dunning <[email protected]>
> >>> wrote:
> >>>
> >>>> On Tue, Oct 21, 2014 at 3:04 PM, Mahesh Balija <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> I am trying to differentiate between Mahout and Spark, here is the
> >>>>> small list,
> >>>>>
> >>>>> Features                  Mahout  Spark
> >>>>> Clustering                Y       Y
> >>>>> Classification            Y       Y
> >>>>> Regression                Y       Y
> >>>>> Dimensionality Reduction  Y       Y
> >>>>> Java                      Y       Y
> >>>>> Scala                     N       Y
> >>>>> Python                    N       Y
> >>>>> Numpy                     N       Y
> >>>>> Hadoop                    Y       Y
> >>>>> Text Mining               Y       N
> >>>>> Scala/Spark Bindings      Y       N/A
> >>>>> scalability               Y       Y
> >>>>
> >>>> Mahout doesn't actually have strong features for clustering,
> >>>> classification and regression. Mahout is very strong in
> >>>> recommendations (which you don't mention) and dimensionality reduction.
> >>>>
> >>>> Mahout does support Scala in the development version.
> >>>>
> >>>> What do you mean by support for Numpy?
