Hi Team,

The Spark framework is gaining more attention among open-source Big Data frameworks, with support for a variety of applications such as:
1) Shark
2) GraphX
3) MLlib
4) Streaming

Given the rapid development of algorithms for clustering, classification, regression, etc. in the MLlib package, and Spark's built-in support for Scala, I am trying to differentiate between Mahout and Spark. Here is a short list:

Features                    Mahout    Spark
Clustering                  Y         Y
Classification              Y         Y
Regression                  Y         Y
Dimensionality Reduction    Y         Y
Java                        Y         Y
Scala                       N         Y
Python                      N         Y
NumPy                       N         Y
Hadoop                      Y         Y
Text Mining                 Y         N
Scala/Spark Bindings        Y         N/A
Scalability                 Y         Y

Apart from the above, Mahout has broader coverage of machine learning algorithms than Spark, with many utilities and APIs. Mahout 1.0 also provides support for Scala and Spark bindings.

I am trying to draw a clear line between Mahout and Spark. Could you shed some light on the key differences and on what is unique to the Mahout framework? Am I missing any important distinction that makes Mahout the only choice for scalable machine learning?

Best,
Mahesh.B
