The Apache Mahout PMC is pleased to announce the release of Mahout 0.12.0. Mahout's goal is to provide an environment for quickly building scalable machine learning applications that run on the highest-performance parallel computation engines available. Mahout comprises an interactive environment and library that supports generalized scalable linear algebra and includes many modern machine learning algorithms.
We call the Mahout Math environment “Samsara” for its symbolism of universal renewal. It reflects a fundamental rethinking of how scalable machine learning algorithms are built and customized. Mahout-Samsara is here to help people create their own math while providing some off-the-shelf algorithm implementations. At its base are general linear algebra and statistical operations along with the data structures to support them. It is written in Scala with Mahout-specific extensions, and runs on Spark, Flink and H2O.

The Mahout 0.12.0 release marks a major milestone toward the “Samsara” environment’s goal of providing an engine-neutral math platform: it now supports Flink. While still experimental, the Mahout Flink bindings now offer all of the R-like semantics for linear algebra operations, matrix decompositions, and algorithms of the “Samsara” platform for execution on a Flink back-end.

This gives users of Flink out-of-the-box access to the following features (and more):

1. The Mahout Distributed Row Matrix (DRM) API.
2. Distributed and local Vector and Matrix algebra routines.
3. Distributed and local Stochastic Principal Component Analysis.
4. Distributed and local Stochastic Singular Value Decomposition.
5. Distributed and local Thin QR Decomposition.
6. Collaborative Filtering.
7. Naive Bayes Classification.
8. Matrix operations (only listing a few here):
   1. Mahout-native blockified distributed Matrix map and allreduce routines.
   2. Distributed data point (row) sampling.
   3. Matrix/Matrix Squared Distance.
   4. Element-wise log.
   5. Element-wise roots.
   6. Element-wise Matrix/Matrix addition, subtraction, division and multiplication.
   7. Functional Matrix value assignment.
9. A familiar Scala-based R-like DSL.

The release also includes tools for developing other mathematical and machine learning algorithms.

To get started with Apache Mahout 0.12.0, download the release artifacts and signatures from http://www.apache.org/dist/mahout/0.12.0/.
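As a rough illustration of what the R-like DSL looks like on the Flink back-end, here is a minimal sketch, not an excerpt from the release. It assumes the Mahout 0.12.0 math, math-scala and Flink bindings artifacts plus Flink 1.0.1 are on the classpath; the object name `SamsaraFlinkSketch` and the sample matrix are made up for illustration. Given the known issues listed further down, treat it as experimental.

```scala
import org.apache.flink.api.scala._
import org.apache.mahout.flinkbindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.scalabindings.RLikeOps._

// Hypothetical example object; the name is not part of Mahout.
object SamsaraFlinkSketch {
  def main(args: Array[String]): Unit = {
    // Flink execution environment wrapped as Mahout's distributed context.
    val env = ExecutionEnvironment.getExecutionEnvironment
    implicit val ctx = new FlinkDistributedContext(env)

    // Small in-core (local) matrix, used here only as sample data.
    val inCoreA = dense((1, 2), (3, 4), (5, 6))

    // Distribute it as a DRM across two partitions.
    val drmA = drmParallelize(inCoreA, numPartitions = 2)

    // R-like expression for A'A, evaluated on the Flink back-end.
    val drmAtA = drmA.t %*% drmA

    // Bring the small 2 x 2 result back in-core and print it.
    val inCoreAtA = drmAtA.collect
    println(inCoreAtA)
  }
}
```

The same expression, `drmA.t %*% drmA`, is intended to run unchanged on the other back-ends; only the construction of the implicit distributed context is engine-specific.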
Many thanks to the contributors and committers who were part of this release. Thanks in particular to Till Rohrmann, Alexey Grigorev, Robert Metzger, Stephan Ewen, and Kostas Tzoumas, members of Data Artisans and the Flink community who helped significantly in this effort. Please see below for the Release Highlights.

RELEASE HIGHLIGHTS

This is a major release over Mahout 0.11.2, meant to introduce Apache Flink (http://flink.apache.org) as a backend execution engine for the Samsara Linear Algebra framework. For more information about “Samsara” on Flink, see http://mahout.apache.org/users/flinkbindings/flink-internals.html and http://mahout.apache.org/users/flinkbindings/playing-with-samsara-flink.html.

Mahout 0.12.0 is based on Apache Flink 1.0.1 (http://flink.apache.org/news/2016/04/06/release-1.0.1.html).

Mahout 0.12.0 now supports Flink 1.0.1 and Spark 1.5.2 on Hadoop 2.4.1.

KNOWN ISSUES

1. Mahout’s DRM checkpointing is not fully supported in this release, and the DrmLike.checkpoint(CacheHint.CacheHint) contract is broken. Currently, checkpoints are cached to a temporary file system as designated by the `taskmanager.tmp.dirs` property in the `$MAHOUT_HOME/conf/flink-config.yaml` file. This issue affects the performance of Mahout on Flink.
2. Serialization issues have arisen with certain operations. As the Flink bindings are still experimental, we’ve allowed these issues to pass the release and will address them in a follow-up 0.12.1 maintenance release. These issues affect the performance of Mahout on Flink.
3. Highly iterative Mahout algorithms are currently significantly slowed by issue (1).

Fixed Jiras:

This release addresses 35 issues [1], of which 14 are bug fixes [2].

Future Roadmap:

1. Mahout 0.12.1 will support a Flink shell.
2. Several optimizations will be made to the Mahout Flink bindings in Mahout 0.12.1, specifically to overcome the performance issues noted in the Known Issues section above.
3. We will be exploring native Mahout caching for Flink.
4. We will explore leveraging ViennaCL (http://viennacl.sourceforge.net/doc/manual-license.html) as a math backend to support dense, sparse and CUDA computations on bare metal.

[1] https://issues.apache.org/jira/browse/MAHOUT-1828?jql=project%20%3D%20MAHOUT%20AND%20fixVersion%20%3D%200.12.0%20AND%20status%20%3D%20Resolved
[2] https://issues.apache.org/jira/browse/MAHOUT-1828?jql=project%20%3D%20MAHOUT%20AND%20fixVersion%20%3D%200.12.0%20AND%20Type%20%3D%20Bug%20AND%20status%20%3D%20Resolved
[3] http://mahout.apache.org/users/flinkbindings/flink-internals.html
[4] http://mahout.apache.org/users/flinkbindings/playing-with-samsara-flink.html

Regards,
On behalf of Apache Mahout PMC