Apache Mahout 0.11.2 Release Notes

The Apache Mahout PMC is pleased to announce the release of Mahout 0.11.2. Mahout's goal is to create an environment for quickly creating machine learning applications that scale and run on the highest performance parallel computation engines available. Mahout comprises an interactive environment and library that supports generalized scalable linear algebra and includes many modern machine learning algorithms.

We call the Mahout Math environment “Samsara” for its symbol of universal renewal. It reflects a fundamental rethinking of how scalable machine learning algorithms are built and customized. Mahout-Samsara is here to help people create their own math while providing some off-the-shelf algorithm implementations. At its base are general linear algebra and statistical operations along with the data structures to support them. It’s written in Scala with Mahout-specific extensions, and runs on Spark and H2O.

To get started with Apache Mahout 0.11.2, download the release artifacts and signatures from http://www.apache.org/dist/mahout/0.11.2/.

Many thanks to the contributors and committers who were part of this release. Please see below for the Release Highlights.

RELEASE HIGHLIGHTS

This is a minor release over Mahout 0.11.0 meant to introduce major performance enhancements with sparse matrix and vector computations, and major performance optimizations to the Samsara DSL. Mahout 0.11.2 includes all new features and bug fixes released in Mahout versions 0.11.0 and 0.11.1.

Mahout 0.11.2 new features compared to Mahout 0.11.1:

1. Spark 1.5.2 support.
2. Performance improvements of over 30% on sparse vector and matrix computations, leveraging the ‘fastutil’ library (contribution from Sebastiano Vigna). This speeds up all in-core sparse vector and matrix computations.

KNOWN ISSUES

The dataset URLs in the Wikipedia Naive Bayes classification example script (/examples/bin/classify-wikipedia.sh) have changed. The new URL for the smallest set is:

http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles1.xml-p000000010p000030302.bz2

and for the medium set:

http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles10.xml-p002336425p003046511.bz2

To run the Wikipedia classification example, replace the old URLs with the new ones in classify-wikipedia.sh.

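The URL swap described above can also be scripted. The following is a minimal Python sketch, not part of the Mahout distribution. It assumes the stale URLs in classify-wikipedia.sh point at the same dump parts (articles1 and articles10) and differ only in the page-range suffix; if your copy of the script differs, edit the URLs by hand instead. The names NEW_URLS, DUMP_URL, update_dump_urls, and update_script are hypothetical and exist only for this illustration.

```python
import re

# New dataset URLs, copied from the known-issues note above,
# keyed by the dump part number that appears after "articles".
NEW_URLS = {
    "1": "http://dumps.wikimedia.org/enwiki/latest/"
         "enwiki-latest-pages-articles1.xml-p000000010p000030302.bz2",
    "10": "http://dumps.wikimedia.org/enwiki/latest/"
          "enwiki-latest-pages-articles10.xml-p002336425p003046511.bz2",
}

# Assumption: the stale URLs follow this pattern and differ from the new
# ones only in the pNNNpNNN page-range suffix.
DUMP_URL = re.compile(
    r"https?://dumps\.wikimedia\.org/enwiki/latest/"
    r"enwiki-latest-pages-articles(\d+)\.xml-p\d+p\d+\.bz2"
)


def update_dump_urls(script_text):
    """Return script_text with matching dump URLs replaced by the new ones."""
    def swap(match):
        # URLs for dump parts other than 1 and 10 are left untouched.
        return NEW_URLS.get(match.group(1), match.group(0))
    return DUMP_URL.sub(swap, script_text)


def update_script(path):
    """Rewrite the script at `path` in place (keep a backup first)."""
    with open(path, encoding="utf-8") as handle:
        original = handle.read()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(update_dump_urls(original))
```

Calling update_script with the path to your local copy of classify-wikipedia.sh applies the change. Running it a second time leaves the file unchanged, because the new URLs map to themselves.
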
Fixed Jiras:

MAHOUT-1640: Better collections would significantly improve vector-operation speed
MAHOUT-1800: Pare down Classtag overuse
MAHOUT-1801: FastUtil to improve speed of Sparse Matrix Operations
MAHOUT-1802: Capture attached checkpoints (if cached)

Future Roadmap:

1. Mahout 0.12.0 will be released soon and will add Apache Flink as a supported backend execution engine.
2. Explore leveraging ViennaCL (http://viennacl.sourceforge.net/doc/manual-license.html) as a math backend to support dense, sparse, and CUDA computations on bare metal.