Repository: incubator-systemml
Updated Branches:
  refs/heads/master d48121217 -> edb9e7786
[SYSTEMML-1266] Replace README.txt In Release Package

Closes #401.

Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/edb9e778
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/edb9e778
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/edb9e778

Branch: refs/heads/master
Commit: edb9e7786c6fe3a62dbc58a14f7e87aa9af8f67e
Parents: d481212
Author: Glenn Weidner <gweid...@us.ibm.com>
Authored: Mon Feb 20 10:36:46 2017 -0800
Committer: Glenn Weidner <gweid...@us.ibm.com>
Committed: Mon Feb 20 10:36:46 2017 -0800

----------------------------------------------------------------------
 src/main/standalone/README.txt | 169 ++++++++++++++++++++++++++----------
 1 file changed, 121 insertions(+), 48 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/edb9e778/src/main/standalone/README.txt
----------------------------------------------------------------------
diff --git a/src/main/standalone/README.txt b/src/main/standalone/README.txt
index af60940..024c679 100644
--- a/src/main/standalone/README.txt
+++ b/src/main/standalone/README.txt
@@ -1,65 +1,138 @@
--------------------------------------------------------------------------------
-Apache SystemML (incubating)
--------------------------------------------------------------------------------
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
-SystemML is now an Apache Incubator project! Please see the Apache SystemML
-(incubating) website at http://systemml.apache.org/ for more information. The
-latest project documentation can be found at the SystemML Documentation website
-on GitHub at http://apache.github.io/incubator-systemml/.
+http://www.apache.org/licenses/LICENSE-2.0
-SystemML is a flexible, scalable machine learning system. SystemML's
-distinguishing characteristics are:
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
- 1. Algorithm customizability via R-like and Python-like languages.
- 2. Multiple execution modes, including Standalone, Spark Batch, Spark
-    MLContext, Hadoop Batch, and JMLC.
- 3. Automatic optimization based on data and cluster characteristics to ensure
-    both efficiency and scalability.
+# Apache SystemML
--------------------------------------------------------------------------------
-SystemML in Standalone Mode
--------------------------------------------------------------------------------
+**Documentation:** [SystemML Documentation](http://apache.github.io/incubator-systemml/)
+**Mailing List:** [Dev Mailing List](mailto:d...@systemml.incubator.apache.org)
+**Build Status:** [![Build Status](https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/badge/icon)](https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest)
+**Issue Tracker:** [JIRA](https://issues.apache.org/jira/browse/SYSTEMML)
+**Download:** [Download SystemML](http://systemml.apache.org/download.html)
-Standalone mode can be run on a single machine in a non-Hadoop environment,
-allowing data scientists to develop algorithms locally without need of a
-distributed cluster. The Standalone release packages all required libraries
-into a single distribution. Standalone mode is not appropriate for large
-datasets.
+**SystemML** is now an **Apache Incubator** project! Please see the [**Apache SystemML (incubating)**](http://systemml.apache.org/)
+website for more information. The latest project documentation can be found at the
+[**SystemML Documentation**](http://apache.github.io/incubator-systemml/) website on GitHub.
-OS X and Linux users can use the runStandaloneSystemML.sh script to run in
-Standalone mode, while Windows users can use the runStandaloneSystemML.bat
-script.
+SystemML is a flexible, scalable machine learning system.
+SystemML's distinguishing characteristics are:
+ 1. **Algorithm customizability via R-like and Python-like languages**.
+ 2. **Multiple execution modes**, including Spark MLContext API, Spark Batch, Hadoop Batch, Standalone, and JMLC.
+ 3. **Automatic optimization** based on data and cluster characteristics to ensure both efficiency and scalability.
--------------------------------------------------------------------------------
-Hello World Example
--------------------------------------------------------------------------------
-The following example will run a "hello world" DML script on SystemML in
-Standalone mode.
+## Algorithm Customizability
-$ echo 'print("hello world");' > helloworld.dml
-$ ./runStandaloneSystemML.sh helloworld.dml
+ML algorithms in SystemML are specified in a high-level, declarative machine learning (DML) language.
+Algorithms can be expressed in either an R-like syntax or a Python-like syntax. DML includes
+linear algebra primitives, statistical functions, and additional constructs.
+This high-level language significantly increases the productivity of
+data scientists as it provides (1) full flexibility in expressing custom
+analytics and (2) data independence from the underlying input formats and
+physical data representations.
--------------------------------------------------------------------------------
-Running SystemML Algorithms
--------------------------------------------------------------------------------
-Several existing algorithms can be found in the scripts directory in the
-Standalone distribution. In the following example, we first obtain Haberman's
-Survival Data Set. We create a metadata file for this data. We create a
-types.csv file that describes the type of each column along with a
-corresponding metadata file. We then run the Univariate Statistics algorithm
-on the data in Standalone mode. The results are output to the
-data/univarOut.mtx file.
+## Multiple Execution Modes
-$ wget -P data/ http://archive.ics.uci.edu/ml/machine-learning-databases/haberman/haberman.data
-$ echo '{"rows": 306, "cols": 4, "format": "csv"}' > data/haberman.data.mtd
-$ echo '1,1,1,2' > data/types.csv
-$ echo '{"rows": 1, "cols": 4, "format": "csv"}' > data/types.csv.mtd
-$ ./runStandaloneSystemML.sh scripts/algorithms/Univar-Stats.dml -nvargs X=data/haberman.data TYPES=data/types.csv STATS=data/univarOut.mtx
+SystemML computations can be executed in a variety of different modes. To begin with, SystemML
+can be operated in Standalone mode on a single machine, allowing data scientists to develop
+algorithms locally without need of a distributed cluster. In order to scale up, algorithms can also be distributed
+across a cluster using Spark or Hadoop.
+This flexibility allows the utilization of an organization's existing resources and expertise.
+In addition, SystemML features a
+[Spark MLContext API](http://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html)
+that allows for programmatic interaction via Scala, Python, and Java. SystemML also features an
+embedded API for scoring models.
-For more information, please see the online SystemML documentation.
+## Automatic Optimization
+
+Algorithms specified in DML are dynamically compiled and optimized based on data and cluster characteristics
+using rule-based and cost-based optimization techniques. The optimizer automatically generates hybrid runtime
+execution plans ranging from in-memory, single-node execution, to distributed computations on Spark or Hadoop.
+This ensures both efficiency and scalability. Automatic optimization reduces or eliminates the need to hand-tune
+distributed runtime execution plans and system configurations.
+
+## ML Algorithms
+
+SystemML features a suite of production-level examples that can be grouped into six broad categories:
+Descriptive Statistics, Classification, Clustering, Regression, Matrix Factorization, and Survival Analysis.
+Detailed descriptions of these algorithms can be found in the
+[SystemML Algorithms Reference](http://apache.github.io/incubator-systemml/algorithms-reference.html). The goal of these provided algorithms is to serve as production-level examples that can be modified or used as inspiration for a new custom algorithm.
+
+## Download & Setup
+
+Before you get started on SystemML, make sure that your environment is set up and ready to go.
+
+ 1. **If you’re on OS X, we recommend installing [Homebrew](http://brew.sh) if you haven’t already. For Linux users, the [Linuxbrew project](http://linuxbrew.sh/) is equivalent.**
+
+   OS X:
+   ```
+   /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
+   ```
+   Linux:
+   ```
+   ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Linuxbrew/install/master/install)"
+   ```
+
+ 2. **Install Java (need Java 8).**
+   ```
+   brew tap caskroom/cask
+   brew install Caskroom/cask/java
+   ```
+
+ 3. **Install Spark 2.1.**
+   ```
+   brew tap homebrew/versions
+   brew install apache-spark21
+   ```
+
+ 4. **Download SystemML.**
+
+   Go to the [SystemML Downloads page](http://systemml.apache.org/download.html), download `systemml-0.13.0-incubating.zip` (should be 2nd), and unzip it to a location of your choice.
+
+   *The next step is optional, but it will make your life a lot easier.*
+
+ 5. **[OPTIONAL] Set `SYSTEMML_HOME` in your bash profile.**
+   Add the following to `~/.bash_profile`, replacing `path/to/` with the location of the download in step 4.
+   ```
+   export SYSTEMML_HOME=path/to/systemml-0.13.0-incubating
+   ```
+   *Make sure to open a new tab in terminal so that you make sure the changes have been made.*
+
+ 6. **[OPTIONAL] Install Python or Python 3 (to follow along with our Jupyter notebook examples).**
+
+   Python 2:
+   ```
+   brew install python
+   pip install jupyter matplotlib numpy
+   ```
+
+   Python 3:
+   ```
+   brew install python3
+   pip3 install jupyter matplotlib numpy
+   ```
+
+**Congrats! You can now use SystemML!**
+
+## Next Steps!
+
+To get started, please consult the
+[SystemML Documentation](http://apache.github.io/incubator-systemml/) website on GitHub. We
+recommend using the [Spark MLContext API](http://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html)
+to run SystemML from Scala or Python using `spark-shell`, `pyspark`, or `spark-submit`.
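
The optional `SYSTEMML_HOME` step in the new README can be sanity-checked from a fresh terminal. The following is only a minimal sketch: the function name `check_systemml_home` is hypothetical and not part of the SystemML distribution, and it assumes nothing beyond the README's statement that the variable should point at the unzipped download directory.

```shell
# Hypothetical helper (not shipped with SystemML): verify that SYSTEMML_HOME
# is set and points at an existing directory.
check_systemml_home() {
  # Unset or empty variable: the export was not picked up by this shell.
  if [ -z "${SYSTEMML_HOME:-}" ]; then
    echo "SYSTEMML_HOME is not set"
    return 1
  fi
  # Set, but the path does not exist or is not a directory.
  if [ ! -d "$SYSTEMML_HOME" ]; then
    echo "SYSTEMML_HOME is not a directory: $SYSTEMML_HOME"
    return 2
  fi
  echo "SYSTEMML_HOME=$SYSTEMML_HOME"
}
```

Running `check_systemml_home` after opening a new terminal tab shows whether the `export` line in `~/.bash_profile` took effect; the distinct return codes separate "not set" from "set to a wrong path".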