srowen commented on a change in pull request #29139:
URL: https://github.com/apache/spark/pull/29139#discussion_r456537560



##########
File path: docs/ml-guide.md
##########
@@ -62,23 +62,13 @@ The primary Machine Learning API for Spark is now the 
[DataFrame](sql-programmin
 
 # Dependencies
 
-MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/), 
which depends on
-[netlib-java](https://github.com/fommil/netlib-java) for optimised numerical 
processing.
-If native libraries[^1] are not available at runtime, you will see a warning 
message and a pure JVM
-implementation will be used instead.
+MLlib uses linear algebra packages [Breeze](http://www.scalanlp.org/) and 
[netlib-java](https://github.com/fommil/netlib-java) for optimised numerical 
processing[^1]. Those packages may call native acceleration libraries such as 
[Intel 
MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html)
 or [OpenBLAS](http://www.openblas.net) if they are available as system 
libraries or in runtime library paths. 
 
-Due to licensing issues with runtime proprietary binaries, we do not include 
`netlib-java`'s native
-proxies by default.
-To configure `netlib-java` / Breeze to use system optimised binaries, include
-`com.github.fommil.netlib:all:1.1.2` (or build Spark with `-Pnetlib-lgpl`) as 
a dependency of your
-project and read the [netlib-java](https://github.com/fommil/netlib-java) 
documentation for your
-platform's additional installation instructions.
-
-The most popular native BLAS such as [Intel 
MKL](https://software.intel.com/en-us/mkl), 
[OpenBLAS](http://www.openblas.net), can use multiple threads in a single 
operation, which can conflict with Spark's execution model.
-
-Configuring these BLAS implementations to use a single thread for operations 
may actually improve performance (see 
[SPARK-21305](https://issues.apache.org/jira/browse/SPARK-21305)). It is 
usually optimal to match this to the number of cores each Spark task is 
configured to use, which is 1 by default and typically left at 1.
-
-Please refer to resources like the following to understand how to configure 
the number of threads these BLAS implementations use: [Intel 
MKL](https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications)
 or [Intel 
oneMKL](https://software.intel.com/en-us/onemkl-linux-developer-guide-improving-performance-with-threading)
 and [OpenBLAS](https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded). 
Note that if nativeBLAS is not properly configured in system, java 
implementation(f2jBLAS) will be used as fallback option.
+Due to licensing issues with runtime proprietary binaries, we do not include 
`netlib-java`'s native proxies by default. See [MLlib Linear Algebra 
Acceleration Guide](ml-linalg-guide.md) for how to enable accelerated linear 
algebra processing. If accelerated native libraries are not enabled, you will 
see a warning message below and a pure JVM implementation will be used instead:

Review comment:
       They're not proprietary, they're just LGPL. I wouldn't call these 
license restricitions or anything; they're OSS. Just say they can't be 
distributed with Spark due to differing OSS licenses.

##########
File path: docs/ml-linalg-guide.md
##########
@@ -0,0 +1,85 @@
+# Spark MLlib Linear Algebra Acceleration Guide
+
+## Introduction
+
+This guide provides necessary information to enable accelerated linear algebra 
processing for Spark MLlib.
+
+Spark MLlib defines Vector and Matrix as basic data types for machine learning 
algorithms. On top of them, 
[BLAS](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) and 
[LAPACK](https://en.wikipedia.org/wiki/LAPACK) operations are implemented and 
supported by [netlib-java](https://github.com/fommil/netlib-Java).[^1] 
`netlib-java` can use optimized native linear algebra libraries (refered to as 
"native libraries" or "BLAS libraries" hereafter) for faster numerical 
processing. [Intel 
MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html)
 and [OpenBLAS](http://www.openblas.net) are two most popular ones.
+
+However due to license restrictions, the official released Spark binaries by 
default doesn't contain native libraries support for `netlib-java`.
+
+The following sections describe how to enable `netlib-java` with native 
libraries support for Spark MLlib and how to install native libraries and 
configure them properly.
+
+[^1]: The algorithms may call Breeze and it will in turn call `netlib-java`.
+
+## Enable `netlib-java` with native library proxies 
+
+`netlib-java` native libraries has a dependency on `libgfortran`. It requires 
GFORTRAN 1.4 or above. This can be obtained by installing `libgfortran` 
package. After installation, the following command can be used to verify if it 
is installed properly.
+```
+strings /path/to/libgfortran.so.3.0.0 | grep GFORTRAN_1.4
+```
+
+To build Spark with `netlib-java` native library proxies, you need to add 
`-Pnetlib-lgpl` to Maven build command line. For example:
+```
+$SPARK_SOURCE_HOME/build/mvn -Pnetlib-lgpl -DskipTests -Pyarn -Phadoop-2.7 
clean package
+```
+
+If you only want to enable it in your project, include 
`com.github.fommil.netlib:all:1.1.2` as a dependency of your project.
+
+## Install Native Linear Algebra Libraries
+
+Intel MKL and OpenBLAS are two most popular native linear algebra libraries, 
you can choose one of them based on your preference. We described basic 
instructions as below. You can refer to [netlib-java 
documentation](https://github.com/fommil/netlib-java) for more advanced 
installation instructions.

Review comment:
       ... libraries. You ...
   We describe ... -> We provide

##########
File path: docs/ml-linalg-guide.md
##########
@@ -0,0 +1,85 @@
+# Spark MLlib Linear Algebra Acceleration Guide
+
+## Introduction
+
+This guide provides necessary information to enable accelerated linear algebra 
processing for Spark MLlib.
+
+Spark MLlib defines Vector and Matrix as basic data types for machine learning 
algorithms. On top of them, 
[BLAS](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) and 
[LAPACK](https://en.wikipedia.org/wiki/LAPACK) operations are implemented and 
supported by [netlib-java](https://github.com/fommil/netlib-Java).[^1] 
`netlib-java` can use optimized native linear algebra libraries (refered to as 
"native libraries" or "BLAS libraries" hereafter) for faster numerical 
processing. [Intel 
MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html)
 and [OpenBLAS](http://www.openblas.net) are two most popular ones.
+
+However due to license restrictions, the official released Spark binaries by 
default doesn't contain native libraries support for `netlib-java`.
+
+The following sections describe how to enable `netlib-java` with native 
libraries support for Spark MLlib and how to install native libraries and 
configure them properly.
+
+[^1]: The algorithms may call Breeze and it will in turn call `netlib-java`.
+
+## Enable `netlib-java` with native library proxies 
+
+`netlib-java` native libraries has a dependency on `libgfortran`. It requires 
GFORTRAN 1.4 or above. This can be obtained by installing `libgfortran` 
package. After installation, the following command can be used to verify if it 
is installed properly.
+```
+strings /path/to/libgfortran.so.3.0.0 | grep GFORTRAN_1.4
+```
+
+To build Spark with `netlib-java` native library proxies, you need to add 
`-Pnetlib-lgpl` to Maven build command line. For example:
+```
+$SPARK_SOURCE_HOME/build/mvn -Pnetlib-lgpl -DskipTests -Pyarn -Phadoop-2.7 
clean package
+```
+
+If you only want to enable it in your project, include 
`com.github.fommil.netlib:all:1.1.2` as a dependency of your project.
+
+## Install Native Linear Algebra Libraries
+
+Intel MKL and OpenBLAS are two most popular native linear algebra libraries, 
you can choose one of them based on your preference. We described basic 
instructions as below. You can refer to [netlib-java 
documentation](https://github.com/fommil/netlib-java) for more advanced 
installation instructions.
+
+### Intel MKL
+
+- Download and install Intel MKL. The installation should be done on all nodes 
of the cluster. We assume the installation location is $MKLROOT.
+- Make sure `/usr/local/lib` is in system library search path and run the 
following commands:
+```
+$ ln -sf $MKLROOT/lib/intel64/libmkl_rt.so /usr/local/lib/libblas.so.3
+$ ln -sf $MKLROOT/lib/intel64/libmkl_rt.so /usr/local/lib/liblapack.so.3
+```
+
+### OpenBLAS
+
+The installation should be done on all nodes of the cluster. Generic version 
of OpenBLAS are available with most distributions. You can install it with a 
distribution Package Manager (APT or YUM).
+
+For Debian / Ubuntu:
+```
+sudo apt-get install libopenblas-base
+sudo update-alternatives --config libblas.so.3
+```
+For CentOS / RHEL:
+```
+sudo yum install openblas
+```
+
+## Check if native libraries are enabled for MLlib
+
+To verify native libraries are properly loaded, start `spark-shell` and run 
the following code
+```
+scala> import com.github.fommil.netlib.BLAS;
+scala> System.out.println(BLAS.getInstance().getClass().getName());
+```
+
+If they are correctly loaded, it should print 
`com.github.fommil.netlib.NativeSystemBLAS`. Otherwise the warnings should be 
printed:
+```
+WARN BLAS: Failed to load implementation 
from:com.github.fommil.netlib.NativeSystemBLAS
+WARN BLAS: Failed to load implementation 
from:com.github.fommil.netlib.NativeRefBLAS
+```
+
+If native libraries are not properly configured in the system, Java BLAS 
implementation(f2jBLAS) will be used as fallback option.
+
+## Spark Configuration
+
+The use of multiple-threading in either Intel MKL or OpenBLAS can conflict 
with Spark's execution model.[^2]

Review comment:
       multithreading
   It doesn't conflict, just may not be optimal, as many tasks per machine may 
try to use all the cores in MKL by default

##########
File path: docs/ml-linalg-guide.md
##########
@@ -0,0 +1,85 @@
+# Spark MLlib Linear Algebra Acceleration Guide
+
+## Introduction
+
+This guide provides necessary information to enable accelerated linear algebra 
processing for Spark MLlib.
+
+Spark MLlib defines Vector and Matrix as basic data types for machine learning 
algorithms. On top of them, 
[BLAS](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) and 
[LAPACK](https://en.wikipedia.org/wiki/LAPACK) operations are implemented and 
supported by [netlib-java](https://github.com/fommil/netlib-Java).[^1] 
`netlib-java` can use optimized native linear algebra libraries (refered to as 
"native libraries" or "BLAS libraries" hereafter) for faster numerical 
processing. [Intel 
MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html)
 and [OpenBLAS](http://www.openblas.net) are two most popular ones.
+
+However due to license restrictions, the official released Spark binaries by 
default doesn't contain native libraries support for `netlib-java`.
+
+The following sections describe how to enable `netlib-java` with native 
libraries support for Spark MLlib and how to install native libraries and 
configure them properly.
+
+[^1]: The algorithms may call Breeze and it will in turn call `netlib-java`.
+
+## Enable `netlib-java` with native library proxies 
+
+`netlib-java` native libraries has a dependency on `libgfortran`. It requires 
GFORTRAN 1.4 or above. This can be obtained by installing `libgfortran` 
package. After installation, the following command can be used to verify if it 
is installed properly.
+```
+strings /path/to/libgfortran.so.3.0.0 | grep GFORTRAN_1.4
+```
+
+To build Spark with `netlib-java` native library proxies, you need to add 
`-Pnetlib-lgpl` to Maven build command line. For example:
+```
+$SPARK_SOURCE_HOME/build/mvn -Pnetlib-lgpl -DskipTests -Pyarn -Phadoop-2.7 
clean package
+```
+
+If you only want to enable it in your project, include 
`com.github.fommil.netlib:all:1.1.2` as a dependency of your project.
+
+## Install Native Linear Algebra Libraries
+
+Intel MKL and OpenBLAS are two most popular native linear algebra libraries, 
you can choose one of them based on your preference. We described basic 
instructions as below. You can refer to [netlib-java 
documentation](https://github.com/fommil/netlib-java) for more advanced 
installation instructions.
+
+### Intel MKL
+
+- Download and install Intel MKL. The installation should be done on all nodes 
of the cluster. We assume the installation location is $MKLROOT.
+- Make sure `/usr/local/lib` is in system library search path and run the 
following commands:
+```
+$ ln -sf $MKLROOT/lib/intel64/libmkl_rt.so /usr/local/lib/libblas.so.3
+$ ln -sf $MKLROOT/lib/intel64/libmkl_rt.so /usr/local/lib/liblapack.so.3
+```
+
+### OpenBLAS
+
+The installation should be done on all nodes of the cluster. Generic version 
of OpenBLAS are available with most distributions. You can install it with a 
distribution Package Manager (APT or YUM).
+
+For Debian / Ubuntu:
+```
+sudo apt-get install libopenblas-base
+sudo update-alternatives --config libblas.so.3
+```
+For CentOS / RHEL:
+```
+sudo yum install openblas
+```
+
+## Check if native libraries are enabled for MLlib
+
+To verify native libraries are properly loaded, start `spark-shell` and run 
the following code
+```
+scala> import com.github.fommil.netlib.BLAS;
+scala> System.out.println(BLAS.getInstance().getClass().getName());
+```
+
+If they are correctly loaded, it should print 
`com.github.fommil.netlib.NativeSystemBLAS`. Otherwise the warnings should be 
printed:
+```
+WARN BLAS: Failed to load implementation 
from:com.github.fommil.netlib.NativeSystemBLAS
+WARN BLAS: Failed to load implementation 
from:com.github.fommil.netlib.NativeRefBLAS
+```
+
+If native libraries are not properly configured in the system, Java BLAS 
implementation(f2jBLAS) will be used as fallback option.

Review comment:
       space after implementation




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to