srowen commented on a change in pull request #29139:
URL: https://github.com/apache/spark/pull/29139#discussion_r457515567



##########
File path: docs/ml-linalg-guide.md
##########
@@ -0,0 +1,107 @@
+---
+layout: global
+title: MLlib Linear Algebra Acceleration Guide
+displayTitle: MLlib Linear Algebra Acceleration Guide
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+     http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+# MLlib Linear Algebra Acceleration Guide
+
+## Introduction
+
+This guide provides necessary information to enable accelerated linear algebra 
processing for Spark MLlib.
+
+Spark MLlib defines Vector and Matrix as basic data types for machine learning 
algorithms. On top of them, 
[BLAS](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) and 
[LAPACK](https://en.wikipedia.org/wiki/LAPACK) operations are implemented and 
supported by [netlib-java](https://github.com/fommil/netlib-Java) [^1]. 
`netlib-java` can use optimized native linear algebra libraries (refered to as 
"native libraries" or "BLAS libraries" hereafter) for faster numerical 
processing. [Intel 
MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html)
 and [OpenBLAS](http://www.openblas.net) are two most popular ones.
+
+However due to license difference, the official released Spark binaries by 
default doesn't contain native libraries support for `netlib-java`.
+
+The following sections describe how to enable `netlib-java` with native 
libraries support for Spark MLlib and how to install native libraries and 
configure them properly.
+
+[^1]: The algorithms may call Breeze and it will in turn call `netlib-java`.
+
+## Enable `netlib-java` with native library proxies 
+
+`netlib-java` depends on `libgfortran`. It requires GFORTRAN 1.4 or above. 
This can be obtained by installing `libgfortran` package. After installation, 
the following command can be used to verify if it is installed properly.
+```
+strings /path/to/libgfortran.so.3.0.0 | grep GFORTRAN_1.4
+```
+
+To build Spark with `netlib-java` native library proxies, you need to add 
`-Pnetlib-lgpl` to Maven build command line. For example:
+```
+$SPARK_SOURCE_HOME/build/mvn -Pnetlib-lgpl -DskipTests -Pyarn -Phadoop-2.7 
clean package
+```
+
+If you only want to enable it in your project, include 
`com.github.fommil.netlib:all:1.1.2` as a dependency of your project.
+
+## Install native linear algebra libraries
+
+Intel MKL and OpenBLAS are two most popular native linear algebra libraries. 
You can choose one of them based on your preference. We provide basic 
instructions as below. You can refer to [netlib-java 
documentation](https://github.com/fommil/netlib-java) for more advanced 
installation instructions.
+
+### Intel MKL
+
+- Download and install Intel MKL. The installation should be done on all nodes 
of the cluster. We assume the installation location is $MKLROOT (Eg. 
/opt/intel/mkl).
+- Create soft links to `libmkl_rt.so` with specific names in system library 
search paths. For instance, make sure `/usr/local/lib` is in system library 
search paths and run the following commands:
+```
+$ ln -sf $MKLROOT/lib/intel64/libmkl_rt.so /usr/local/lib/libblas.so.3
+$ ln -sf $MKLROOT/lib/intel64/libmkl_rt.so /usr/local/lib/liblapack.so.3
+```
+
+### OpenBLAS
+
+The installation should be done on all nodes of the cluster. Generic version 
of OpenBLAS are available with most distributions. You can install it with a 
distribution Package Manager (APT or YUM).
+
+For Debian / Ubuntu:
+```
+sudo apt-get install libopenblas-base
+sudo update-alternatives --config libblas.so.3
+```
+For CentOS / RHEL:
+```
+sudo yum install openblas
+```
+
+## Check if native libraries are enabled for MLlib
+
+To verify native libraries are properly loaded, start `spark-shell` and run 
the following code
+```
+scala> import com.github.fommil.netlib.BLAS;
+scala> System.out.println(BLAS.getInstance().getClass().getName());
+```
+
+If they are correctly loaded, it should print 
`com.github.fommil.netlib.NativeSystemBLAS`. Otherwise the warnings should be 
printed:
+```
+WARN BLAS: Failed to load implementation 
from:com.github.fommil.netlib.NativeSystemBLAS
+WARN BLAS: Failed to load implementation 
from:com.github.fommil.netlib.NativeRefBLAS
+```
+
+If native libraries are not properly configured in the system, Java BLAS 
implementation (f2jBLAS) will be used as fallback option.

Review comment:
       Missing article: "the Java implementation"
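
       It might also help readers to see how to interpret the class name that the check prints. A minimal sketch, assuming the three netlib-java implementation class names (`NativeSystemBLAS`, `NativeRefBLAS`, and `F2jBLAS` for the Java fallback); the helper name `blas_impl_kind` and the labels it prints are hypothetical, not part of Spark or netlib-java:

```shell
# Hypothetical helper: map the class name printed by the spark-shell check
# to a short label describing which BLAS implementation was loaded.
blas_impl_kind() {
  case "$1" in
    com.github.fommil.netlib.NativeSystemBLAS) echo "native-system" ;;    # system BLAS (e.g. MKL, OpenBLAS)
    com.github.fommil.netlib.NativeRefBLAS)    echo "native-reference" ;; # bundled reference build
    com.github.fommil.netlib.F2jBLAS)          echo "java-fallback" ;;    # pure Java implementation
    *)                                         echo "unknown" ;;
  esac
}

# Usage (paste the name printed by spark-shell):
#   blas_impl_kind com.github.fommil.netlib.NativeSystemBLAS
```

       Only the first label means that the system library installed in the previous sections is actually in use.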

##########
File path: docs/ml-linalg-guide.md
##########
@@ -0,0 +1,107 @@
+---
+layout: global
+title: MLlib Linear Algebra Acceleration Guide
+displayTitle: MLlib Linear Algebra Acceleration Guide
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+     http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+# MLlib Linear Algebra Acceleration Guide
+
+## Introduction
+
+This guide provides necessary information to enable accelerated linear algebra 
processing for Spark MLlib.
+
+Spark MLlib defines Vector and Matrix as basic data types for machine learning 
algorithms. On top of them, 
[BLAS](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) and 
[LAPACK](https://en.wikipedia.org/wiki/LAPACK) operations are implemented and 
supported by [netlib-java](https://github.com/fommil/netlib-Java) [^1]. 
`netlib-java` can use optimized native linear algebra libraries (refered to as 
"native libraries" or "BLAS libraries" hereafter) for faster numerical 
processing. [Intel 
MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html)
 and [OpenBLAS](http://www.openblas.net) are two most popular ones.
+
+However due to license difference, the official released Spark binaries by 
default doesn't contain native libraries support for `netlib-java`.
+
+The following sections describe how to enable `netlib-java` with native 
libraries support for Spark MLlib and how to install native libraries and 
configure them properly.
+
+[^1]: The algorithms may call Breeze and it will in turn call `netlib-java`.

Review comment:
       Is this a footnote? I think you can just inline this comment?

##########
File path: docs/ml-linalg-guide.md
##########
@@ -0,0 +1,107 @@
+---
+layout: global
+title: MLlib Linear Algebra Acceleration Guide
+displayTitle: MLlib Linear Algebra Acceleration Guide
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+     http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+# MLlib Linear Algebra Acceleration Guide
+
+## Introduction
+
+This guide provides necessary information to enable accelerated linear algebra 
processing for Spark MLlib.
+
+Spark MLlib defines Vector and Matrix as basic data types for machine learning 
algorithms. On top of them, 
[BLAS](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) and 
[LAPACK](https://en.wikipedia.org/wiki/LAPACK) operations are implemented and 
supported by [netlib-java](https://github.com/fommil/netlib-Java) [^1]. 
`netlib-java` can use optimized native linear algebra libraries (refered to as 
"native libraries" or "BLAS libraries" hereafter) for faster numerical 
processing. [Intel 
MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html)
 and [OpenBLAS](http://www.openblas.net) are two most popular ones.
+
+However due to license difference, the official released Spark binaries by 
default doesn't contain native libraries support for `netlib-java`.
+
+The following sections describe how to enable `netlib-java` with native 
libraries support for Spark MLlib and how to install native libraries and 
configure them properly.
+
+[^1]: The algorithms may call Breeze and it will in turn call `netlib-java`.
+
+## Enable `netlib-java` with native library proxies 
+
+`netlib-java` depends on `libgfortran`. It requires GFORTRAN 1.4 or above. 
This can be obtained by installing `libgfortran` package. After installation, 
the following command can be used to verify if it is installed properly.
+```
+strings /path/to/libgfortran.so.3.0.0 | grep GFORTRAN_1.4
+```
+
+To build Spark with `netlib-java` native library proxies, you need to add 
`-Pnetlib-lgpl` to Maven build command line. For example:
+```
+$SPARK_SOURCE_HOME/build/mvn -Pnetlib-lgpl -DskipTests -Pyarn -Phadoop-2.7 
clean package
+```
+
+If you only want to enable it in your project, include 
`com.github.fommil.netlib:all:1.1.2` as a dependency of your project.
+
+## Install native linear algebra libraries
+
+Intel MKL and OpenBLAS are two most popular native linear algebra libraries. 
You can choose one of them based on your preference. We provide basic 
instructions as below. You can refer to [netlib-java 
documentation](https://github.com/fommil/netlib-java) for more advanced 
installation instructions.
+
+### Intel MKL
+
+- Download and install Intel MKL. The installation should be done on all nodes 
of the cluster. We assume the installation location is $MKLROOT (Eg. 
/opt/intel/mkl).

Review comment:
       (e.g. `/opt/intel/mkl`)
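
       Since the symlink step in the guide depends on this path, a small guard against a wrong `$MKLROOT` could be worth showing. A rough sketch only; the function name `link_blas_aliases` is hypothetical, and the library directory is whatever is on the system library search path (the guide uses `/usr/local/lib`):

```shell
# Hypothetical helper: point the generic BLAS/LAPACK sonames at one library.
#   $1: path to the native library, e.g. "$MKLROOT/lib/intel64/libmkl_rt.so"
#   $2: a directory on the system library search path, e.g. /usr/local/lib
link_blas_aliases() {
  target=$1
  libdir=$2
  # Refuse to create dangling links if the library path is wrong.
  [ -e "$target" ] || { echo "missing library: $target" >&2; return 1; }
  ln -sf "$target" "$libdir/libblas.so.3" &&
  ln -sf "$target" "$libdir/liblapack.so.3"
}

# Usage (typically needs root for a system directory):
#   link_blas_aliases "$MKLROOT/lib/intel64/libmkl_rt.so" /usr/local/lib
```

       `ln -sf` on its own happily creates a dangling link, so the existence check is the only part that differs from the two commands in the guide.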

##########
File path: docs/ml-linalg-guide.md
##########
@@ -0,0 +1,107 @@
+---
+layout: global
+title: MLlib Linear Algebra Acceleration Guide
+displayTitle: MLlib Linear Algebra Acceleration Guide
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+     http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+# MLlib Linear Algebra Acceleration Guide
+
+## Introduction
+
+This guide provides necessary information to enable accelerated linear algebra 
processing for Spark MLlib.
+
+Spark MLlib defines Vector and Matrix as basic data types for machine learning 
algorithms. On top of them, 
[BLAS](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) and 
[LAPACK](https://en.wikipedia.org/wiki/LAPACK) operations are implemented and 
supported by [netlib-java](https://github.com/fommil/netlib-Java) [^1]. 
`netlib-java` can use optimized native linear algebra libraries (refered to as 
"native libraries" or "BLAS libraries" hereafter) for faster numerical 
processing. [Intel 
MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html)
 and [OpenBLAS](http://www.openblas.net) are two most popular ones.
+
+However due to license difference, the official released Spark binaries by 
default doesn't contain native libraries support for `netlib-java`.
+
+The following sections describe how to enable `netlib-java` with native 
libraries support for Spark MLlib and how to install native libraries and 
configure them properly.
+
+[^1]: The algorithms may call Breeze and it will in turn call `netlib-java`.
+
+## Enable `netlib-java` with native library proxies 
+
+`netlib-java` depends on `libgfortran`. It requires GFORTRAN 1.4 or above. 
This can be obtained by installing `libgfortran` package. After installation, 
the following command can be used to verify if it is installed properly.
+```
+strings /path/to/libgfortran.so.3.0.0 | grep GFORTRAN_1.4
+```
+
+To build Spark with `netlib-java` native library proxies, you need to add 
`-Pnetlib-lgpl` to Maven build command line. For example:
+```
+$SPARK_SOURCE_HOME/build/mvn -Pnetlib-lgpl -DskipTests -Pyarn -Phadoop-2.7 
clean package
+```
+
+If you only want to enable it in your project, include 
`com.github.fommil.netlib:all:1.1.2` as a dependency of your project.
+
+## Install native linear algebra libraries
+
+Intel MKL and OpenBLAS are two most popular native linear algebra libraries. 
You can choose one of them based on your preference. We provide basic 
instructions as below. You can refer to [netlib-java 
documentation](https://github.com/fommil/netlib-java) for more advanced 
installation instructions.

Review comment:
       or, 'the two most popular'

##########
File path: docs/ml-linalg-guide.md
##########
@@ -0,0 +1,107 @@
+---
+layout: global
+title: MLlib Linear Algebra Acceleration Guide
+displayTitle: MLlib Linear Algebra Acceleration Guide
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+     http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+# MLlib Linear Algebra Acceleration Guide
+
+## Introduction
+
+This guide provides necessary information to enable accelerated linear algebra 
processing for Spark MLlib.
+
+Spark MLlib defines Vector and Matrix as basic data types for machine learning 
algorithms. On top of them, 
[BLAS](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) and 
[LAPACK](https://en.wikipedia.org/wiki/LAPACK) operations are implemented and 
supported by [netlib-java](https://github.com/fommil/netlib-Java) [^1]. 
`netlib-java` can use optimized native linear algebra libraries (refered to as 
"native libraries" or "BLAS libraries" hereafter) for faster numerical 
processing. [Intel 
MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html)
 and [OpenBLAS](http://www.openblas.net) are two most popular ones.
+
+However due to license difference, the official released Spark binaries by 
default doesn't contain native libraries support for `netlib-java`.

Review comment:
       "license difference" -> "license differences";
   "by default doesn't contain" -> "by default don't contain"

##########
File path: docs/ml-linalg-guide.md
##########
@@ -0,0 +1,107 @@
+---
+layout: global
+title: MLlib Linear Algebra Acceleration Guide
+displayTitle: MLlib Linear Algebra Acceleration Guide
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+     http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+# MLlib Linear Algebra Acceleration Guide
+
+## Introduction
+
+This guide provides necessary information to enable accelerated linear algebra 
processing for Spark MLlib.
+
+Spark MLlib defines Vector and Matrix as basic data types for machine learning 
algorithms. On top of them, 
[BLAS](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) and 
[LAPACK](https://en.wikipedia.org/wiki/LAPACK) operations are implemented and 
supported by [netlib-java](https://github.com/fommil/netlib-Java) [^1]. 
`netlib-java` can use optimized native linear algebra libraries (refered to as 
"native libraries" or "BLAS libraries" hereafter) for faster numerical 
processing. [Intel 
MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html)
 and [OpenBLAS](http://www.openblas.net) are two most popular ones.
+
+However due to license difference, the official released Spark binaries by 
default doesn't contain native libraries support for `netlib-java`.
+
+The following sections describe how to enable `netlib-java` with native 
libraries support for Spark MLlib and how to install native libraries and 
configure them properly.
+
+[^1]: The algorithms may call Breeze and it will in turn call `netlib-java`.
+
+## Enable `netlib-java` with native library proxies 
+
+`netlib-java` depends on `libgfortran`. It requires GFORTRAN 1.4 or above. 
This can be obtained by installing `libgfortran` package. After installation, 
the following command can be used to verify if it is installed properly.
+```
+strings /path/to/libgfortran.so.3.0.0 | grep GFORTRAN_1.4
+```
+
+To build Spark with `netlib-java` native library proxies, you need to add 
`-Pnetlib-lgpl` to Maven build command line. For example:
+```
+$SPARK_SOURCE_HOME/build/mvn -Pnetlib-lgpl -DskipTests -Pyarn -Phadoop-2.7 
clean package
+```
+
+If you only want to enable it in your project, include 
`com.github.fommil.netlib:all:1.1.2` as a dependency of your project.
+
+## Install native linear algebra libraries
+
+Intel MKL and OpenBLAS are two most popular native linear algebra libraries. 
You can choose one of them based on your preference. We provide basic 
instructions as below. You can refer to [netlib-java 
documentation](https://github.com/fommil/netlib-java) for more advanced 
installation instructions.

Review comment:
       two most popular -> two popular

##########
File path: docs/ml-linalg-guide.md
##########
@@ -0,0 +1,107 @@
+---
+layout: global
+title: MLlib Linear Algebra Acceleration Guide
+displayTitle: MLlib Linear Algebra Acceleration Guide
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+     http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+# MLlib Linear Algebra Acceleration Guide
+
+## Introduction
+
+This guide provides necessary information to enable accelerated linear algebra 
processing for Spark MLlib.
+
+Spark MLlib defines Vector and Matrix as basic data types for machine learning 
algorithms. On top of them, 
[BLAS](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) and 
[LAPACK](https://en.wikipedia.org/wiki/LAPACK) operations are implemented and 
supported by [netlib-java](https://github.com/fommil/netlib-Java) [^1]. 
`netlib-java` can use optimized native linear algebra libraries (refered to as 
"native libraries" or "BLAS libraries" hereafter) for faster numerical 
processing. [Intel 
MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html)
 and [OpenBLAS](http://www.openblas.net) are two most popular ones.
+
+However due to license difference, the official released Spark binaries by 
default doesn't contain native libraries support for `netlib-java`.
+
+The following sections describe how to enable `netlib-java` with native 
libraries support for Spark MLlib and how to install native libraries and 
configure them properly.
+
+[^1]: The algorithms may call Breeze and it will in turn call `netlib-java`.
+
+## Enable `netlib-java` with native library proxies 
+
+`netlib-java` depends on `libgfortran`. It requires GFORTRAN 1.4 or above. 
This can be obtained by installing `libgfortran` package. After installation, 
the following command can be used to verify if it is installed properly.
+```
+strings /path/to/libgfortran.so.3.0.0 | grep GFORTRAN_1.4
+```
+
+To build Spark with `netlib-java` native library proxies, you need to add 
`-Pnetlib-lgpl` to Maven build command line. For example:
+```
+$SPARK_SOURCE_HOME/build/mvn -Pnetlib-lgpl -DskipTests -Pyarn -Phadoop-2.7 
clean package
+```
+
+If you only want to enable it in your project, include 
`com.github.fommil.netlib:all:1.1.2` as a dependency of your project.
+
+## Install native linear algebra libraries
+
+Intel MKL and OpenBLAS are two most popular native linear algebra libraries. 
You can choose one of them based on your preference. We provide basic 
instructions as below. You can refer to [netlib-java 
documentation](https://github.com/fommil/netlib-java) for more advanced 
installation instructions.
+
+### Intel MKL
+
+- Download and install Intel MKL. The installation should be done on all nodes 
of the cluster. We assume the installation location is $MKLROOT (Eg. 
/opt/intel/mkl).
+- Create soft links to `libmkl_rt.so` with specific names in system library 
search paths. For instance, make sure `/usr/local/lib` is in system library 
search paths and run the following commands:
+```
+$ ln -sf $MKLROOT/lib/intel64/libmkl_rt.so /usr/local/lib/libblas.so.3
+$ ln -sf $MKLROOT/lib/intel64/libmkl_rt.so /usr/local/lib/liblapack.so.3
+```
+
+### OpenBLAS
+
+The installation should be done on all nodes of the cluster. Generic version 
of OpenBLAS are available with most distributions. You can install it with a 
distribution Package Manager (APT or YUM).

Review comment:
       The package managers are just `apt` and `yum`
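
       For what it's worth, the two install commands differ only in the package name, which could be summarized like this. A sketch under the assumption that the package names are the ones already used in this guide (`libopenblas-base` for `apt`, `openblas` for `yum`); the function name `openblas_package_for` is hypothetical:

```shell
# Hypothetical helper: print the OpenBLAS package name used in this guide
# for a given package manager. Fails for managers the guide does not cover.
openblas_package_for() {
  case "$1" in
    apt|apt-get) echo "libopenblas-base" ;;
    yum)         echo "openblas" ;;
    *)           echo "unsupported package manager: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   sudo apt-get install "$(openblas_package_for apt)"
#   sudo yum install "$(openblas_package_for yum)"
```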




