Dennis Aumiller created SPARK-24674:
------------------------------------
Summary: Spark on Kubernetes BLAS performance
Key: SPARK-24674
URL: https://issues.apache.org/jira/browse/SPARK-24674
Project: Spark
Issue Type: Question
Components: Build, Kubernetes, MLlib
Affects Versions: 2.3.1
Environment: Spark 2.3.1 SNAPSHOT (as of June 25th); Kubernetes 1.7.5; cluster of 4 nodes, each with 16 GB RAM and an 8-core Intel processor
Reporter: Dennis Aumiller

Native BLAS libraries usually speed up the execution time of CPU-heavy operations, for example in MLlib, quite significantly. The initial warning

{code:java}
WARN BLAS:61 - Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
{code}

is not easily resolved: as reported [here|https://github.com/apache/spark/pull/19717/files/7d2b30373b2e4d8d5311e10c3f9a62a2d900d568], the cause seems to be the base image used by the Spark Dockerfile. Rebuilding Spark with

{code:java}
-Pnetlib-lgpl
{code}

does not solve the problem either, but I managed to build BLAS and LAPACK into the Alpine image, with a lot of tricks involved.

Interestingly, I then noticed that the performance of PCA in my case dropped quite significantly with native BLAS support, compared to the netlib-java fallback. I am aware of [#SPARK-21305] as well, but that did not help my case either. Furthermore, calling SVD on a matrix of only size 5000x5000 (density 1%) already throws an error when trying to use native ARPACK, but runs perfectly fine with the fallback version.

The question would be whether there has already been some investigation in that direction.
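To illustrate the kind of Docker layer I mean, a rough sketch is below. This is not the exact Dockerfile I used, only an outline of the approach: the {{spark-base}} tag, the package names, and the symlink paths are assumptions and will likely need adjusting (Alpine's musl libc is exactly why the prebuilt netlib-java natives fail to load, and ARPACK in particular may have to be compiled from source rather than installed via apk).

{code}
# Sketch only: extends the Dockerfile shipped with the Spark Kubernetes build.
# "spark-base", package names, and library paths are assumptions.
FROM spark-base

RUN apk add --no-cache openblas lapack && \
    # netlib-java resolves libblas.so.3 / liblapack.so.3 at runtime,
    # so point those names at the installed native libraries
    ln -sf /usr/lib/libopenblas.so /usr/lib/libblas.so.3 && \
    ln -sf /usr/lib/liblapack.so  /usr/lib/liblapack.so.3
{code}

Something along these lines could also serve as the starting point for the Dockerfile proposed below.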
Or, if not, whether it would be interesting for the Spark community if I provided:
* a more detailed report with respect to timings/configurations/test setup
* a Dockerfile that builds Spark with BLAS/LAPACK/ARPACK, using the shipped Dockerfile as a basis

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)