Dennis Aumiller created SPARK-24674:
---------------------------------------

             Summary: Spark on Kubernetes BLAS performance
                 Key: SPARK-24674
                 URL: https://issues.apache.org/jira/browse/SPARK-24674
             Project: Spark
          Issue Type: Question
          Components: Build, Kubernetes, MLlib
    Affects Versions: 2.3.1
         Environment: Spark 2.3.1 SNAPSHOT (as of June 25th)
Kubernetes version 1.7.5
Kubernetes cluster, consisting of 4 Nodes with 16 GB RAM, 8 core Intel 
processors.
            Reporter: Dennis Aumiller


 

Native BLAS libraries usually speed up CPU-heavy operations, such as those in 
MLlib, quite significantly.
 Unfortunately, the initial warning
{code:java}
WARN  BLAS:61 - Failed to load implementation from: 
com.github.fommil.netlib.NativeSystemBLAS
{code}
cannot be resolved easily: as reported 
[here|https://github.com/apache/spark/pull/19717/files/7d2b30373b2e4d8d5311e10c3f9a62a2d900d568],
 the cause seems to be the underlying base image used by the Spark 
Dockerfile.
 Rebuilding Spark with
{code:java}
-Pnetlib-lgpl
{code}
does not solve the problem either. I did manage to build BLAS and LAPACK into 
the Alpine image, although it took a lot of tricks.
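
For anyone reproducing this, one way to confirm which backend netlib-java actually resolved is to ask the BLAS facade for its instance and print the concrete class name. The following is a minimal sketch, not part of the original report: the class `BlasProbe` and its method `loadedImplementation` are hypothetical names, and reflection is used only so that the file compiles without netlib-java on the classpath. It assumes the facade class `com.github.fommil.netlib.BLAS` (the package named in the warning above) and its static `getInstance()` method.

```java
import java.lang.reflect.Method;

/** Reports which netlib-java BLAS backend the current JVM resolves, if any. */
class BlasProbe {

    // Facade class shipped with netlib-java (same package as in the warning).
    static final String BLAS_FACADE = "com.github.fommil.netlib.BLAS";

    /**
     * Looks up the facade class reflectively, calls its static getInstance(),
     * and returns the name of the concrete implementation class.
     * Returns a string starting with "unavailable: " if the lookup fails.
     */
    static String loadedImplementation(String facadeClass) {
        try {
            Class<?> facade = Class.forName(facadeClass);
            Method getInstance = facade.getMethod("getInstance");
            Object impl = getInstance.invoke(null); // static method, no receiver
            return impl.getClass().getName();
        } catch (ClassNotFoundException e) {
            return "unavailable: " + facadeClass + " is not on the classpath";
        } catch (ReflectiveOperationException | LinkageError e) {
            return "unavailable: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println("BLAS implementation: "
                + loadedImplementation(BLAS_FACADE));
    }
}
```

Run with the same classpath as the Spark driver or executor, this should print `com.github.fommil.netlib.F2jBLAS` when the pure-Java fallback is active and `com.github.fommil.netlib.NativeSystemBLAS` when the system library was loaded.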

Interestingly, the performance of PCA in my case dropped quite significantly 
with native BLAS support, compared to the netlib-java fallback. I am aware of 
[#SPARK-21305] as well, but that did not help in my case either.
 Furthermore, calling SVD on a matrix of only 5000x5000 (density 1%) already 
throws an error when using native ARPACK, but runs perfectly fine with the 
fallback version.
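
To put "only 5000x5000" into numbers, the following sketch estimates the storage footprint of such a matrix. It is plain arithmetic under stated assumptions (8-byte doubles, 4-byte indices, compressed sparse column layout) and says nothing about what actually triggers the ARPACK error. The class `MatrixFootprint` is a hypothetical name introduced for illustration.

```java
/** Back-of-the-envelope storage estimate for the SVD input described above. */
class MatrixFootprint {

    /** Expected number of non-zeros for a rows x cols matrix at the given density. */
    static long expectedNonZeros(long rows, long cols, double density) {
        return Math.round(rows * cols * density);
    }

    /** Bytes needed to store every entry as an 8-byte double. */
    static long denseBytes(long rows, long cols) {
        return rows * cols * Double.BYTES;
    }

    /** Bytes for CSC storage: values, row indices, and column pointers. */
    static long cscBytes(long cols, long nonZeros) {
        return nonZeros * (Double.BYTES + Integer.BYTES)
                + (cols + 1) * Integer.BYTES;
    }

    public static void main(String[] args) {
        long n = 5000;
        long nnz = expectedNonZeros(n, n, 0.01);
        System.out.println("non-zeros:   " + nnz);              // 250000
        System.out.println("dense bytes: " + denseBytes(n, n)); // 200000000
        System.out.println("CSC bytes:   " + cscBytes(n, nnz)); // 3020004
    }
}
```

Even stored densely, the matrix takes about 200 MB, which is small compared to the 16 GB of RAM per node listed in the environment.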

My question is whether this has already been investigated.
 If not, would the Spark community be interested in either of the following?
 * a more detailed report on timings, configurations, and test setup
 * a Dockerfile that builds Spark with BLAS/LAPACK/ARPACK, using the 
shipped Dockerfile as a basis
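
A timing report of the kind mentioned in the first bullet could be built on a small harness like the one below. This is a sketch under stated assumptions, not the setup used for the observations above: the class `TimingHarness`, its methods, and the placeholder workload are all hypothetical, and the actual PCA or SVD call would have to be substituted for the workload. It reports the median wall-clock time after a few untimed warm-up runs, so that JIT compilation and one-off library loading do not dominate the result.

```java
import java.util.Arrays;

/** Tiny wall-clock timing helper for comparing two BLAS configurations. */
class TimingHarness {

    /** Median of the samples; the input array is left unmodified. */
    static long median(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("need at least one sample");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    /** Runs the task untimed for warm-up, then timed; returns median nanoseconds. */
    static long medianNanos(Runnable task, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        return median(samples);
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with the PCA or SVD call under test.
        Runnable workload = () -> {
            double sum = 0;
            for (int i = 1; i <= 1_000_000; i++) {
                sum += Math.sqrt(i);
            }
            if (sum < 0) {
                throw new IllegalStateException("unreachable");
            }
        };
        System.out.println("median ns: " + medianNanos(workload, 3, 10));
    }
}
```

Running the same workload once with the native libraries and once with the fallback, on the same node and with the same Spark configuration, would give directly comparable numbers.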