[GitHub] [spark] srowen commented on a change in pull request #30810: [WIP][ML] Add a vectorized BLAS implementation

GitBox Tue, 22 Dec 2020 06:44:27 -0800


srowen commented on a change in pull request #30810:
URL: https://github.com/apache/spark/pull/30810#discussion_r547313267




##########
File path: mllib-local/pom.xml
##########
@@ -68,6 +68,13 @@
       <scope>test</scope>
     </dependency>
 
+    <dependency>
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-core_${scala.binary.version}</artifactId>
+      <version>${project.version}</version>
+      <type>test-jar</type>

Review comment:
       This should be fine if it's test only.

##########
File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
##########
@@ -18,28 +18,51 @@
 package org.apache.spark.ml.linalg
 
 import com.github.fommil.netlib.{BLAS => NetlibBLAS, F2jBLAS}
-import com.github.fommil.netlib.BLAS.{getInstance => NativeBLAS}
 
 /**
  * BLAS routines for MLlib's vectors and matrices.
  */
 private[spark] object BLAS extends Serializable {
 
-  @transient private var _f2jBLAS: NetlibBLAS = _
+  @transient private var _javaBLAS: NetlibBLAS = _
   @transient private var _nativeBLAS: NetlibBLAS = _
   private val nativeL1Threshold: Int = 256
 
-  // For level-1 function dspmv, use f2jBLAS for better performance.
-  private[ml] def f2jBLAS: NetlibBLAS = {
-    if (_f2jBLAS == null) {
-      _f2jBLAS = new F2jBLAS
+  // For level-1 function dspmv, use javaBLAS for better performance.
+  private[ml] def javaBLAS: NetlibBLAS = {
+    if (_javaBLAS == null) {
+      _javaBLAS =
+        try {
+          // scalastyle:off classforname
+          Class.forName("org.apache.spark.ml.linalg.VectorizedBLAS", true,
+                          Option(Thread.currentThread().getContextClassLoader)

Review comment:
       Nit: the indentation through line 41 seems too far in / uneven

##########
File path: .github/workflows/build_and_test.yml
##########
@@ -394,6 +394,31 @@ jobs:
         ./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pmesos -Pkubernetes 
-Phive -Phive-thriftserver -Phadoop-cloud -Djava.version=11 install
         rm -rf ~/.m2/repository/org/apache/spark
 
+  java-17:

Review comment:
       @dongjoon-hyun does this look correct?
   I think we're not quite on to even testing vs Java 14 yet, though informally 
it seems to work fine already.
   We may not need to set this up yet, as I imagine anyone wanting to enable 
this is building from scratch anyway (we won't publish releases requiring 17 
for a long time!)

##########
File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
##########
@@ -267,7 +282,7 @@ private[spark] object BLAS extends Serializable {
       x: DenseVector,
       beta: Double,
       y: DenseVector): Unit = {
-    f2jBLAS.dspmv("U", n, alpha, A.values, x.values, 1, beta, y.values, 1)
+    javaBLAS.dspmv("U", n, alpha, A.values, x.values, 1, beta, y.values, 1)

Review comment:
       It looks like the intent here was to never try a vectorized 
implementation. That might or might not be wrong, but same point - I think we 
want to keep the separate native vs non-native options above, keep them separate

##########
File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
##########
@@ -18,28 +18,51 @@
 package org.apache.spark.ml.linalg
 
 import com.github.fommil.netlib.{BLAS => NetlibBLAS, F2jBLAS}
-import com.github.fommil.netlib.BLAS.{getInstance => NativeBLAS}
 
 /**
  * BLAS routines for MLlib's vectors and matrices.
  */
 private[spark] object BLAS extends Serializable {
 
-  @transient private var _f2jBLAS: NetlibBLAS = _
+  @transient private var _javaBLAS: NetlibBLAS = _
   @transient private var _nativeBLAS: NetlibBLAS = _
   private val nativeL1Threshold: Int = 256
 
-  // For level-1 function dspmv, use f2jBLAS for better performance.
-  private[ml] def f2jBLAS: NetlibBLAS = {
-    if (_f2jBLAS == null) {
-      _f2jBLAS = new F2jBLAS
+  // For level-1 function dspmv, use javaBLAS for better performance.
+  private[ml] def javaBLAS: NetlibBLAS = {
+    if (_javaBLAS == null) {
+      _javaBLAS =
+        try {
+          // scalastyle:off classforname
+          Class.forName("org.apache.spark.ml.linalg.VectorizedBLAS", true,
+                          Option(Thread.currentThread().getContextClassLoader)
+                            .getOrElse(getClass.getClassLoader))
+               .newInstance()
+               .asInstanceOf[NetlibBLAS]
+          // scalastyle:on classforname
+        } catch {
+          case _: Throwable => new F2jBLAS
+        }
+    }
+    _javaBLAS
+  }
+
+  // For level-3 routines, we use the native BLAS.
+  private[ml] def nativeBLAS: NetlibBLAS = {
+    if (_nativeBLAS == null) {
+      _nativeBLAS =
+        if (NetlibBLAS.getInstance.isInstanceOf[F2jBLAS]) {
+          javaBLAS
+        } else {
+          NetlibBLAS.getInstance
+        }
     }
-    _f2jBLAS
+    _nativeBLAS
   }
 
   private[ml] def getBLAS(vectorSize: Int): NetlibBLAS = {
     if (vectorSize < nativeL1Threshold) {
-      f2jBLAS
+      javaBLAS

Review comment:
       I think we might need to readjust the logic around here. This is meant 
to avoid calling into native code for very small operations, where it probably 
isn't a win. I think f2jBLAS should stay, and the nativeBLAS implementation 
should vary based on availability of the JVM or Netlib implementation?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] srowen commented on a change in pull request #30810: [WIP][ML] Add a vectorized BLAS implementation

Reply via email to