luhenry commented on a change in pull request #32253:
URL: https://github.com/apache/spark/pull/32253#discussion_r619297422
##########
File path: mllib-local/pom.xml
##########
@@ -75,48 +75,12 @@
<type>test-jar</type>
<scope>test</scope>
</dependency>
+
+ <dependency>
+ <groupId>dev.ludovic.netlib</groupId>
+ <artifactId>blas</artifactId>
+ </dependency>
</dependencies>
- <profiles>
- <profile>
- <id>netlib-lgpl</id>
Review comment:
> The typical policy is that it's OK to release software that can merely
make use of such libraries at runtime (without actually distributing them
directly) as long as it doesn't substantially depend on their presence. I
believe that dynamic linking in the way you describe is OK - just like having
an SPI in JVM code that may be provided by some other GPL code at the user's
runtime.
What you describe is exactly how `dev.ludovic.netlib` works. It doesn't
depend on OpenBLAS, MKL, or any other native BLAS library being present: if
none is found, it falls back to a pure Java implementation. The fallback is
transparent and functionally equivalent. Only performance is affected.
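To illustrate the fallback behavior, here is a minimal sketch of the pattern in Java. `Blas`, `JavaBlas`, and `select` are hypothetical names chosen for illustration. They are not the `dev.ludovic.netlib` API, and only `ddot` is shown. The native backend is modeled as a factory that may fail to load.

```java
import java.util.function.Supplier;

public class BlasFallback {
    // Hypothetical minimal interface; a real BLAS binding exposes many more routines.
    interface Blas {
        double ddot(int n, double[] x, double[] y);
        String name();
    }

    // Pure-Java implementation: always available, just slower than a native BLAS.
    static final class JavaBlas implements Blas {
        public double ddot(int n, double[] x, double[] y) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                sum += x[i] * y[i];
            }
            return sum;
        }

        public String name() {
            return "java";
        }
    }

    // Try the native backend first; on any loading failure, fall back to pure Java.
    static Blas select(Supplier<Blas> nativeFactory) {
        try {
            return nativeFactory.get();
        } catch (UnsatisfiedLinkError | RuntimeException e) {
            return new JavaBlas();
        }
    }

    public static void main(String[] args) {
        // Simulate a machine with no native BLAS on the library path.
        Blas blas = select(() -> { throw new UnsatisfiedLinkError("no native BLAS found"); });
        double[] x = {1, 2, 3};
        double[] y = {4, 5, 6};
        System.out.println(blas.name() + " ddot = " + blas.ddot(3, x, y)); // java ddot = 32.0
    }
}
```

With JNI, `UnsatisfiedLinkError` is what `System.loadLibrary` throws when a native library cannot be found or linked. Catching it is therefore the natural point at which to switch to the Java implementation without the caller noticing.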
> My main goal is to preserve current behavior.
I fully agree: we want neither to break current behavior nor to bring in
additional, unwanted dependencies.
> Right now if someone has, say, MKL on their native lib path for the JVM,
and built with this alternate profile, it'd be accelerated. If you're saying
that still works, but would not require this separate build profile because of
the different loading strategy, that's an improvement.
That's exactly how it works with `dev.ludovic.netlib` on JDK16+ today, using
the implementation based on the Foreign Linker API. That's also how I intend
it to work with the JNI-based implementation for JDK8 and JDK11 in the future.
> Have you by chance tried this integration when OpenBLAS is present to
verify it makes use of it?
Yes, and it's a lot faster than F2J. The `native` results in
https://github.com/apache/spark/pull/32253#issue-619173915 were obtained with
the implementation based on the Foreign Linker API. For `dgemm`, `f2j` is
**18-30x slower** than `native` (i.e., OpenBLAS). I also had to set
`LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu` (on Ubuntu 20.04) so that
`libblas.so` was on the dynamic linker's search path.
To use MKL, I only need to set
`LD_LIBRARY_PATH=/opt/intel/oneapi/mkl/latest/lib/intel64:/opt/intel/oneapi/compiler/latest/linux/compiler/lib/intel64_lin`
and pass `-Ddev.ludovic.netlib.blas.nativeLib=mkl_rt` to the JVM.
I haven't tried with other BLAS implementations.
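As a rough illustration of the MKL setup above, here is a sketch of how a library name passed via `-Ddev.ludovic.netlib.blas.nativeLib` could be resolved against the directories listed in `LD_LIBRARY_PATH`. The class `NativeLibLookup`, the default name `blas`, and the directory scan are assumptions for illustration only. The actual loader, which is based on the Foreign Linker API, may resolve libraries differently.

```java
import java.io.File;
import java.util.Optional;

public class NativeLibLookup {
    // System property name taken from the comment above.
    static final String PROP = "dev.ludovic.netlib.blas.nativeLib";
    // Hypothetical default: assumes a generic libblas is used when the property is unset.
    static final String DEFAULT_LIB = "blas";

    static String libraryName() {
        return System.getProperty(PROP, DEFAULT_LIB);
    }

    // Scan each directory of a path list (e.g. LD_LIBRARY_PATH) for the platform-specific file name.
    static Optional<File> find(String libName, String searchPath) {
        if (searchPath == null || searchPath.isEmpty()) {
            return Optional.empty();
        }
        String fileName = System.mapLibraryName(libName); // e.g. "libmkl_rt.so" on Linux
        for (String dir : searchPath.split(File.pathSeparator)) {
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String name = libraryName();
        Optional<File> lib = find(name, System.getenv("LD_LIBRARY_PATH"));
        System.out.println(lib.map(f -> "native BLAS: " + f)
                .orElse("no " + System.mapLibraryName(name) + " found; falling back to pure Java"));
    }
}
```

For example, running with `-Ddev.ludovic.netlib.blas.nativeLib=mkl_rt` and the `LD_LIBRARY_PATH` above would make this sketch look for `libmkl_rt.so` in the oneAPI directories.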
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.