Repository: spark
Updated Branches:
  refs/heads/master 8bb242902 -> fba722e31


[SPARK-25539][BUILD] Upgrade lz4-java to 1.5.0 get speed improvement

## What changes were proposed in this pull request?

This PR upgrades `lz4-java` to 1.5.0 to get speed improvements.

**General speed improvements (from the upstream LZ4 release notes)**

LZ4 decompression speed has always been a strong point. v1.8.2 improves it 
further, by about 10%, thanks in large part to suggestions from svpv.

For example, on a Mac OS X laptop with an Intel Core i7-5557U CPU at 3.10GHz,
running `lz4 -b silesia.tar` with `lz4` built by the default compiler, LLVM v9.1.0:

Version | v1.8.1 | v1.8.2 | Improvement
-- | -- | -- | --
Decompression speed | 2490 MB/s | 2770 MB/s | +11%

Compression speeds also receive a welcome boost, though the improvement is not 
evenly distributed: higher compression levels benefit considerably more.

Version | v1.8.1 | v1.8.2 | Improvement
-- | -- | -- | --
lz4 -1 | 504 MB/s | 516 MB/s | +2%
lz4 -9 | 23.2 MB/s | 25.6 MB/s | +10%
lz4 -12 | 3.5 MB/s | 9.5 MB/s | +170%
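
As a quick consistency check on the two upstream tables, the sketch below recomputes the "Improvement" column as (new / old - 1) x 100 from the quoted speeds. The class name `LzImprovement` is hypothetical, and the input values are copied from the tables; nothing is measured here.

```java
// Hypothetical helper: recomputes the "Improvement" column of the upstream
// LZ4 tables from the reported v1.8.1 and v1.8.2 speeds (MB/s).
// Values are copied from the release notes; no benchmark is run.
final class LzImprovement {

    // Percentage change from oldSpeed to newSpeed; positive means faster.
    static double percentChange(double oldSpeed, double newSpeed) {
        return (newSpeed / oldSpeed - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        String[] labels = {"decompression", "lz4 -1", "lz4 -9", "lz4 -12"};
        double[] v181 = {2490, 504, 23.2, 3.5};   // v1.8.1 speeds, MB/s
        double[] v182 = {2770, 516, 25.6, 9.5};   // v1.8.2 speeds, MB/s
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-14s %+.1f%%%n",
                    labels[i], percentChange(v181[i], v182[i]));
        }
    }
}
```

Running it prints +11.2%, +2.4%, +10.3% and +171.4%, consistent with the rounded figures in the tables. The +170% figure for `lz4 -12` only works out if both values are in MB/s, which is why the `3.5 Mb/s` entry reads as a unit typo.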

More details:
https://github.com/lz4/lz4/releases/tag/v1.8.3

**My benchmark results**
Set `spark.sql.parquet.compression.codec` to `lz4`, disable the ORC benchmarks, 
then run `FilterPushdownBenchmark`.
lz4-java 1.5.0:
```
[success] Total time: 5585 s, completed Sep 26, 2018 5:22:16 PM
```
lz4-java 1.4.0:
```
[success] Total time: 5591 s, completed Sep 26, 2018 5:22:24 PM
```
Selected benchmark results:
```
lz4-java 1.5.0 Select 1 row with 500 filters:           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            1953 / 1980          0.0  1952502908.0       1.0X
Parquet Vectorized (Pushdown)                 2541 / 2585          0.0  2541019869.0       0.8X

lz4-java 1.4.0 Select 1 row with 500 filters:           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            1979 / 2103          0.0  1979328144.0       1.0X
Parquet Vectorized (Pushdown)                 2596 / 2909          0.0  2596222118.0       0.8X
```
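
The lz4-java 1.4.0 and 1.5.0 numbers above can be summarized as relative time reductions. This minimal sketch (the class `Lz4JavaBenchmarkDelta` is a hypothetical name) computes (old - new) / old x 100 from the reported total run times and "Best" times. It only re-derives percentages from the figures shown and does not rerun the benchmark.

```java
// Hypothetical helper: re-derives relative time reductions from the
// lz4-java 1.4.0 vs 1.5.0 figures reported in this PR (copied verbatim;
// nothing is re-measured). Positive values mean 1.5.0 took less time.
final class Lz4JavaBenchmarkDelta {

    // (oldTime - newTime) / oldTime, expressed as a percentage.
    static double reductionPercent(double oldTime, double newTime) {
        return (oldTime - newTime) / oldTime * 100.0;
    }

    public static void main(String[] args) {
        // Total FilterPushdownBenchmark run time in seconds (sbt output).
        System.out.printf("Total run time:                 %.2f%%%n",
                reductionPercent(5591, 5585));
        // "Best" times in ms for "Select 1 row with 500 filters".
        System.out.printf("Parquet Vectorized:             %.2f%%%n",
                reductionPercent(1979, 1953));
        System.out.printf("Parquet Vectorized (Pushdown):  %.2f%%%n",
                reductionPercent(2596, 2541));
    }
}
```

It prints about 0.11% for the whole run and about 1.31% and 2.12% for the best times of the two Parquet cases. Since each figure comes from a single run, differences of this size may fall within run-to-run variation.
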
Complete benchmark results:
https://issues.apache.org/jira/secure/attachment/12941360/FilterPushdownBenchmark-lz4-java-140-results.txt
https://issues.apache.org/jira/secure/attachment/12941361/FilterPushdownBenchmark-lz4-java-150-results.txt

## How was this patch tested?

Manual tests.

Closes #22551 from wangyum/SPARK-25539.

Authored-by: Yuming Wang <yumw...@ebay.com>
Signed-off-by: Sean Owen <sean.o...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fba722e3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fba722e3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fba722e3

Branch: refs/heads/master
Commit: fba722e319e356113a69c54f59e23150017634ae
Parents: 8bb2429
Author: Yuming Wang <yumw...@ebay.com>
Authored: Sun Oct 7 09:51:33 2018 -0500
Committer: Sean Owen <sean.o...@databricks.com>
Committed: Sun Oct 7 09:51:33 2018 -0500

----------------------------------------------------------------------
 dev/deps/spark-deps-hadoop-2.6 | 2 +-
 dev/deps/spark-deps-hadoop-2.7 | 2 +-
 dev/deps/spark-deps-hadoop-3.1 | 2 +-
 pom.xml                        | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/fba722e3/dev/deps/spark-deps-hadoop-2.6
----------------------------------------------------------------------
diff --git a/dev/deps/spark-deps-hadoop-2.6 b/dev/deps/spark-deps-hadoop-2.6
index 22e86ef..e0e3e0a 100644
--- a/dev/deps/spark-deps-hadoop-2.6
+++ b/dev/deps/spark-deps-hadoop-2.6
@@ -138,7 +138,7 @@ libfb303-0.9.3.jar
 libthrift-0.9.3.jar
 log4j-1.2.17.jar
 logging-interceptor-3.8.1.jar
-lz4-java-1.4.0.jar
+lz4-java-1.5.0.jar
 machinist_2.11-0.6.1.jar
 macro-compat_2.11-1.1.1.jar
 mesos-1.4.0-shaded-protobuf.jar

http://git-wip-us.apache.org/repos/asf/spark/blob/fba722e3/dev/deps/spark-deps-hadoop-2.7
----------------------------------------------------------------------
diff --git a/dev/deps/spark-deps-hadoop-2.7 b/dev/deps/spark-deps-hadoop-2.7
index 19dd786..3b17f88 100644
--- a/dev/deps/spark-deps-hadoop-2.7
+++ b/dev/deps/spark-deps-hadoop-2.7
@@ -139,7 +139,7 @@ libfb303-0.9.3.jar
 libthrift-0.9.3.jar
 log4j-1.2.17.jar
 logging-interceptor-3.8.1.jar
-lz4-java-1.4.0.jar
+lz4-java-1.5.0.jar
 machinist_2.11-0.6.1.jar
 macro-compat_2.11-1.1.1.jar
 mesos-1.4.0-shaded-protobuf.jar

http://git-wip-us.apache.org/repos/asf/spark/blob/fba722e3/dev/deps/spark-deps-hadoop-3.1
----------------------------------------------------------------------
diff --git a/dev/deps/spark-deps-hadoop-3.1 b/dev/deps/spark-deps-hadoop-3.1
index ea0f487..c818b2c 100644
--- a/dev/deps/spark-deps-hadoop-3.1
+++ b/dev/deps/spark-deps-hadoop-3.1
@@ -154,7 +154,7 @@ libfb303-0.9.3.jar
 libthrift-0.9.3.jar
 log4j-1.2.17.jar
 logging-interceptor-3.8.1.jar
-lz4-java-1.4.0.jar
+lz4-java-1.5.0.jar
 machinist_2.11-0.6.1.jar
 macro-compat_2.11-1.1.1.jar
 mesos-1.4.0-shaded-protobuf.jar

http://git-wip-us.apache.org/repos/asf/spark/blob/fba722e3/pom.xml
----------------------------------------------------------------------
diff --git a/pom.xml b/pom.xml
index 79af5d6..98da38f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -540,7 +540,7 @@
       <dependency>
         <groupId>org.lz4</groupId>
         <artifactId>lz4-java</artifactId>
-        <version>1.4.0</version>
+        <version>1.5.0</version>
       </dependency>
       <dependency>
         <groupId>com.github.luben</groupId>

