huaxingao commented on a change in pull request #32473:
URL: https://github.com/apache/spark/pull/32473#discussion_r649698982
##########
File path: sql/core/benchmarks/BloomFilterBenchmark-jdk11-results.txt
##########
@@ -2,23 +2,179 @@
ORC Write
================================================================================================
-OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Write 100M rows:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
-Without bloom filter                              19503          19621         166          5.1         195.0       1.0X
-With bloom filter                                 22472          22710         335          4.4         224.7       0.9X
+Without bloom filter                              13568          13645         109          7.4         135.7       1.0X
+With bloom filter                                 16116          16238         172          6.2         161.2       0.8X
================================================================================================
ORC Read
================================================================================================
-OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Read a row from 100M rows:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
-Without bloom filter                               1981           2040          82         50.5          19.8       1.0X
-With bloom filter                                  1428           1467          54         70.0          14.3       1.4X
+Without bloom filter                               1572           1605          47         63.6          15.7       1.0X
+With bloom filter                                  1343           1359          23         74.5          13.4       1.2X
+
+
+================================================================================================
+ORC Read for IN set
+================================================================================================
+
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Read a row from 1M rows:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Without bloom filter                                 51             63          15         19.6          51.1       1.0X
+With bloom filter                                    54             88          23         18.5          54.0       0.9X
+
+
+================================================================================================
+Parquet Write
+================================================================================================
+
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Write 100M rows:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Without bloom filter                              13679          13954         389          7.3         136.8       1.0X
+With bloom filter                                 18260          18284          33          5.5         182.6       0.7X
+
+
+================================================================================================
+Parquet Read
+================================================================================================
+
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Read a row from 100M rows:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Without bloom filter, blocksize: 2097152            954            984          49        104.8           9.5       1.0X
+With bloom filter, blocksize: 2097152               285            307          21        350.4           2.9       3.3X
+
+
+================================================================================================
+Parquet Read
+================================================================================================
+
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Read a row from 100M rows:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Without bloom filter, blocksize: 3145728            788            831          40        126.9           7.9       1.0X
+With bloom filter, blocksize: 3145728               192            262          47        521.4           1.9       4.1X
+
+
+================================================================================================
+Parquet Read
+================================================================================================
+
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Read a row from 100M rows:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Without bloom filter, blocksize: 4194304            787            847          75        127.0           7.9       1.0X
+With bloom filter, blocksize: 4194304               201            224          18        496.4           2.0       3.9X
+
+
+================================================================================================
+Parquet Read
+================================================================================================
+
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Read a row from 100M rows:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Without bloom filter, blocksize: 5242880            854            872          18        117.1           8.5       1.0X
+With bloom filter, blocksize: 5242880               172            222          37        582.7           1.7       5.0X
+
+
+================================================================================================
+Parquet Read
+================================================================================================
+
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Read a row from 100M rows:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Without bloom filter, blocksize: 6291456            785            813          27        127.4           7.9       1.0X
+With bloom filter, blocksize: 6291456               167            188          14        598.0           1.7       4.7X
+
+
+================================================================================================
+Parquet Read
+================================================================================================
+
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Read a row from 100M rows:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Without bloom filter, blocksize: 8388608            806            834          42        124.1           8.1       1.0X
+With bloom filter, blocksize: 8388608               360            383          29        277.8           3.6       2.2X
+
+
+================================================================================================
+Parquet Read
+================================================================================================
+
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Read a row from 100M rows:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+-------------------------------------------------------------------------------------------------------------------------
+Without bloom filter, blocksize: 16777216            812            846          42        123.2           8.1       1.0X
+With bloom filter, blocksize: 16777216               780            807          27        128.2           7.8       1.0X
+
+
+================================================================================================
+Parquet Read
+================================================================================================
+
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Read a row from 100M rows:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+-------------------------------------------------------------------------------------------------------------------------
+Without bloom filter, blocksize: 33554432            852            862          10        117.4           8.5       1.0X
+With bloom filter, blocksize: 33554432               820            865          59        121.9           8.2       1.0X
+
+
+================================================================================================
+Parquet Read
+================================================================================================
+
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Read a row from 100M rows:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+-------------------------------------------------------------------------------------------------------------------------
+Without bloom filter, blocksize: 67108864            844            911          58        118.5           8.4       1.0X
+With bloom filter, blocksize: 67108864               851            853           2        117.5           8.5       1.0X
+
+
+================================================================================================
+Parquet Read
+================================================================================================
+
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Read a row from 100M rows:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------
+Without bloom filter, blocksize: 134217728            839            887          53        119.3           8.4       1.0X
+With bloom filter, blocksize: 134217728               872            881           9        114.6           8.7       1.0X
+
+
+================================================================================================
+Parquet Read for IN set
+================================================================================================
+
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1047-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Read a row from 1M rows:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Without bloom filter                                 70             76           6         14.2          70.2       1.0X
+With bloom filter                                    73            103          22         13.8          72.6       1.0X
Review comment:
This is not caused by the IN predicate, because ORC is also slightly slower with
the bloom filter on the IN-set benchmark. I think the data (1M rows) is too small.
Let me increase the data size and try again.
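The reasoning above can be checked against the table numbers. This is a minimal sketch, assuming the derived columns are computed from the best run, as the printed figures suggest (Rate(M/s) = rows / best time, Per Row(ns) = best time / rows, Relative = baseline best time / case best time). The `Case` type and the `within_noise` threshold (best-time gap smaller than the larger standard deviation) are hypothetical helpers for illustration, not part of Spark's benchmark code.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One row of a benchmark table, using the printed values (ms)."""
    name: str
    best_ms: float
    stdev_ms: float


def rate_m_per_s(rows: int, best_ms: float) -> float:
    # Millions of rows per second, based on the best run.
    return rows / (best_ms / 1000.0) / 1e6


def per_row_ns(rows: int, best_ms: float) -> float:
    # Nanoseconds spent per row in the best run.
    return best_ms * 1e6 / rows


def relative(baseline: Case, case: Case) -> float:
    # Speedup of `case` over the baseline (first) row; above 1.0 means faster.
    return baseline.best_ms / case.best_ms


def within_noise(baseline: Case, case: Case) -> bool:
    # Hypothetical heuristic: treat the difference as noise when the gap in
    # best times is smaller than the larger of the two standard deviations.
    gap = abs(baseline.best_ms - case.best_ms)
    return gap < max(baseline.stdev_ms, case.stdev_ms)


if __name__ == "__main__":
    # "ORC Read for IN set": read a row from 1M rows.
    rows = 1_000_000
    orc_without = Case("Without bloom filter", best_ms=51, stdev_ms=15)
    orc_with = Case("With bloom filter", best_ms=54, stdev_ms=23)
    for c in (orc_without, orc_with):
        print(f"{c.name}: {rate_m_per_s(rows, c.best_ms):.1f} M/s, "
              f"{per_row_ns(rows, c.best_ms):.1f} ns/row, "
              f"{relative(orc_without, c):.1f}X")
    print("within noise:", within_noise(orc_without, orc_with))
```

For the ORC IN-set rows, the 3 ms gap between 51 ms and 54 ms is well inside the 15 to 23 ms standard deviations, which fits the "data is too small" explanation; for the Parquet read with blocksize 5242880, the 682 ms gap clearly exceeds it. Recomputing from the rounded, printed times can differ from the table in the last digit (for example 51.0 vs. 51.1 ns per row).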