GitHub user maropu commented on the issue:
https://github.com/apache/spark/pull/21288
I found out why the benchmark numbers changed so much in
https://github.com/apache/spark/pull/21288#discussion_r191280132:
[the commit](https://github.com/apache/spark/pull/21288/commits/39e5a507fe22cade6bed0613eefbccab15cf45ff)
wrongly set `spark.master` to `local[*]` instead of `local[1]`:
```
// Performance results on r3.xlarge
// --master local[1] --driver-memory 10G --conf spark.ui.enabled=false
OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                       9292 / 9315          1.7         590.8       1.0X
Parquet Vectorized (Pushdown)             921 /  933         17.1          58.6      10.1X
Native ORC Vectorized                    9001 / 9021          1.7         572.3       1.0X
Native ORC Vectorized (Pushdown)          257 /  265         61.2          16.3      36.2X
Select 1 string row (value = '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                       9151 / 9162          1.7         581.8       1.0X
Parquet Vectorized (Pushdown)             902 /  917         17.4          57.3      10.1X
Native ORC Vectorized                    8870 / 8882          1.8         564.0       1.0X
Native ORC Vectorized (Pushdown)          254 /  268         61.9          16.1      36.0X
...
// --master local[*] --driver-memory 10G --conf spark.ui.enabled=false
OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                       3959 / 4067          4.0         251.7       1.0X
Parquet Vectorized (Pushdown)             202 /  245         77.7          12.9      19.6X
Native ORC Vectorized                    3973 / 4055          4.0         252.6       1.0X
Native ORC Vectorized (Pushdown)          286 /  345         55.0          18.2      13.8X
OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                       3985 / 4022          3.9         253.4       1.0X
Parquet Vectorized (Pushdown)             249 /  274         63.3          15.8      16.0X
Native ORC Vectorized                    4066 / 4122          3.9         258.5       1.0X
Native ORC Vectorized (Pushdown)          257 /  310         61.3          16.3      15.5X
```
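For anyone who wants to double-check the effect, here is a small Python sketch (not part of the PR) that compares the one query reported under both settings, `'7864320' < value < '7864320'`. It assumes the `Relative` column is the baseline's best time divided by each case's best time, which is inferred from the figures above rather than taken from the benchmark code; the names `best_ms`, `relative`, and `master_speedup` are hypothetical.

```python
# Best times (ms) copied from the results above for the query
# "Select 0 string row ('7864320' < value < '7864320')".
best_ms = {
    "local[1]": {
        "Parquet Vectorized": 9292,
        "Parquet Vectorized (Pushdown)": 921,
        "Native ORC Vectorized": 9001,
        "Native ORC Vectorized (Pushdown)": 257,
    },
    "local[*]": {
        "Parquet Vectorized": 3985,
        "Parquet Vectorized (Pushdown)": 249,
        "Native ORC Vectorized": 4066,
        "Native ORC Vectorized (Pushdown)": 257,
    },
}


def relative(times, baseline="Parquet Vectorized"):
    """Recompute the Relative column, assuming it is
    baseline best time / case best time (an inference, see above)."""
    base = times[baseline]
    return {case: round(base / t, 1) for case, t in times.items()}


def master_speedup(results):
    """Ratio of the local[1] best time to the local[*] best time per case."""
    single = results["local[1]"]
    multi = results["local[*]"]
    return {case: round(single[case] / multi[case], 2) for case in single}


if __name__ == "__main__":
    for master, times in best_ms.items():
        print(master, relative(times))
    print("local[1] / local[*]:", master_speedup(best_ms))
```

The recomputed ratios match the reported `Relative` values for this query under both settings. With these single-run figures, the non-pushdown cases are a bit more than 2x faster under `local[*]`, while the ORC pushdown best time is the same (257 ms) in both runs, so the `Relative` column shifts mostly because the baseline moved.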
I'll fix the bug and update the results in a follow-up PR. Sorry, all.