sunchao commented on pull request #29542:
URL: https://github.com/apache/spark/pull/29542#issuecomment-683451979
> Parquet reader is a performance-critical component in Spark SQL. We had better make sure there is no performance regression due to this change. Should we run a benchmark to check it?
@viirya I ran `FilterPushdownBenchmark` before and after this change and don't see a meaningful difference. Here are the first two cases:
Before:
```
[info] OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
[info] Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
[info] Select 0 string row ('7864320' < value < '7864320'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                             5632           5752         120          2.8         358.1       1.0X
[info] Parquet Vectorized (Pushdown)                                   491            506          18         32.0          31.2      11.5X
[info] Native ORC Vectorized                                          4300           4335          25          3.7         273.4       1.3X
[info] Native ORC Vectorized (Pushdown)                                525            530           6         30.0          33.4      10.7X
[info] OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
[info] Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
[info] Select 1 string row (value = '7864320'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                 5594           5757         101          2.8         355.7       1.0X
[info] Parquet Vectorized (Pushdown)                       472            491          14         33.3          30.0      11.9X
[info] Native ORC Vectorized                              4320           4387          42          3.6         274.7       1.3X
[info] Native ORC Vectorized (Pushdown)                    512            524           8         30.7          32.6      10.9X
```
After:
```
[info] OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
[info] Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
[info] Select 0 string row ('7864320' < value < '7864320'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                             5539           5635          87          2.8         352.1       1.0X
[info] Parquet Vectorized (Pushdown)                                   456            461           6         34.5          29.0      12.1X
[info] Native ORC Vectorized                                          4243           4282          35          3.7         269.8       1.3X
[info] Native ORC Vectorized (Pushdown)                                511            523          13         30.8          32.5      10.8X
[info] OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
[info] Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
[info] Select 1 string row (value = '7864320'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Parquet Vectorized                                 5509           5576          55          2.9         350.2       1.0X
[info] Parquet Vectorized (Pushdown)                       462            475           8         34.1          29.4      11.9X
[info] Native ORC Vectorized                              4230           4294          44          3.7         268.9       1.3X
[info] Native ORC Vectorized (Pushdown)                    501            510          11         31.4          31.8      11.0X
```
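To put a number on "not much difference", here is a small helper sketch (not part of Spark's benchmark suite; the names `RESULTS`, `pct_change` and `summarize` are hypothetical) that computes the relative change in `Best Time(ms)` for every row in the before/after runs shown here. It is plain arithmetic on one run of each, so it says nothing about statistical significance; the `Stdev(ms)` column is only a rough guide to run-to-run noise.

```python
from typing import List, Tuple

# (benchmark case, reader, Best Time(ms) before, Best Time(ms) after),
# copied by hand from the FilterPushdownBenchmark output in this comment.
RESULTS: List[Tuple[str, str, int, int]] = [
    ("Select 0 string row", "Parquet Vectorized", 5632, 5539),
    ("Select 0 string row", "Parquet Vectorized (Pushdown)", 491, 456),
    ("Select 0 string row", "Native ORC Vectorized", 4300, 4243),
    ("Select 0 string row", "Native ORC Vectorized (Pushdown)", 525, 511),
    ("Select 1 string row", "Parquet Vectorized", 5594, 5509),
    ("Select 1 string row", "Parquet Vectorized (Pushdown)", 472, 462),
    ("Select 1 string row", "Native ORC Vectorized", 4320, 4230),
    ("Select 1 string row", "Native ORC Vectorized (Pushdown)", 512, 501),
]


def pct_change(before: float, after: float) -> float:
    """Relative change in percent; a negative value means the 'after' run was faster."""
    return (after - before) / before * 100.0


def summarize(
    results: List[Tuple[str, str, int, int]],
) -> List[Tuple[str, str, int, int, float]]:
    """Append the percent change in best time to each (case, reader, before, after) row."""
    return [
        (case, reader, before, after, pct_change(before, after))
        for case, reader, before, after in results
    ]


if __name__ == "__main__":
    for case, reader, before, after, pct in summarize(RESULTS):
        print(f"{case:<20} {reader:<33} {before:>5} -> {after:>5} ms  {pct:+6.2f}%")
```

Every row, the ORC rows included, comes out slightly faster in the "after" run: most changes are within about 3%, and the largest is roughly 7% (35 ms) on `Parquet Vectorized (Pushdown)` for the first case. Since the ORC rows shift by a similar amount, the differences plausibly reflect run-to-run variation on the machine rather than an effect of this change.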