Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21070
@maropu, are you sure about the INT and FLOAT columns? I think you might have that assessment backwards. Here are the INT results from the PR gist:
```
SQL Single INT Column Scan:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
SQL Parquet Vectorized                          149 /  162        105.5           9.5       1.0X
SQL Parquet MR                                 1825 / 1836          8.6         116.1       0.1X
```
And here are the INT results from the master gist:
```
SQL Single INT Column Scan:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
SQL Parquet Vectorized                          250 /  292         63.0          15.9       1.0X
SQL Parquet MR                                 3175 / 3202          5.0         201.8       0.1X
```
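I think that shows the PR result was significantly faster, not slower: the vectorized reader's best time dropped from 250 ms on master to 149 ms with the PR, and Parquet MR dropped from 3175 ms to 1825 ms. (The other INT test was about the same.)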
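To make the comparison concrete, here is a small Python sketch that divides master's best time by the PR's best time for each reader in the INT tables above; a result above 1.0 means the PR is faster. The `speedup` helper and the dict layout are illustrative only, not part of Spark's benchmark code.

```python
# Best times (ms) copied from the INT tables quoted above.
int_best_ms = {
    "SQL Parquet Vectorized": {"pr": 149, "master": 250},
    "SQL Parquet MR": {"pr": 1825, "master": 3175},
}


def speedup(master_ms: float, pr_ms: float) -> float:
    """How many times faster the PR is than master; > 1.0 means the PR wins."""
    return master_ms / pr_ms


for reader, times in int_best_ms.items():
    print(f"{reader}: {speedup(times['master'], times['pr']):.2f}x")
```

Both readers come out at roughly 1.7x in the PR's favor, which is the opposite of a regression.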
Here are the FLOAT results from the PR gist:
```
SQL Single FLOAT Column Scan:            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
SQL Parquet Vectorized                          145 /  158        108.8           9.2       1.0X
SQL Parquet MR                                 1840 / 1843          8.5         117.0       0.1X
```
And FLOAT from the master gist:
```
SQL Single FLOAT Column Scan:            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
SQL Parquet Vectorized                          261 /  316         60.2          16.6       1.0X
SQL Parquet MR                                 3267 / 3284          4.8         207.7       0.1X
```
Am I reading this incorrectly? I'm considering lower time values and higher
rate values to be better.
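That reading matches how the columns relate: Rate(M/s) is rows scanned divided by time, and Per Row(ns) is time divided by rows, so for a fixed row count a lower time always means a higher rate and a lower per-row cost. The sketch below assumes every run scans the same number of rows; that count isn't stated here, so it is estimated from the PR's vectorized INT row (105.5 M/s × 0.149 s, about 15.7 million rows). The helper names are hypothetical.

```python
# Row count is NOT given in the comment; it is estimated from the PR's
# vectorized INT row as rate * time (105.5 M rows/s * 0.149 s).
ROWS = 105.5e6 * 0.149  # roughly 15.7 million rows (assumption)


def rate_m_per_s(best_ms: float, rows: float = ROWS) -> float:
    """Rows scanned per second, in millions: higher is better."""
    return rows / (best_ms / 1000.0) / 1e6


def per_row_ns(best_ms: float, rows: float = ROWS) -> float:
    """Nanoseconds spent per row: lower is better."""
    return best_ms * 1e6 / rows


# Best times (ms) from the FLOAT tables quoted above: (PR, master).
float_best_ms = {
    "SQL Parquet Vectorized": (145, 261),
    "SQL Parquet MR": (1840, 3267),
}

for reader, (pr_ms, master_ms) in float_best_ms.items():
    print(
        f"{reader}: PR {rate_m_per_s(pr_ms):.1f} M/s vs "
        f"master {rate_m_per_s(master_ms):.1f} M/s"
    )
```

The computed rates land close to the reported Rate(M/s) column (for example, about 60.2 M/s for master's vectorized FLOAT scan); small gaps come from the times being rounded to whole milliseconds.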