Github user SongYadong commented on the issue:
https://github.com/apache/spark/pull/22348
@dongjoon-hyun, you are right. The DataSourceReadBenchmark results show that
the benefit is too small, and in some cases it is covered up by fluctuation.
Java HotSpot(TM) 64-Bit Server VM 1.8.0_141-b15 on Windows 7 6.1
Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
Before:
```
Parquet Reader Single TINYINT Column Scan:   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
ParquetReader Vectorized                             330 / 334         47.7          21.0       1.0X
ParquetReader Vectorized -> Row                      213 / 301         73.7          13.6       1.5X
```
After:
```
Parquet Reader Single TINYINT Column Scan:   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
ParquetReader Vectorized                             292 / 366         53.8          18.6       1.0X
ParquetReader Vectorized -> Row                      254 / 286         62.0          16.1       1.2X
```
Before:
```
Parquet Reader Single SMALLINT Column Scan:  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
ParquetReader Vectorized                             391 / 425         40.2          24.9       1.0X
ParquetReader Vectorized -> Row                      371 / 407         42.4          23.6       1.1X
```
After:
```
Parquet Reader Single SMALLINT Column Scan:  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
ParquetReader Vectorized                             435 / 485         36.1          27.7       1.0X
ParquetReader Vectorized -> Row                      398 / 440         39.5          25.3       1.1X
```
Before:
```
Parquet Reader Single INT Column Scan:       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
ParquetReader Vectorized                             453 / 516         34.7          28.8       1.0X
ParquetReader Vectorized -> Row                      542 / 563         29.0          34.5       0.8X
```
After:
```
Parquet Reader Single INT Column Scan:       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
ParquetReader Vectorized                             533 / 602         29.5          33.9       1.0X
ParquetReader Vectorized -> Row                      549 / 570         28.6          34.9       1.0X
```
Before:
```
Parquet Reader Single BIGINT Column Scan:    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
ParquetReader Vectorized                             800 / 817         19.7          50.9       1.0X
ParquetReader Vectorized -> Row                      530 / 686         29.7          33.7       1.5X
```
After:
```
Parquet Reader Single BIGINT Column Scan:    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
ParquetReader Vectorized                             692 / 847         22.7          44.0       1.0X
ParquetReader Vectorized -> Row                      580 / 610         27.1          36.9       1.2X
```
Before:
```
Parquet Reader Single FLOAT Column Scan:     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
ParquetReader Vectorized                             467 / 543         33.7          29.7       1.0X
ParquetReader Vectorized -> Row                      457 / 507         34.4          29.1       1.0X
```
After:
```
Parquet Reader Single FLOAT Column Scan:     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
ParquetReader Vectorized                             584 / 600         26.9          37.1       1.0X
ParquetReader Vectorized -> Row                      546 / 555         28.8          34.7       1.1X
```
Before:
```
Parquet Reader Single DOUBLE Column Scan:    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
ParquetReader Vectorized                             801 / 808         19.6          50.9       1.0X
ParquetReader Vectorized -> Row                      590 / 668         26.7          37.5       1.4X
```
After:
```
Parquet Reader Single DOUBLE Column Scan:    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
ParquetReader Vectorized                             833 / 858         18.9          53.0       1.0X
ParquetReader Vectorized -> Row                      672 / 722         23.4          42.7       1.2X
```
Before:
```
String with Nulls Scan:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
ParquetReader Vectorized                           2098 / 2263          5.0         200.1      13.3X
```
After:
```
String with Nulls Scan:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
ParquetReader Vectorized                           1943 / 2084          5.4         185.3      14.0X
```
Before:
```
String with Nulls Scan:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
ParquetReader Vectorized                           1930 / 1980          5.4         184.0      15.0X
```
After:
```
String with Nulls Scan:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
ParquetReader Vectorized                           1873 / 1875          5.6         178.6      16.7X
```
Before:
```
String with Nulls Scan:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
ParquetReader Vectorized                             301 / 356         34.9          28.7      83.1X
```
After:
```
String with Nulls Scan:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
ParquetReader Vectorized                             506 / 542         20.7          48.3      53.0X
```
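As a rough check on the "covered up by fluctuation" observation, the before/after numbers for the "ParquetReader Vectorized" rows of the numeric column scans can be compared directly. This is only a minimal sketch: the helper names are hypothetical, and treating a sign disagreement between the Best and Avg changes as "inconclusive" is an assumption of this sketch, not something DataSourceReadBenchmark reports.

```python
# Best/Avg times in ms for the "ParquetReader Vectorized" rows, copied from
# the before/after tables above: type -> ((best, avg) before, (best, avg) after).
RESULTS = {
    "TINYINT": ((330, 334), (292, 366)),
    "SMALLINT": ((391, 425), (435, 485)),
    "INT": ((453, 516), (533, 602)),
    "BIGINT": ((800, 817), (692, 847)),
    "FLOAT": ((467, 543), (584, 600)),
    "DOUBLE": ((801, 808), (833, 858)),
}


def pct_change(before, after):
    """Relative change in percent; negative means the run got faster."""
    return (after - before) / before * 100.0


def summarize(results):
    """Return (name, best change %, avg change %, signs agree) per column type."""
    rows = []
    for name, ((b_best, b_avg), (a_best, a_avg)) in results.items():
        best_delta = pct_change(b_best, a_best)
        avg_delta = pct_change(b_avg, a_avg)
        # Assumption of this sketch: if Best and Avg move in opposite
        # directions, the difference is treated as noise.
        agree = (best_delta < 0) == (avg_delta < 0)
        rows.append((name, best_delta, avg_delta, agree))
    return rows


if __name__ == "__main__":
    for name, best_delta, avg_delta, agree in summarize(RESULTS):
        verdict = "consistent" if agree else "inconclusive"
        print(f"{name:<9} best {best_delta:+6.1f}%  avg {avg_delta:+6.1f}%  {verdict}")
```

On these numbers, TINYINT and BIGINT come out inconclusive (Best improves while Avg gets worse), and the other four types are slower on both measures in the After run, which is in line with the benefit being lost in run-to-run variation.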