GitHub user wangyum commented on the issue:
https://github.com/apache/spark/pull/21603
Benchmark result:
```
##########################[ Pushdown benchmark for InSet -> InFilters ]##########################
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
InSet -> InFilters (threshold: 10, values count: 5, distribution: 10):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7649 / 7678          2.1         486.3       1.0X
Parquet Vectorized (Pushdown)                   316 / 325         49.8          20.1      24.2X
Native ORC Vectorized                         6787 / 7353          2.3         431.5       1.1X
Native ORC Vectorized (Pushdown)              1010 / 1020         15.6          64.2       7.6X
InSet -> InFilters (threshold: 10, values count: 5, distribution: 50):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7537 / 7944          2.1         479.2       1.0X
Parquet Vectorized (Pushdown)                   297 / 306         52.9          18.9      25.3X
Native ORC Vectorized                         6768 / 6779          2.3         430.3       1.1X
Native ORC Vectorized (Pushdown)               998 / 1017         15.8          63.4       7.6X
InSet -> InFilters (threshold: 10, values count: 5, distribution: 90):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7500 / 7592          2.1         476.8       1.0X
Parquet Vectorized (Pushdown)                   299 / 306         52.5          19.0      25.1X
Native ORC Vectorized                         6758 / 6797          2.3         429.7       1.1X
Native ORC Vectorized (Pushdown)                982 / 993         16.0          62.4       7.6X
InSet -> InFilters (threshold: 10, values count: 10, distribution: 10):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7566 / 8153          2.1         481.1       1.0X
Parquet Vectorized (Pushdown)                   319 / 328         49.3          20.3      23.7X
Native ORC Vectorized                         6761 / 6812          2.3         429.8       1.1X
Native ORC Vectorized (Pushdown)               995 / 1013         15.8          63.3       7.6X
InSet -> InFilters (threshold: 10, values count: 10, distribution: 50):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7512 / 7581          2.1         477.6       1.0X
Parquet Vectorized (Pushdown)                   315 / 322         50.0          20.0      23.9X
Native ORC Vectorized                         6712 / 6774          2.3         426.8       1.1X
Native ORC Vectorized (Pushdown)              1001 / 1032         15.7          63.6       7.5X
InSet -> InFilters (threshold: 10, values count: 10, distribution: 90):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7603 / 7689          2.1         483.4       1.0X
Parquet Vectorized (Pushdown)                   308 / 317         51.0          19.6      24.7X
Native ORC Vectorized                         7011 / 7605          2.2         445.7       1.1X
Native ORC Vectorized (Pushdown)              1038 / 1067         15.2          66.0       7.3X
InSet -> InFilters (threshold: 10, values count: 50, distribution: 10):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7750 / 7796          2.0         492.7       1.0X
Parquet Vectorized (Pushdown)                 7855 / 7961          2.0         499.4       1.0X
Native ORC Vectorized                         7120 / 7820          2.2         452.7       1.1X
Native ORC Vectorized (Pushdown)              1085 / 1122         14.5          69.0       7.1X
InSet -> InFilters (threshold: 10, values count: 50, distribution: 50):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7920 / 8012          2.0         503.5       1.0X
Parquet Vectorized (Pushdown)                 7855 / 8159          2.0         499.4       1.0X
Native ORC Vectorized                         7087 / 7105          2.2         450.6       1.1X
Native ORC Vectorized (Pushdown)              1098 / 1118         14.3          69.8       7.2X
InSet -> InFilters (threshold: 10, values count: 50, distribution: 90):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7809 / 7918          2.0         496.5       1.0X
Parquet Vectorized (Pushdown)                 7800 / 7857          2.0         495.9       1.0X
Native ORC Vectorized                         7089 / 7145          2.2         450.7       1.1X
Native ORC Vectorized (Pushdown)              1102 / 1123         14.3          70.1       7.1X
InSet -> InFilters (threshold: 10, values count: 100, distribution: 10):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7793 / 7823          2.0         495.5       1.0X
Parquet Vectorized (Pushdown)                 7765 / 7863          2.0         493.7       1.0X
Native ORC Vectorized                         7066 / 7175          2.2         449.2       1.1X
Native ORC Vectorized (Pushdown)              1194 / 1210         13.2          75.9       6.5X
InSet -> InFilters (threshold: 10, values count: 100, distribution: 50):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7782 / 7816          2.0         494.8       1.0X
Parquet Vectorized (Pushdown)                 7737 / 7782          2.0         491.9       1.0X
Native ORC Vectorized                         7056 / 7100          2.2         448.6       1.1X
Native ORC Vectorized (Pushdown)              1193 / 1264         13.2          75.9       6.5X
InSet -> InFilters (threshold: 10, values count: 100, distribution: 90):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7726 / 8463          2.0         491.2       1.0X
Parquet Vectorized (Pushdown)                 8759 / 9317          1.8         556.9       0.9X
Native ORC Vectorized                         7067 / 7379          2.2         449.3       1.1X
Native ORC Vectorized (Pushdown)              1352 / 1520         11.6          86.0       5.7X
InSet -> InFilters (threshold: 100, values count: 5, distribution: 10):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                           8694 / 10591          1.8         552.7       1.0X
Parquet Vectorized (Pushdown)                   288 / 313         54.5          18.3      30.1X
Native ORC Vectorized                         6898 / 7754          2.3         438.6       1.3X
Native ORC Vectorized (Pushdown)              1037 / 1279         15.2          65.9       8.4X
InSet -> InFilters (threshold: 100, values count: 5, distribution: 50):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7584 / 8641          2.1         482.2       1.0X
Parquet Vectorized (Pushdown)                   293 / 299         53.7          18.6      25.9X
Native ORC Vectorized                         6849 / 6918          2.3         435.5       1.1X
Native ORC Vectorized (Pushdown)               996 / 1020         15.8          63.3       7.6X
InSet -> InFilters (threshold: 100, values count: 5, distribution: 90):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7617 / 7947          2.1         484.3       1.0X
Parquet Vectorized (Pushdown)                   311 / 341         50.5          19.8      24.5X
Native ORC Vectorized                         7468 / 8006          2.1         474.8       1.0X
Native ORC Vectorized (Pushdown)              1095 / 1173         14.4          69.6       7.0X
InSet -> InFilters (threshold: 100, values count: 10, distribution: 10):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            8364 / 9682          1.9         531.8       1.0X
Parquet Vectorized (Pushdown)                   325 / 498         48.4          20.7      25.7X
Native ORC Vectorized                         6931 / 7797          2.3         440.7       1.2X
Native ORC Vectorized (Pushdown)              1010 / 1032         15.6          64.2       8.3X
InSet -> InFilters (threshold: 100, values count: 10, distribution: 50):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7647 / 8096          2.1         486.2       1.0X
Parquet Vectorized (Pushdown)                   315 / 409         49.9          20.1      24.2X
Native ORC Vectorized                         6839 / 7307          2.3         434.8       1.1X
Native ORC Vectorized (Pushdown)              1033 / 1077         15.2          65.7       7.4X
InSet -> InFilters (threshold: 100, values count: 10, distribution: 90):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7653 / 8725          2.1         486.6       1.0X
Parquet Vectorized (Pushdown)                   319 / 367         49.3          20.3      24.0X
Native ORC Vectorized                         7121 / 8047          2.2         452.7       1.1X
Native ORC Vectorized (Pushdown)              1066 / 1133         14.8          67.8       7.2X
InSet -> InFilters (threshold: 100, values count: 50, distribution: 10):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7804 / 8926          2.0         496.2       1.0X
Parquet Vectorized (Pushdown)                   476 / 568         33.0          30.3      16.4X
Native ORC Vectorized                         7891 / 8248          2.0         501.7       1.0X
Native ORC Vectorized (Pushdown)              1158 / 1195         13.6          73.6       6.7X
InSet -> InFilters (threshold: 100, values count: 50, distribution: 50):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            8576 / 9488          1.8         545.2       1.0X
Parquet Vectorized (Pushdown)                   522 / 593         30.1          33.2      16.4X
Native ORC Vectorized                         7199 / 7692          2.2         457.7       1.2X
Native ORC Vectorized (Pushdown)              1180 / 1280         13.3          75.0       7.3X
InSet -> InFilters (threshold: 100, values count: 50, distribution: 90):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                           9142 / 10012          1.7         581.2       1.0X
Parquet Vectorized (Pushdown)                   536 / 620         29.3          34.1      17.0X
Native ORC Vectorized                         7720 / 9655          2.0         490.9       1.2X
Native ORC Vectorized (Pushdown)              1110 / 1212         14.2          70.6       8.2X
InSet -> InFilters (threshold: 100, values count: 100, distribution: 10):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            8478 / 9150          1.9         539.0       1.0X
Parquet Vectorized (Pushdown)                   700 / 900         22.5          44.5      12.1X
Native ORC Vectorized                         7427 / 8069          2.1         472.2       1.1X
Native ORC Vectorized (Pushdown)              1185 / 1633         13.3          75.3       7.2X
InSet -> InFilters (threshold: 100, values count: 100, distribution: 50):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7919 / 9670          2.0         503.5       1.0X
Parquet Vectorized (Pushdown)                   731 / 750         21.5          46.5      10.8X
Native ORC Vectorized                         7205 / 7306          2.2         458.1       1.1X
Native ORC Vectorized (Pushdown)              1191 / 1224         13.2          75.7       6.6X
InSet -> InFilters (threshold: 100, values count: 100, distribution: 90):
                                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            7845 / 8146          2.0         498.8       1.0X
Parquet Vectorized (Pushdown)                   761 / 838         20.7          48.4      10.3X
Native ORC Vectorized                         7081 / 7741          2.2         450.2       1.1X
Native ORC Vectorized (Pushdown)              1289 / 1459         12.2          82.0       6.1X
```
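For anyone checking the numbers: the Rate, Per Row, and Relative columns appear to follow from the Best time alone. The sketch below is an assumption rather than the benchmark's actual code. The row count of 15 * 1024 * 1024 is inferred from Best time divided by Per Row and is not stated in this comment, and `derived_columns` is a hypothetical helper name.

```python
# Assumed row count: inferred from the table (Best time / Per Row), not stated in the comment.
NUM_ROWS = 15 * 1024 * 1024


def derived_columns(best_ms, baseline_best_ms, num_rows=NUM_ROWS):
    """Recompute Rate(M/s), Per Row(ns), and Relative from best times in milliseconds."""
    rate_m_per_s = num_rows / (best_ms / 1000.0) / 1e6  # millions of rows per second
    per_row_ns = best_ms * 1e6 / num_rows               # nanoseconds spent per row
    relative = baseline_best_ms / best_ms               # speedup over the first row of the case
    return round(rate_m_per_s, 1), round(per_row_ns, 1), round(relative, 1)


# First case (threshold: 10, values count: 5, distribution: 10):
# baseline "Parquet Vectorized" best = 7649 ms, "(Pushdown)" best = 316 ms.
print(derived_columns(316, 7649))  # (49.8, 20.1, 24.2), matching the table row
```

Relative is taken against the first row of each case (Parquet Vectorized without pushdown), which is why the ORC rows also show values near 1.1X.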