dongjoon-hyun commented on a change in pull request #24068: [SPARK-27105][SQL] Optimize away exponential complexity in ORC predicate conversion
URL: https://github.com/apache/spark/pull/24068#discussion_r293216015
 
 

 ##########
 File path: sql/core/benchmarks/FilterPushdownBenchmark-results.txt
 ##########
 @@ -2,669 +2,695 @@
 Pushdown for many distinct value case
 ================================================================================================
 
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          11405 / 11485          1.4         725.1       1.0X
-Parquet Vectorized (Pushdown)                  675 /  690         23.3          42.9      16.9X
-Native ORC Vectorized                         7127 / 7170          2.2         453.1       1.6X
-Native ORC Vectorized (Pushdown)               519 /  541         30.3          33.0      22.0X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          11457 / 11473          1.4         728.4       1.0X
-Parquet Vectorized (Pushdown)                  656 /  686         24.0          41.7      17.5X
-Native ORC Vectorized                         7328 / 7342          2.1         465.9       1.6X
-Native ORC Vectorized (Pushdown)               539 /  565         29.2          34.2      21.3X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select 1 string row (value = '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          11878 / 11888          1.3         755.2       1.0X
-Parquet Vectorized (Pushdown)                  630 /  654         25.0          40.1      18.9X
-Native ORC Vectorized                         7342 / 7362          2.1         466.8       1.6X
-Native ORC Vectorized (Pushdown)               519 /  537         30.3          33.0      22.9X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select 1 string row (value <=> '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          11423 / 11440          1.4         726.2       1.0X
-Parquet Vectorized (Pushdown)                  625 /  643         25.2          39.7      18.3X
-Native ORC Vectorized                         7315 / 7335          2.2         465.1       1.6X
-Native ORC Vectorized (Pushdown)               507 /  520         31.0          32.2      22.5X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select 1 string row ('7864320' <= value <= '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          11440 / 11478          1.4         727.3       1.0X
-Parquet Vectorized (Pushdown)                  634 /  652         24.8          40.3      18.0X
-Native ORC Vectorized                         7311 / 7324          2.2         464.8       1.6X
-Native ORC Vectorized (Pushdown)               517 /  548         30.4          32.8      22.1X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select all string rows (value IS NOT NULL): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          20750 / 20872          0.8        1319.3       1.0X
-Parquet Vectorized (Pushdown)               21002 / 21032          0.7        1335.3       1.0X
-Native ORC Vectorized                       16714 / 16742          0.9        1062.6       1.2X
-Native ORC Vectorized (Pushdown)            16926 / 16965          0.9        1076.1       1.2X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select 0 int row (value IS NULL):        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          10510 / 10532          1.5         668.2       1.0X
-Parquet Vectorized (Pushdown)                  642 /  665         24.5          40.8      16.4X
-Native ORC Vectorized                         6609 / 6618          2.4         420.2       1.6X
-Native ORC Vectorized (Pushdown)               502 /  512         31.4          31.9      21.0X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select 0 int row (7864320 < value < 7864320): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          10505 / 10514          1.5         667.9       1.0X
-Parquet Vectorized (Pushdown)                  659 /  673         23.9          41.9      15.9X
-Native ORC Vectorized                         6634 / 6641          2.4         421.8       1.6X
-Native ORC Vectorized (Pushdown)               513 /  526         30.7          32.6      20.5X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select 1 int row (value = 7864320):      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          10555 / 10570          1.5         671.1       1.0X
-Parquet Vectorized (Pushdown)                  651 /  668         24.2          41.4      16.2X
-Native ORC Vectorized                         6721 / 6728          2.3         427.3       1.6X
-Native ORC Vectorized (Pushdown)               508 /  519         31.0          32.3      20.8X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select 1 int row (value <=> 7864320):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          10556 / 10566          1.5         671.1       1.0X
-Parquet Vectorized (Pushdown)                  647 /  654         24.3          41.1      16.3X
-Native ORC Vectorized                         6716 / 6728          2.3         427.0       1.6X
-Native ORC Vectorized (Pushdown)               510 /  521         30.9          32.4      20.7X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select 1 int row (7864320 <= value <= 7864320): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          10556 / 10565          1.5         671.1       1.0X
-Parquet Vectorized (Pushdown)                  649 /  654         24.2          41.3      16.3X
-Native ORC Vectorized                         6700 / 6712          2.3         426.0       1.6X
-Native ORC Vectorized (Pushdown)               509 /  520         30.9          32.3      20.8X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select 1 int row (7864319 < value < 7864321): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          10547 / 10566          1.5         670.5       1.0X
-Parquet Vectorized (Pushdown)                  649 /  653         24.2          41.3      16.3X
-Native ORC Vectorized                         6703 / 6713          2.3         426.2       1.6X
-Native ORC Vectorized (Pushdown)               510 /  520         30.8          32.5      20.7X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select 10% int rows (value < 1572864):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          11478 / 11525          1.4         729.7       1.0X
-Parquet Vectorized (Pushdown)                 2576 / 2587          6.1         163.8       4.5X
-Native ORC Vectorized                         7633 / 7657          2.1         485.3       1.5X
-Native ORC Vectorized (Pushdown)              2076 / 2096          7.6         132.0       5.5X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select 50% int rows (value < 7864320):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          14785 / 14802          1.1         940.0       1.0X
-Parquet Vectorized (Pushdown)                 9971 / 9977          1.6         633.9       1.5X
-Native ORC Vectorized                       11082 / 11107          1.4         704.6       1.3X
-Native ORC Vectorized (Pushdown)              8061 / 8073          2.0         512.5       1.8X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select 90% int rows (value < 14155776):  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          18174 / 18214          0.9        1155.5       1.0X
-Parquet Vectorized (Pushdown)               17387 / 17403          0.9        1105.5       1.0X
-Native ORC Vectorized                       14465 / 14492          1.1         919.7       1.3X
-Native ORC Vectorized (Pushdown)            14024 / 14041          1.1         891.6       1.3X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select all int rows (value IS NOT NULL): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          19004 / 19014          0.8        1208.2       1.0X
-Parquet Vectorized (Pushdown)               19219 / 19232          0.8        1221.9       1.0X
-Native ORC Vectorized                       15266 / 15290          1.0         970.6       1.2X
-Native ORC Vectorized (Pushdown)            15469 / 15482          1.0         983.5       1.2X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select all int rows (value > -1):        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          19036 / 19052          0.8        1210.3       1.0X
-Parquet Vectorized (Pushdown)               19287 / 19306          0.8        1226.2       1.0X
-Native ORC Vectorized                       15311 / 15371          1.0         973.5       1.2X
-Native ORC Vectorized (Pushdown)            15517 / 15590          1.0         986.5       1.2X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select all int rows (value != -1):       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          19072 / 19102          0.8        1212.6       1.0X
-Parquet Vectorized (Pushdown)               19288 / 19318          0.8        1226.3       1.0X
-Native ORC Vectorized                       15277 / 15293          1.0         971.3       1.2X
-Native ORC Vectorized (Pushdown)            15479 / 15499          1.0         984.1       1.2X
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 0 string row (value IS NULL):      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                 6509           6563          64          2.4         413.8       1.0X
+Parquet Vectorized (Pushdown)                       451            455           5         34.9          28.7      14.4X
+Native ORC Vectorized                              4697           4880         311          3.3         298.6       1.4X
+Native ORC Vectorized (Pushdown)                    572            585          12         27.5          36.3      11.4X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 0 string row ('7864320' < value < '7864320'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                 6597           6624          21          2.4         419.4       1.0X
+Parquet Vectorized (Pushdown)                       453            456           2         34.7          28.8      14.6X
+Native ORC Vectorized                              4853           4887          29          3.2         308.5       1.4X
+Native ORC Vectorized (Pushdown)                    572            582          13         27.5          36.3      11.5X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 1 string row (value = '7864320'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                 6597           6648          30          2.4         419.4       1.0X
+Parquet Vectorized (Pushdown)                       445            448           3         35.4          28.3      14.8X
+Native ORC Vectorized                              4915           4954          34          3.2         312.5       1.3X
+Native ORC Vectorized (Pushdown)                    560            574          14         28.1          35.6      11.8X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 1 string row (value <=> '7864320'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                 6598           6664          80          2.4         419.5       1.0X
+Parquet Vectorized (Pushdown)                       439            442           3         35.8          27.9      15.0X
+Native ORC Vectorized                              4894           4926          30          3.2         311.1       1.3X
+Native ORC Vectorized (Pushdown)                    561            572          13         28.0          35.7      11.8X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 1 string row ('7864320' <= value <= '7864320'):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                 6607           6634          27          2.4         420.1       1.0X
+Parquet Vectorized (Pushdown)                       440            444           3         35.8          28.0      15.0X
+Native ORC Vectorized                              4910           4961          48          3.2         312.2       1.3X
+Native ORC Vectorized (Pushdown)                    564            575          13         27.9          35.9      11.7X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select all string rows (value IS NOT NULL):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                12348          12442          63          1.3         785.1       1.0X
+Parquet Vectorized (Pushdown)                     12110          12211          96          1.3         769.9       1.0X
+Native ORC Vectorized                             10689          10772          59          1.5         679.6       1.2X
+Native ORC Vectorized (Pushdown)                  10926          10971          40          1.4         694.7       1.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 0 int row (value IS NULL):         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                 5789           5870          64          2.7         368.1       1.0X
+Parquet Vectorized (Pushdown)                       356            361           3         44.2          22.6      16.3X
+Native ORC Vectorized                              4326           4515         303          3.6         275.1       1.3X
+Native ORC Vectorized (Pushdown)                    547            565          15         28.8          34.8      10.6X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 0 int row (7864320 < value < 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                 5791           5837          60          2.7         368.2       1.0X
+Parquet Vectorized (Pushdown)                       364            373           6         43.2          23.2      15.9X
+Native ORC Vectorized                              4359           4398          28          3.6         277.1       1.3X
+Native ORC Vectorized (Pushdown)                    555            569          16         28.3          35.3      10.4X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 1 int row (value = 7864320):       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                 5876           5916          25          2.7         373.6       1.0X
+Parquet Vectorized (Pushdown)                       362            367           4         43.4          23.0      16.2X
+Native ORC Vectorized                              4393           4453          44          3.6         279.3       1.3X
+Native ORC Vectorized (Pushdown)                    552            567          16         28.5          35.1      10.6X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 1 int row (value <=> 7864320):     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                 5879           5898          16          2.7         373.8       1.0X
+Parquet Vectorized (Pushdown)                       359            369           9         43.8          22.8      16.4X
+Native ORC Vectorized                              4405           4441          30          3.6         280.0       1.3X
+Native ORC Vectorized (Pushdown)                    548            564          19         28.7          34.8      10.7X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 1 int row (7864320 <= value <= 7864320):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                 5846           5930          75          2.7         371.7       1.0X
+Parquet Vectorized (Pushdown)                       363            372           6         43.4          23.1      16.1X
+Native ORC Vectorized                              4425           4456          23          3.6         281.3       1.3X
+Native ORC Vectorized (Pushdown)                    551            572          24         28.6          35.0      10.6X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 1 int row (7864319 < value < 7864321):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                 5835           5888          43          2.7         370.9       1.0X
+Parquet Vectorized (Pushdown)                       363            368           3         43.3          23.1      16.1X
+Native ORC Vectorized                              4426           4445          24          3.6         281.4       1.3X
+Native ORC Vectorized (Pushdown)                    547            563          16         28.7          34.8      10.7X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 10% int rows (value < 1572864):    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                 6470           6540          48          2.4         411.4       1.0X
+Parquet Vectorized (Pushdown)                      1548           1570          16         10.2          98.4       4.2X
+Native ORC Vectorized                              5078           5106          22          3.1         322.9       1.3X
+Native ORC Vectorized (Pushdown)                   1625           1641          11          9.7         103.3       4.0X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 50% int rows (value < 7864320):    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                 8568           8628          45          1.8         544.7       1.0X
+Parquet Vectorized (Pushdown)                      5826           5891          54          2.7         370.4       1.5X
+Native ORC Vectorized                              7233           7254          18          2.2         459.8       1.2X
+Native ORC Vectorized (Pushdown)                   5447           5481          31          2.9         346.3       1.6X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 90% int rows (value < 14155776):   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                10653          10701          48          1.5         677.3       1.0X
+Parquet Vectorized (Pushdown)                     10210          10244          40          1.5         649.1       1.0X
+Native ORC Vectorized                              9398           9441          32          1.7         597.5       1.1X
+Native ORC Vectorized (Pushdown)                   9271           9331          56          1.7         589.4       1.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select all int rows (value IS NOT NULL):  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                11098          11146          39          1.4         705.6       1.0X
+Parquet Vectorized (Pushdown)                     11187          11254          45          1.4         711.2       1.0X
+Native ORC Vectorized                              9847           9895          43          1.6         626.0       1.1X
+Native ORC Vectorized (Pushdown)                  10227          12071         623          1.5         650.2       1.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select all int rows (value > -1):         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                11873          14239         NaN          1.3         754.8       1.0X
+Parquet Vectorized (Pushdown)                     11854          11911          36          1.3         753.7       1.0X
+Native ORC Vectorized                             10197          10482         397          1.5         648.3       1.2X
+Native ORC Vectorized (Pushdown)                  10450          10471          16          1.5         664.4       1.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select all int rows (value != -1):        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                11715          11776          53          1.3         744.8       1.0X
+Parquet Vectorized (Pushdown)                     12178          15502         NaN          1.3         774.2       1.0X
+Native ORC Vectorized                             10196          10256          62          1.5         648.2       1.1X
+Native ORC Vectorized (Pushdown)                  10448          10479          21          1.5         664.3       1.1X
 
 
 ================================================================================================
 Pushdown for few distinct value case (use dictionary encoding)
 ================================================================================================
 
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select 0 distinct string row (value IS NULL): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          10250 / 10274          1.5         651.7       1.0X
-Parquet Vectorized (Pushdown)                  571 /  576         27.5          36.3      17.9X
-Native ORC Vectorized                         8651 / 8660          1.8         550.0       1.2X
-Native ORC Vectorized (Pushdown)               909 /  933         17.3          57.8      11.3X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select 0 distinct string row ('100' < value < '100'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          10420 / 10426          1.5         662.5       1.0X
-Parquet Vectorized (Pushdown)                  574 /  579         27.4          36.5      18.2X
-Native ORC Vectorized                         8973 / 8982          1.8         570.5       1.2X
-Native ORC Vectorized (Pushdown)               916 /  955         17.2          58.2      11.4X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select 1 distinct string row (value = '100'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          10428 / 10441          1.5         663.0       1.0X
-Parquet Vectorized (Pushdown)                  789 /  809         19.9          50.2      13.2X
-Native ORC Vectorized                         9042 / 9055          1.7         574.9       1.2X
-Native ORC Vectorized (Pushdown)              1130 / 1145         13.9          71.8       9.2X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select 1 distinct string row (value <=> '100'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          10402 / 10416          1.5         661.3       1.0X
-Parquet Vectorized (Pushdown)                  791 /  806         19.9          50.3      13.2X
-Native ORC Vectorized                         9042 / 9055          1.7         574.9       1.2X
-Native ORC Vectorized (Pushdown)              1112 / 1145         14.1          70.7       9.4X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select 1 distinct string row ('100' <= value <= '100'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          10548 / 10563          1.5         670.6       1.0X
-Parquet Vectorized (Pushdown)                  790 /  796         19.9          50.2      13.4X
-Native ORC Vectorized                         9144 / 9153          1.7         581.3       1.2X
-Native ORC Vectorized (Pushdown)              1117 / 1148         14.1          71.0       9.4X
-
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Select all distinct string rows (value IS NOT NULL): Best/Avg Time(ms)    
Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Parquet Vectorized                          20445 / 20469          0.8        
1299.8       1.0X
-Parquet Vectorized (Pushdown)               20686 / 20699          0.8        
1315.2       1.0X
-Native ORC Vectorized                       18851 / 18953          0.8        
1198.5       1.1X
-Native ORC Vectorized (Pushdown)            19255 / 19268          0.8        
1224.2       1.1X
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 0 distinct string row (value IS NULL):  Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                 5997           6029         
 23          2.6         381.3       1.0X
+Parquet Vectorized (Pushdown)                       328            336         
  7         47.9          20.9      18.3X
+Native ORC Vectorized                              5886           6011         
109          2.7         374.2       1.0X
+Native ORC Vectorized (Pushdown)                   1086           1111         
 22         14.5          69.1       5.5X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 0 distinct string row ('100' < value < '100'):  Best Time(ms)   Avg 
Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                 6097           6139         
 45          2.6         387.6       1.0X
+Parquet Vectorized (Pushdown)                       331            342         
  6         47.5          21.1      18.4X
+Native ORC Vectorized                              6018           6070         
 33          2.6         382.6       1.0X
+Native ORC Vectorized (Pushdown)                   1084           1099         
 14         14.5          68.9       5.6X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 1 distinct string row (value = '100'):  Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                 6175           6202         
 26          2.5         392.6       1.0X
+Parquet Vectorized (Pushdown)                       474            488         
 10         33.2          30.1      13.0X
+Native ORC Vectorized                              6236           6270         
 41          2.5         396.5       1.0X
+Native ORC Vectorized (Pushdown)                   1203           1226         
 18         13.1          76.5       5.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 1 distinct string row (value <=> '100'):  Best Time(ms)   Avg Time(ms)  
 Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                 6192           7882         
704          2.5         393.7       1.0X
+Parquet Vectorized (Pushdown)                       511            769         
265         30.8          32.5      12.1X
+Native ORC Vectorized                              6592           7214         
441          2.4         419.1       0.9X
+Native ORC Vectorized (Pushdown)                   1306           1446         
124         12.0          83.0       4.7X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 1 distinct string row ('100' <= value <= '100'):  Best Time(ms)   Avg 
Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                 6708           7325         
686          2.3         426.5       1.0X
 
 Review comment:
   Was the laptop stable here? Were any other jobs running?
