dongjoon-hyun commented on a change in pull request #24068: [SPARK-27105][SQL] Optimize away exponential complexity in ORC predicate conversion
URL: https://github.com/apache/spark/pull/24068#discussion_r293216015
########## File path: sql/core/benchmarks/FilterPushdownBenchmark-results.txt ########## @@ -2,669 +2,695 @@ Pushdown for many distinct value case ================================================================================================ -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select 0 string row (value IS NULL): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 11405 / 11485 1.4 725.1 1.0X -Parquet Vectorized (Pushdown) 675 / 690 23.3 42.9 16.9X -Native ORC Vectorized 7127 / 7170 2.2 453.1 1.6X -Native ORC Vectorized (Pushdown) 519 / 541 30.3 33.0 22.0X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 11457 / 11473 1.4 728.4 1.0X -Parquet Vectorized (Pushdown) 656 / 686 24.0 41.7 17.5X -Native ORC Vectorized 7328 / 7342 2.1 465.9 1.6X -Native ORC Vectorized (Pushdown) 539 / 565 29.2 34.2 21.3X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select 1 string row (value = '7864320'): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 11878 / 11888 1.3 755.2 1.0X -Parquet Vectorized (Pushdown) 630 / 654 25.0 40.1 18.9X -Native ORC Vectorized 7342 / 7362 2.1 466.8 1.6X -Native ORC Vectorized (Pushdown) 519 / 537 30.3 33.0 22.9X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select 1 string row (value <=> '7864320'): Best/Avg 
Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 11423 / 11440 1.4 726.2 1.0X -Parquet Vectorized (Pushdown) 625 / 643 25.2 39.7 18.3X -Native ORC Vectorized 7315 / 7335 2.2 465.1 1.6X -Native ORC Vectorized (Pushdown) 507 / 520 31.0 32.2 22.5X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select 1 string row ('7864320' <= value <= '7864320'): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 11440 / 11478 1.4 727.3 1.0X -Parquet Vectorized (Pushdown) 634 / 652 24.8 40.3 18.0X -Native ORC Vectorized 7311 / 7324 2.2 464.8 1.6X -Native ORC Vectorized (Pushdown) 517 / 548 30.4 32.8 22.1X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select all string rows (value IS NOT NULL): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 20750 / 20872 0.8 1319.3 1.0X -Parquet Vectorized (Pushdown) 21002 / 21032 0.7 1335.3 1.0X -Native ORC Vectorized 16714 / 16742 0.9 1062.6 1.2X -Native ORC Vectorized (Pushdown) 16926 / 16965 0.9 1076.1 1.2X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select 0 int row (value IS NULL): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 10510 / 10532 1.5 668.2 1.0X -Parquet Vectorized (Pushdown) 642 / 665 24.5 40.8 16.4X -Native ORC Vectorized 6609 / 6618 2.4 420.2 1.6X -Native ORC Vectorized (Pushdown) 502 / 512 31.4 31.9 21.0X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on 
Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select 0 int row (7864320 < value < 7864320): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 10505 / 10514 1.5 667.9 1.0X -Parquet Vectorized (Pushdown) 659 / 673 23.9 41.9 15.9X -Native ORC Vectorized 6634 / 6641 2.4 421.8 1.6X -Native ORC Vectorized (Pushdown) 513 / 526 30.7 32.6 20.5X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select 1 int row (value = 7864320): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 10555 / 10570 1.5 671.1 1.0X -Parquet Vectorized (Pushdown) 651 / 668 24.2 41.4 16.2X -Native ORC Vectorized 6721 / 6728 2.3 427.3 1.6X -Native ORC Vectorized (Pushdown) 508 / 519 31.0 32.3 20.8X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select 1 int row (value <=> 7864320): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 10556 / 10566 1.5 671.1 1.0X -Parquet Vectorized (Pushdown) 647 / 654 24.3 41.1 16.3X -Native ORC Vectorized 6716 / 6728 2.3 427.0 1.6X -Native ORC Vectorized (Pushdown) 510 / 521 30.9 32.4 20.7X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select 1 int row (7864320 <= value <= 7864320): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 10556 / 10565 1.5 671.1 1.0X -Parquet Vectorized (Pushdown) 649 / 654 24.2 41.3 16.3X -Native ORC Vectorized 6700 / 6712 2.3 426.0 1.6X 
-Native ORC Vectorized (Pushdown) 509 / 520 30.9 32.3 20.8X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select 1 int row (7864319 < value < 7864321): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 10547 / 10566 1.5 670.5 1.0X -Parquet Vectorized (Pushdown) 649 / 653 24.2 41.3 16.3X -Native ORC Vectorized 6703 / 6713 2.3 426.2 1.6X -Native ORC Vectorized (Pushdown) 510 / 520 30.8 32.5 20.7X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select 10% int rows (value < 1572864): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 11478 / 11525 1.4 729.7 1.0X -Parquet Vectorized (Pushdown) 2576 / 2587 6.1 163.8 4.5X -Native ORC Vectorized 7633 / 7657 2.1 485.3 1.5X -Native ORC Vectorized (Pushdown) 2076 / 2096 7.6 132.0 5.5X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select 50% int rows (value < 7864320): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 14785 / 14802 1.1 940.0 1.0X -Parquet Vectorized (Pushdown) 9971 / 9977 1.6 633.9 1.5X -Native ORC Vectorized 11082 / 11107 1.4 704.6 1.3X -Native ORC Vectorized (Pushdown) 8061 / 8073 2.0 512.5 1.8X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select 90% int rows (value < 14155776): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 18174 / 18214 0.9 1155.5 1.0X 
-Parquet Vectorized (Pushdown) 17387 / 17403 0.9 1105.5 1.0X -Native ORC Vectorized 14465 / 14492 1.1 919.7 1.3X -Native ORC Vectorized (Pushdown) 14024 / 14041 1.1 891.6 1.3X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select all int rows (value IS NOT NULL): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 19004 / 19014 0.8 1208.2 1.0X -Parquet Vectorized (Pushdown) 19219 / 19232 0.8 1221.9 1.0X -Native ORC Vectorized 15266 / 15290 1.0 970.6 1.2X -Native ORC Vectorized (Pushdown) 15469 / 15482 1.0 983.5 1.2X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select all int rows (value > -1): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 19036 / 19052 0.8 1210.3 1.0X -Parquet Vectorized (Pushdown) 19287 / 19306 0.8 1226.2 1.0X -Native ORC Vectorized 15311 / 15371 1.0 973.5 1.2X -Native ORC Vectorized (Pushdown) 15517 / 15590 1.0 986.5 1.2X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select all int rows (value != -1): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 19072 / 19102 0.8 1212.6 1.0X -Parquet Vectorized (Pushdown) 19288 / 19318 0.8 1226.3 1.0X -Native ORC Vectorized 15277 / 15293 1.0 971.3 1.2X -Native ORC Vectorized (Pushdown) 15479 / 15499 1.0 984.1 1.2X +Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz +Select 0 string row (value IS NULL): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative 
+------------------------------------------------------------------------------------------------------------------------ +Parquet Vectorized 6509 6563 64 2.4 413.8 1.0X +Parquet Vectorized (Pushdown) 451 455 5 34.9 28.7 14.4X +Native ORC Vectorized 4697 4880 311 3.3 298.6 1.4X +Native ORC Vectorized (Pushdown) 572 585 12 27.5 36.3 11.4X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz +Select 0 string row ('7864320' < value < '7864320'): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------------------------------------------------------------------------------ +Parquet Vectorized 6597 6624 21 2.4 419.4 1.0X +Parquet Vectorized (Pushdown) 453 456 2 34.7 28.8 14.6X +Native ORC Vectorized 4853 4887 29 3.2 308.5 1.4X +Native ORC Vectorized (Pushdown) 572 582 13 27.5 36.3 11.5X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz +Select 1 string row (value = '7864320'): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------------------------------------------------------------------------------ +Parquet Vectorized 6597 6648 30 2.4 419.4 1.0X +Parquet Vectorized (Pushdown) 445 448 3 35.4 28.3 14.8X +Native ORC Vectorized 4915 4954 34 3.2 312.5 1.3X +Native ORC Vectorized (Pushdown) 560 574 14 28.1 35.6 11.8X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz +Select 1 string row (value <=> '7864320'): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------------------------------------------------------------------------------ +Parquet Vectorized 6598 6664 80 2.4 419.5 1.0X +Parquet Vectorized (Pushdown) 439 442 3 35.8 27.9 15.0X +Native ORC Vectorized 4894 4926 30 3.2 311.1 1.3X +Native ORC 
Vectorized (Pushdown) 561 572 13 28.0 35.7 11.8X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz +Select 1 string row ('7864320' <= value <= '7864320'): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------------------------------------------------------------------------------ +Parquet Vectorized 6607 6634 27 2.4 420.1 1.0X +Parquet Vectorized (Pushdown) 440 444 3 35.8 28.0 15.0X +Native ORC Vectorized 4910 4961 48 3.2 312.2 1.3X +Native ORC Vectorized (Pushdown) 564 575 13 27.9 35.9 11.7X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz +Select all string rows (value IS NOT NULL): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------------------------------------------------------------------------------ +Parquet Vectorized 12348 12442 63 1.3 785.1 1.0X +Parquet Vectorized (Pushdown) 12110 12211 96 1.3 769.9 1.0X +Native ORC Vectorized 10689 10772 59 1.5 679.6 1.2X +Native ORC Vectorized (Pushdown) 10926 10971 40 1.4 694.7 1.1X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz +Select 0 int row (value IS NULL): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------------------------------------------------------------------------------ +Parquet Vectorized 5789 5870 64 2.7 368.1 1.0X +Parquet Vectorized (Pushdown) 356 361 3 44.2 22.6 16.3X +Native ORC Vectorized 4326 4515 303 3.6 275.1 1.3X +Native ORC Vectorized (Pushdown) 547 565 15 28.8 34.8 10.6X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz +Select 0 int row (7864320 < value < 7864320): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative 
+------------------------------------------------------------------------------------------------------------------------ +Parquet Vectorized 5791 5837 60 2.7 368.2 1.0X +Parquet Vectorized (Pushdown) 364 373 6 43.2 23.2 15.9X +Native ORC Vectorized 4359 4398 28 3.6 277.1 1.3X +Native ORC Vectorized (Pushdown) 555 569 16 28.3 35.3 10.4X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz +Select 1 int row (value = 7864320): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------------------------------------------------------------------------------ +Parquet Vectorized 5876 5916 25 2.7 373.6 1.0X +Parquet Vectorized (Pushdown) 362 367 4 43.4 23.0 16.2X +Native ORC Vectorized 4393 4453 44 3.6 279.3 1.3X +Native ORC Vectorized (Pushdown) 552 567 16 28.5 35.1 10.6X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz +Select 1 int row (value <=> 7864320): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------------------------------------------------------------------------------ +Parquet Vectorized 5879 5898 16 2.7 373.8 1.0X +Parquet Vectorized (Pushdown) 359 369 9 43.8 22.8 16.4X +Native ORC Vectorized 4405 4441 30 3.6 280.0 1.3X +Native ORC Vectorized (Pushdown) 548 564 19 28.7 34.8 10.7X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz +Select 1 int row (7864320 <= value <= 7864320): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------------------------------------------------------------------------------ +Parquet Vectorized 5846 5930 75 2.7 371.7 1.0X +Parquet Vectorized (Pushdown) 363 372 6 43.4 23.1 16.1X +Native ORC Vectorized 4425 4456 23 3.6 281.3 1.3X +Native ORC Vectorized (Pushdown) 
551 572 24 28.6 35.0 10.6X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz +Select 1 int row (7864319 < value < 7864321): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------------------------------------------------------------------------------ +Parquet Vectorized 5835 5888 43 2.7 370.9 1.0X +Parquet Vectorized (Pushdown) 363 368 3 43.3 23.1 16.1X +Native ORC Vectorized 4426 4445 24 3.6 281.4 1.3X +Native ORC Vectorized (Pushdown) 547 563 16 28.7 34.8 10.7X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz +Select 10% int rows (value < 1572864): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------------------------------------------------------------------------------ +Parquet Vectorized 6470 6540 48 2.4 411.4 1.0X +Parquet Vectorized (Pushdown) 1548 1570 16 10.2 98.4 4.2X +Native ORC Vectorized 5078 5106 22 3.1 322.9 1.3X +Native ORC Vectorized (Pushdown) 1625 1641 11 9.7 103.3 4.0X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz +Select 50% int rows (value < 7864320): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------------------------------------------------------------------------------ +Parquet Vectorized 8568 8628 45 1.8 544.7 1.0X +Parquet Vectorized (Pushdown) 5826 5891 54 2.7 370.4 1.5X +Native ORC Vectorized 7233 7254 18 2.2 459.8 1.2X +Native ORC Vectorized (Pushdown) 5447 5481 31 2.9 346.3 1.6X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz +Select 90% int rows (value < 14155776): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative 
+------------------------------------------------------------------------------------------------------------------------ +Parquet Vectorized 10653 10701 48 1.5 677.3 1.0X +Parquet Vectorized (Pushdown) 10210 10244 40 1.5 649.1 1.0X +Native ORC Vectorized 9398 9441 32 1.7 597.5 1.1X +Native ORC Vectorized (Pushdown) 9271 9331 56 1.7 589.4 1.1X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz +Select all int rows (value IS NOT NULL): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------------------------------------------------------------------------------ +Parquet Vectorized 11098 11146 39 1.4 705.6 1.0X +Parquet Vectorized (Pushdown) 11187 11254 45 1.4 711.2 1.0X +Native ORC Vectorized 9847 9895 43 1.6 626.0 1.1X +Native ORC Vectorized (Pushdown) 10227 12071 623 1.5 650.2 1.1X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz +Select all int rows (value > -1): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------------------------------------------------------------------------------ +Parquet Vectorized 11873 14239 NaN 1.3 754.8 1.0X +Parquet Vectorized (Pushdown) 11854 11911 36 1.3 753.7 1.0X +Native ORC Vectorized 10197 10482 397 1.5 648.3 1.2X +Native ORC Vectorized (Pushdown) 10450 10471 16 1.5 664.4 1.1X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz +Select all int rows (value != -1): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------------------------------------------------------------------------------ +Parquet Vectorized 11715 11776 53 1.3 744.8 1.0X +Parquet Vectorized (Pushdown) 12178 15502 NaN 1.3 774.2 1.0X +Native ORC Vectorized 10196 10256 62 1.5 648.2 1.1X +Native 
ORC Vectorized (Pushdown) 10448 10479 21 1.5 664.3 1.1X ================================================================================================ Pushdown for few distinct value case (use dictionary encoding) ================================================================================================ -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select 0 distinct string row (value IS NULL): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 10250 / 10274 1.5 651.7 1.0X -Parquet Vectorized (Pushdown) 571 / 576 27.5 36.3 17.9X -Native ORC Vectorized 8651 / 8660 1.8 550.0 1.2X -Native ORC Vectorized (Pushdown) 909 / 933 17.3 57.8 11.3X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select 0 distinct string row ('100' < value < '100'): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 10420 / 10426 1.5 662.5 1.0X -Parquet Vectorized (Pushdown) 574 / 579 27.4 36.5 18.2X -Native ORC Vectorized 8973 / 8982 1.8 570.5 1.2X -Native ORC Vectorized (Pushdown) 916 / 955 17.2 58.2 11.4X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select 1 distinct string row (value = '100'): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 10428 / 10441 1.5 663.0 1.0X -Parquet Vectorized (Pushdown) 789 / 809 19.9 50.2 13.2X -Native ORC Vectorized 9042 / 9055 1.7 574.9 1.2X -Native ORC Vectorized (Pushdown) 1130 / 1145 13.9 71.8 9.2X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) 
Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select 1 distinct string row (value <=> '100'): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 10402 / 10416 1.5 661.3 1.0X -Parquet Vectorized (Pushdown) 791 / 806 19.9 50.3 13.2X -Native ORC Vectorized 9042 / 9055 1.7 574.9 1.2X -Native ORC Vectorized (Pushdown) 1112 / 1145 14.1 70.7 9.4X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select 1 distinct string row ('100' <= value <= '100'): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 10548 / 10563 1.5 670.6 1.0X -Parquet Vectorized (Pushdown) 790 / 796 19.9 50.2 13.4X -Native ORC Vectorized 9144 / 9153 1.7 581.3 1.2X -Native ORC Vectorized (Pushdown) 1117 / 1148 14.1 71.0 9.4X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -Select all distinct string rows (value IS NOT NULL): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------- -Parquet Vectorized 20445 / 20469 0.8 1299.8 1.0X -Parquet Vectorized (Pushdown) 20686 / 20699 0.8 1315.2 1.0X -Native ORC Vectorized 18851 / 18953 0.8 1198.5 1.1X -Native ORC Vectorized (Pushdown) 19255 / 19268 0.8 1224.2 1.1X +Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2 +Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz +Select 0 distinct string row (value IS NULL): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative +------------------------------------------------------------------------------------------------------------------------ +Parquet Vectorized 5997 6029 23 2.6 381.3 1.0X +Parquet Vectorized (Pushdown) 328 336 7 47.9 20.9 18.3X +Native 
ORC Vectorized 5886 6011 109 2.7 374.2 1.0X
+Native ORC Vectorized (Pushdown)                                   1086          1111         22       14.5         69.1      5.5X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 0 distinct string row ('100' < value < '100'):     Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                                 6097          6139         45        2.6        387.6      1.0X
+Parquet Vectorized (Pushdown)                                       331           342          6       47.5         21.1     18.4X
+Native ORC Vectorized                                              6018          6070         33        2.6        382.6      1.0X
+Native ORC Vectorized (Pushdown)                                   1084          1099         14       14.5         68.9      5.6X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 1 distinct string row (value = '100'):             Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                                 6175          6202         26        2.5        392.6      1.0X
+Parquet Vectorized (Pushdown)                                       474           488         10       33.2         30.1     13.0X
+Native ORC Vectorized                                              6236          6270         41        2.5        396.5      1.0X
+Native ORC Vectorized (Pushdown)                                   1203          1226         18       13.1         76.5      5.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 1 distinct string row (value <=> '100'):           Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                                 6192          7882        704        2.5        393.7      1.0X
+Parquet Vectorized (Pushdown)                                       511           769        265       30.8         32.5     12.1X
+Native ORC Vectorized                                              6592          7214        441        2.4        419.1      0.9X
+Native ORC Vectorized (Pushdown)                                   1306          1446        124       12.0         83.0      4.7X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14.2
+Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
+Select 1 distinct string row ('100' <= value <= '100'):   Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
+------------------------------------------------------------------------------------------------------------------------
+Parquet Vectorized                                                 6708          7325        686        2.3        426.5      1.0X

Review comment:
Was the laptop stable here? Were any other jobs running?
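One way to make the stability question concrete is to compare each row's Stdev(ms) against its Avg Time(ms). The snippet below is a minimal sketch rather than part of the PR: the rows are copied from the diff above, while the helper names and the 5% cutoff are assumptions chosen purely for illustration.

```python
# Rows copied from the benchmark diff above: (case, best_ms, avg_ms, stdev_ms).
ROWS = [
    ("Parquet Vectorized, value IS NULL (few distinct)", 5997, 6029, 23),
    ("Parquet Vectorized, value <=> '100'", 6192, 7882, 704),
    ("Parquet Vectorized (Pushdown), value <=> '100'", 511, 769, 265),
    ("Parquet Vectorized, '100' <= value <= '100'", 6708, 7325, 686),
]

# Hypothetical cutoff: treat a run as noisy when stdev exceeds 5% of the average.
NOISE_THRESHOLD = 0.05


def relative_stdev(avg_ms, stdev_ms):
    """Return the standard deviation as a fraction of the average time."""
    return stdev_ms / avg_ms


def noisy_rows(rows, threshold=NOISE_THRESHOLD):
    """Return the names of rows whose relative stdev exceeds the threshold."""
    return [
        name
        for name, _best, avg, stdev in rows
        if relative_stdev(avg, stdev) > threshold
    ]


if __name__ == "__main__":
    for name, best, avg, stdev in ROWS:
        ratio = relative_stdev(avg, stdev)
        print(f"{name}: stdev/avg = {ratio:.1%}, avg/best = {avg / best:.2f}")
    print("Noisy:", noisy_rows(ROWS))
```

Under this assumed cutoff, the `value IS NULL` row stays well below the threshold while the three rows from the later cases exceed it, which is consistent with the reviewer's concern that something else may have been competing for the machine during those runs.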