MaxGekk commented on a change in pull request #25828: [SPARK-29141][SQL][TEST] Use SqlBasedBenchmark in SQL benchmarks
URL: https://github.com/apache/spark/pull/25828#discussion_r325994058
##########
File path: sql/core/benchmarks/DataSourceReadBenchmark-results.txt
##########
@@ -2,251 +2,251 @@ SQL Single Numeric Column Scan
 ================================================================================================
-OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-SQL Single TINYINT Column Scan:             Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
-------------------------------------------------------------------------------------------------
-SQL CSV                                       26366 / 26562          0.6       1676.3      1.0X
-SQL Json                                       8709 / 8724           1.8        553.7      3.0X
-SQL Parquet Vectorized                          166 / 187           94.8         10.5    159.0X
-SQL Parquet MR                                 1706 / 1720           9.2        108.4     15.5X
-SQL ORC Vectorized                              167 / 174           94.2         10.6    157.9X
-SQL ORC MR                                     1433 / 1465          11.0         91.1     18.4X
-
-OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Parquet Reader Single TINYINT Column Scan:  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
-------------------------------------------------------------------------------------------------
-ParquetReader Vectorized                        200 / 207           78.7         12.7      1.0X
-ParquetReader Vectorized -> Row                 117 / 119          134.7          7.4      1.7X
-
-OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-SQL Single SMALLINT Column Scan:            Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
-------------------------------------------------------------------------------------------------
-SQL CSV                                       26489 / 26547          0.6       1684.1      1.0X
-SQL Json                                       8990 / 8998           1.7        571.5      2.9X
-SQL Parquet Vectorized                          209 / 221           75.1         13.3    126.5X
-SQL Parquet MR                                 1949 / 1949           8.1        123.9     13.6X
-SQL ORC Vectorized                              221 / 228           71.3         14.0    120.1X
-SQL ORC MR                                     1527 / 1549          10.3         97.1     17.3X
-
-OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Parquet Reader Single SMALLINT Column Scan: Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
-------------------------------------------------------------------------------------------------
-ParquetReader Vectorized                        286 / 296           54.9         18.2      1.0X
-ParquetReader Vectorized -> Row                 249 / 253           63.1         15.8      1.1X
-
-OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-SQL Single INT Column Scan:                 Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
-------------------------------------------------------------------------------------------------
-SQL CSV                                       27701 / 27744          0.6       1761.2      1.0X
-SQL Json                                       9703 / 9733           1.6        616.9      2.9X
-SQL Parquet Vectorized                          176 / 182           89.2         11.2    157.0X
-SQL Parquet MR                                 2164 / 2173           7.3        137.6     12.8X
-SQL ORC Vectorized                              307 / 314           51.2         19.5     90.2X
-SQL ORC MR                                     1690 / 1700           9.3        107.4     16.4X
-
-OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Parquet Reader Single INT Column Scan:      Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
-------------------------------------------------------------------------------------------------
-ParquetReader Vectorized                        259 / 277           60.7         16.5      1.0X
-ParquetReader Vectorized -> Row                 261 / 265           60.3         16.6      1.0X
-
-OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-SQL Single BIGINT Column Scan:              Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
-------------------------------------------------------------------------------------------------
-SQL CSV                                       34813 / 34900          0.5       2213.3      1.0X
-SQL Json                                      12570 / 12617          1.3        799.2      2.8X
-SQL Parquet Vectorized                          270 / 308           58.2         17.2    128.9X
-SQL Parquet MR                                 2427 / 2431           6.5        154.3     14.3X
-SQL ORC Vectorized                              388 / 398           40.6         24.6     89.8X
-SQL ORC MR                                     1819 / 1851           8.6        115.7     19.1X
-
-OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Parquet Reader Single BIGINT Column Scan:   Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
-------------------------------------------------------------------------------------------------
-ParquetReader Vectorized                        372 / 379           42.3         23.7      1.0X
-ParquetReader Vectorized -> Row                 357 / 368           44.1         22.7      1.0X
-
-OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-SQL Single FLOAT Column Scan:               Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
-------------------------------------------------------------------------------------------------
-SQL CSV                                       28753 / 28781          0.5       1828.0      1.0X
-SQL Json                                      12039 / 12215          1.3        765.4      2.4X
-SQL Parquet Vectorized                          170 / 177           92.4         10.8    169.0X
-SQL Parquet MR                                 2184 / 2196           7.2        138.9     13.2X
-SQL ORC Vectorized                              432 / 440           36.4         27.5     66.5X
-SQL ORC MR                                     1812 / 1833           8.7        115.2     15.9X
-
-OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Parquet Reader Single FLOAT Column Scan:    Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
-------------------------------------------------------------------------------------------------
-ParquetReader Vectorized                        253 / 260           62.2         16.1      1.0X
-ParquetReader Vectorized -> Row                 256 / 257           61.6         16.2      1.0X
-
-OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-SQL Single DOUBLE Column Scan:              Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
-------------------------------------------------------------------------------------------------
-SQL CSV                                       36177 / 36188          0.4       2300.1      1.0X
-SQL Json                                      18895 / 18898          0.8       1201.3      1.9X
-SQL Parquet Vectorized                          267 / 276           58.9         17.0    135.6X
-SQL Parquet MR                                 2355 / 2363           6.7        149.7     15.4X
-SQL ORC Vectorized                              543 / 546           29.0         34.5     66.6X
-SQL ORC MR                                     2246 / 2258           7.0        142.8     16.1X
-
-OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Parquet Reader Single DOUBLE Column Scan:   Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
-------------------------------------------------------------------------------------------------
-ParquetReader Vectorized                        353 / 367           44.6         22.4      1.0X
-ParquetReader Vectorized -> Row                 351 / 357           44.7         22.3      1.0X
+OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL Single TINYINT Column Scan:             Best Time(ms)  Avg Time(ms)  Stdev(ms)   Rate(M/s)  Per Row(ns)  Relative
+------------------------------------------------------------------------------------------------------------------------
+SQL CSV                                             23939         24126        265         0.7       1522.0      1.0X
+SQL Json                                             8908          9008        142         1.8        566.4      2.7X
+SQL Parquet Vectorized                                192           229         36        82.1         12.2    125.0X
+SQL Parquet MR                                       2356          2363         10         6.7        149.8     10.2X
+SQL ORC Vectorized                                    329           347         25        47.9         20.9     72.9X
+SQL ORC MR                                           1711          1747         50         9.2        108.8     14.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Parquet Reader Single TINYINT Column Scan:  Best Time(ms)  Avg Time(ms)  Stdev(ms)   Rate(M/s)  Per Row(ns)  Relative
+------------------------------------------------------------------------------------------------------------------------
+ParquetReader Vectorized                              194           197          4        81.1         12.3      1.0X
+ParquetReader Vectorized -> Row                        97           102         13       162.3          6.2      2.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL Single SMALLINT Column Scan:            Best Time(ms)  Avg Time(ms)  Stdev(ms)   Rate(M/s)  Per Row(ns)  Relative
+------------------------------------------------------------------------------------------------------------------------
+SQL CSV                                             24603         24607          6         0.6       1564.2      1.0X
+SQL Json                                             9587          9652         92         1.6        609.5      2.6X
+SQL Parquet Vectorized                                227           241         13        69.4         14.4    108.6X
+SQL Parquet MR                                       2432          2441         12         6.5        154.6     10.1X
+SQL ORC Vectorized                                    320           327          8        49.2         20.3     76.9X
+SQL ORC MR                                           1889          1921         46         8.3        120.1     13.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Parquet Reader Single SMALLINT Column Scan: Best Time(ms)  Avg Time(ms)  Stdev(ms)   Rate(M/s)  Per Row(ns)  Relative
+------------------------------------------------------------------------------------------------------------------------
+ParquetReader Vectorized                              290           294          8        54.3         18.4      1.0X
+ParquetReader Vectorized -> Row                       252           256          5        62.4         16.0      1.2X
+
+OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL Single INT Column Scan:                 Best Time(ms)  Avg Time(ms)  Stdev(ms)   Rate(M/s)  Per Row(ns)  Relative
+------------------------------------------------------------------------------------------------------------------------
+SQL CSV                                             26742         26743          1         0.6       1700.2      1.0X
+SQL Json                                            10855         10855          0         1.4        690.1      2.5X
+SQL Parquet Vectorized                                195           202          7        80.8         12.4    137.3X
+SQL Parquet MR                                       2805          2806          0         5.6        178.4      9.5X
+SQL ORC Vectorized                                    376           383          5        41.8         23.9     71.1X
+SQL ORC MR                                           2021          2092        102         7.8        128.5     13.2X
+
+OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Parquet Reader Single INT Column Scan:      Best Time(ms)  Avg Time(ms)  Stdev(ms)   Rate(M/s)  Per Row(ns)  Relative
+------------------------------------------------------------------------------------------------------------------------
+ParquetReader Vectorized                              248           253          5        63.4         15.8      1.0X
+ParquetReader Vectorized -> Row                       249           251          2        63.1         15.9      1.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL Single BIGINT Column Scan:              Best Time(ms)  Avg Time(ms)  Stdev(ms)   Rate(M/s)  Per Row(ns)  Relative
+------------------------------------------------------------------------------------------------------------------------
+SQL CSV                                             34841         34855         20         0.5       2215.1      1.0X
+SQL Json                                            14121         14133         18         1.1        897.8      2.5X
+SQL Parquet Vectorized                                288           303         17        54.7         18.3    121.2X
+SQL Parquet MR                                       3178          3197         27         4.9        202.0     11.0X
+SQL ORC Vectorized                                    465           476          8        33.8         29.6     74.9X
+SQL ORC MR                                           2255          2260          6         7.0        143.4     15.4X
+
+OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Parquet Reader Single BIGINT Column Scan:   Best Time(ms)  Avg Time(ms)  Stdev(ms)   Rate(M/s)  Per Row(ns)  Relative
+------------------------------------------------------------------------------------------------------------------------
+ParquetReader Vectorized                              344           354         11        45.8         21.8      1.0X
+ParquetReader Vectorized -> Row                       383           385          3        41.1         24.3      0.9X
+
+OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL Single FLOAT Column Scan:               Best Time(ms)  Avg Time(ms)  Stdev(ms)   Rate(M/s)  Per Row(ns)  Relative
+------------------------------------------------------------------------------------------------------------------------
+SQL CSV                                             29336         29563        322         0.5       1865.1      1.0X
+SQL Json                                            13452         13544        130         1.2        855.3      2.2X
+SQL Parquet Vectorized                                186           200         22        84.8         11.8    158.1X
+SQL Parquet MR                                       2752          2815         90         5.7        175.0     10.7X
+SQL ORC Vectorized                                    460           465          6        34.2         29.3     63.7X
+SQL ORC MR                                           2054          2072         26         7.7        130.6     14.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Parquet Reader Single FLOAT Column Scan:    Best Time(ms)  Avg Time(ms)  Stdev(ms)   Rate(M/s)  Per Row(ns)  Relative
+------------------------------------------------------------------------------------------------------------------------
+ParquetReader Vectorized                              244           246          4        64.6         15.5      1.0X
+ParquetReader Vectorized -> Row                       247           250          4        63.7         15.7      1.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+SQL Single DOUBLE Column Scan:              Best Time(ms)  Avg Time(ms)  Stdev(ms)   Rate(M/s)  Per Row(ns)  Relative
+------------------------------------------------------------------------------------------------------------------------
+SQL CSV                                             37812         37897        120         0.4       2404.0      1.0X
+SQL Json                                            19499         19509         15         0.8       1239.7      1.9X
+SQL Parquet Vectorized                                284           292         10        55.4         18.1    133.2X
+SQL Parquet MR                                       3236          3248         17         4.9        205.7     11.7X
+SQL ORC Vectorized                                    542           558         18        29.0         34.4     69.8X
+SQL ORC MR                                           2273          2298         36         6.9        144.5     16.6X
+
+OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Parquet Reader Single DOUBLE Column Scan:   Best Time(ms)  Avg Time(ms)  Stdev(ms)   Rate(M/s)  Per Row(ns)  Relative
+------------------------------------------------------------------------------------------------------------------------
+ParquetReader Vectorized                              342           352         13        46.0         21.7      1.0X
+ParquetReader Vectorized -> Row                       341           344          3        46.1         21.7      1.0X
 ================================================================================================
 Int and String Scan
 ================================================================================================
-OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Int and String Scan:                        Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
-------------------------------------------------------------------------------------------------
-SQL CSV                                       21130 / 21246          0.5       2015.1      1.0X
-SQL Json                                      12145 / 12174          0.9       1158.2      1.7X
-SQL Parquet Vectorized                         2363 / 2377           4.4        225.3      8.9X
-SQL Parquet MR                                 4555 / 4557           2.3        434.4      4.6X
-SQL ORC Vectorized                             2361 / 2388           4.4        225.1      9.0X
-SQL ORC MR                                     4186 / 4209           2.5        399.2      5.0X
+Int and String Scan:                        Best Time(ms)  Avg Time(ms)  Stdev(ms)   Rate(M/s)  Per Row(ns)  Relative
+------------------------------------------------------------------------------------------------------------------------
+SQL CSV                                             26777         26806         41         0.4       2553.7      1.0X
+SQL Json                                            13894         14071        251         0.8       1325.0      1.9X
+SQL Parquet Vectorized                               2351          2404         75         4.5        224.2     11.4X
+SQL Parquet MR                                       5198          5219         29         2.0        495.8      5.2X
+SQL ORC Vectorized                                   2434          2435          1         4.3        232.1     11.0X
+SQL ORC MR                                           4281          4345         91         2.4        408.3      6.3X
 ================================================================================================
 Repeated String Scan
 ================================================================================================
-OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Repeated String:                            Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
-------------------------------------------------------------------------------------------------
-SQL CSV                                       11693 / 11729          0.9       1115.1      1.0X
-SQL Json                                       7025 / 7025           1.5        669.9      1.7X
-SQL Parquet Vectorized                          803 / 821           13.1         76.6     14.6X
-SQL Parquet MR                                 1776 / 1790           5.9        169.4      6.6X
-SQL ORC Vectorized                              491 / 494           21.4         46.8     23.8X
-SQL ORC MR                                     2050 / 2063           5.1        195.5      5.7X
+Repeated String:                            Best Time(ms)  Avg Time(ms)  Stdev(ms)   Rate(M/s)  Per Row(ns)  Relative
+------------------------------------------------------------------------------------------------------------------------
+SQL CSV                                             15779         16507       1029         0.7       1504.8      1.0X
+SQL Json                                             7866          7877         14         1.3        750.2      2.0X
+SQL Parquet Vectorized                                820           826          5        12.8         78.2     19.2X
+SQL Parquet MR                                       2646          2658         17         4.0        252.4      6.0X
+SQL ORC Vectorized                                    638           644          7        16.4         60.9     24.7X
+SQL ORC MR                                           2205          2222         25         4.8        210.3      7.2X
 ================================================================================================
 Partitioned Table Scan
 ================================================================================================
-OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
-Partitioned Table:                          Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
-------------------------------------------------------------------------------------------------
-Data column - CSV                             30965 / 31041          0.5       1968.7      1.0X
-Data column - Json                            12876 / 12882          1.2        818.6      2.4X
-Data column - Parquet Vectorized                277 / 282           56.7         17.6    111.6X
-Data column - Parquet MR                       3398 / 3402           4.6        216.0      9.1X
-Data column - ORC Vectorized                    399 / 407           39.4         25.4     77.5X
-Data column - ORC MR                           2583 / 2589           6.1        164.2     12.0X
-Partition column - CSV                         7403 / 7427           2.1        470.7      4.2X
-Partition column - Json                        5587 / 5625           2.8        355.2      5.5X
-Partition column - Parquet Vectorized            71 / 78           222.6          4.5    438.3X
-Partition column - Parquet MR                  1798 / 1808           8.7        114.3     17.2X
-Partition column - ORC Vectorized                72 / 75           219.0          4.6    431.2X
-Partition column - ORC MR                      1772 / 1778           8.9        112.6     17.5X
-Both columns - CSV                            30211 / 30212          0.5       1920.7      1.0X
-Both columns - Json                           13382 / 13391          1.2        850.8      2.3X
-Both columns - Parquet Vectorized               321 / 333           49.0         20.4     96.4X
-Both columns - Parquet MR                      3656 / 3661           4.3        232.4      8.5X
-Both columns - ORC Vectorized                   443 / 448           35.5         28.2     69.9X
-Both columns - ORC MR                          2626 / 2633           6.0        167.0     11.8X
+OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Partitioned Table:                          Best Time(ms)  Avg Time(ms)  Stdev(ms)   Rate(M/s)  Per Row(ns)  Relative
+------------------------------------------------------------------------------------------------------------------------
+Data column - CSV                                   38142         38183         58         0.4       2425.0      1.0X
+Data column - Json                                  14664         14667          4         1.1        932.3      2.6X
+Data column - Parquet Vectorized                      304           318         13        51.8         19.3    125.7X
+Data column - Parquet MR                             3378          3384          8         4.7        214.8     11.3X
+Data column - ORC Vectorized                          475           481          7        33.1         30.2     80.3X
+Data column - ORC MR                                 2324          2356         46         6.8        147.7     16.4X
+Partition column - CSV                              14680         14742         88         1.1        933.3      2.6X

Review comment:
CSV and JSON below are 2 times slower now.
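
As a rough check of the "2 times slower" remark, the ratio of best times can be computed for the one affected row visible in this hunk, "Partition column - CSV". This is a minimal Python sketch, not part of the PR: the helper name `slowdown` is hypothetical, and the two timings are copied from the removed and added lines of the diff. The JSON rows after the comment are cut off in this hunk, so they are not checked here.

```python
def slowdown(old_ms: float, new_ms: float) -> float:
    """Return how many times slower the new timing is relative to the old one."""
    if old_ms <= 0:
        raise ValueError("old_ms must be positive")
    return new_ms / old_ms


# Best times (ms) for "Partition column - CSV", copied from the diff:
# removed line (JDK 1.8.0_191-b12) and added line (JDK 1.8.0_222-b10).
old_best_ms = 7403
new_best_ms = 14680

ratio = slowdown(old_best_ms, new_best_ms)
print(f"Partition column - CSV: {ratio:.2f}x slower")
```

Using best time rather than average keeps the comparison valid across the old `Best/Avg` layout and the new layout with separate `Best Time(ms)` and `Avg Time(ms)` columns.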