dongjoon-hyun commented on a change in pull request #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition
URL: https://github.com/apache/spark/pull/23964#discussion_r266104232
##########
File path: sql/core/benchmarks/ParquetNestedSchemaPruningBenchmark-results.txt
##########
@@ -6,35 +6,42 @@ OpenJDK 64-Bit Server VM 1.8.0_201-b09 on Linux 3.10.0-862.3.2.el7.x86_64
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Selection:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Top-level column                                    145            174          23          6.9         145.1       1.0X
-Nested column                                       325            346          19          3.1         324.8       0.4X
+Top-level column                                    128            166          24          7.8         128.0       1.0X
+Nested column                                       308            325          10          3.2         308.3       0.4X

 OpenJDK 64-Bit Server VM 1.8.0_201-b09 on Linux 3.10.0-862.3.2.el7.x86_64
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Limiting:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Top-level column                                    434            508         108          2.3         434.3       1.0X
-Nested column                                       625            647          23          1.6         624.8       0.7X
+Top-level column                                    447            496          91          2.2         447.0       1.0X
+Nested column                                       631            666          40          1.6         631.2       0.7X

 OpenJDK 64-Bit Server VM 1.8.0_201-b09 on Linux 3.10.0-862.3.2.el7.x86_64
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Repartitioning:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Top-level column                                    357            368           9          2.8         356.9       1.0X
-Nested column                                      2897           2976          88          0.3        2897.4       0.1X
+Top-level column                                    360            394          84          2.8         360.0       1.0X
+Nested column                                       553            586          65          1.8         553.5       0.7X

Review comment:
This became faster: Avg Time for the nested column under Repartitioning dropped from `2976 ms` to `586 ms`.
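For reference, the size of the improvement noted in the review comment can be checked with a little arithmetic. The sketch below uses the Avg Time(ms) values of the Repartitioning / Nested column row from the diff; the helper name `speedup` is purely illustrative and is not part of Spark's benchmark code.

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """Return how many times faster the 'after' run is than the 'before' run."""
    if before_ms <= 0 or after_ms <= 0:
        raise ValueError("timings must be positive")
    return before_ms / after_ms


# Avg Time(ms), Repartitioning / Nested column, copied from the diff.
avg_before_ms = 2976
avg_after_ms = 586

ratio = speedup(avg_before_ms, avg_after_ms)
reduction_pct = (1 - avg_after_ms / avg_before_ms) * 100

print(f"speedup: {ratio:.1f}x, time reduced by {reduction_pct:.0f}%")
# prints: speedup: 5.1x, time reduced by 80%
```

This is consistent with the Relative column for the nested column in the Repartitioning group moving from 0.1X to 0.7X.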
