dongjoon-hyun commented on a change in pull request #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition
URL: https://github.com/apache/spark/pull/23964#discussion_r266104512
##########
File path: sql/core/benchmarks/ParquetNestedSchemaPruningBenchmark-results.txt
##########
@@ -6,35 +6,42 @@ OpenJDK 64-Bit Server VM 1.8.0_201-b09 on Linux 3.10.0-862.3.2.el7.x86_64
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Selection:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Top-level column                                    145            174          23          6.9         145.1       1.0X
-Nested column                                       325            346          19          3.1         324.8       0.4X
+Top-level column                                    128            166          24          7.8         128.0       1.0X
+Nested column                                       308            325          10          3.2         308.3       0.4X
 
 OpenJDK 64-Bit Server VM 1.8.0_201-b09 on Linux 3.10.0-862.3.2.el7.x86_64
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Limiting:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Top-level column                                    434            508         108          2.3         434.3       1.0X
-Nested column                                       625            647          23          1.6         624.8       0.7X
+Top-level column                                    447            496          91          2.2         447.0       1.0X
+Nested column                                       631            666          40          1.6         631.2       0.7X
 
 OpenJDK 64-Bit Server VM 1.8.0_201-b09 on Linux 3.10.0-862.3.2.el7.x86_64
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Repartitioning:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Top-level column                                    357            368           9          2.8         356.9       1.0X
-Nested column                                      2897           2976          88          0.3        2897.4       0.1X
+Top-level column                                    360            394          84          2.8         360.0       1.0X
+Nested column                                       553            586          65          1.8         553.5       0.7X
 
 OpenJDK 64-Bit Server VM 1.8.0_201-b09 on Linux 3.10.0-862.3.2.el7.x86_64
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Repartitioning by exprs:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Top-level column                                    365            413          77          2.7         364.9       1.0X
-Nested column                                      2902           2969          99          0.3        2902.4       0.1X
+Top-level column                                    368            393          50          2.7         368.3       1.0X
+Nested column                                      2942           3017          82          0.3        2942.0       0.1X
+
+OpenJDK 64-Bit Server VM 1.8.0_201-b09 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Sample:                                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Top-level column                                    124            143          10          8.1         124.1       1.0X
+Nested column                                       345            366          34          2.9         344.8       0.4X

Review comment:
I added a new benchmark case for `Sample`. This is the result after improvement.
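
As a reading aid only (it is not part of the PR), the following minimal sketch compares the "Nested column" best times before and after the change, using the numbers copied from the diff in this message. The `speedup` helper and the dictionary layout are hypothetical names chosen for illustration. `Sample` is left out because the diff only adds it and gives no "before" figure.

```python
# Best Time (ms) for the "Nested column" rows, copied from the diff:
# (value on the removed "-" line, value on the added "+" line).
nested_best_ms = {
    "Selection": (325, 308),
    "Limiting": (625, 631),
    "Repartitioning": (2897, 553),
    "Repartitioning by exprs": (2902, 2942),
}


def speedup(before_ms, after_ms):
    """Return how many times faster the 'after' run is than the 'before' run."""
    return before_ms / after_ms


# Hypothetical helper output: a factor above 1 means the new run was faster.
speedups = {
    name: round(speedup(before, after), 2)
    for name, (before, after) in nested_best_ms.items()
}

for name, factor in speedups.items():
    print(f"{name}: {factor}x")
```

On these figures only `Repartitioning` moves substantially (2897 ms down to 553 ms); the other three cases stay within a few percent of their earlier values, which is well inside the reported standard deviations.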
