dongjoon-hyun commented on a change in pull request #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition
URL: https://github.com/apache/spark/pull/23964#discussion_r266103957
##########
File path: sql/core/benchmarks/OrcV2NestedSchemaPruningBenchmark-results.txt
##########
@@ -6,35 +6,42 @@ OpenJDK 64-Bit Server VM 1.8.0_201-b09 on Linux 3.10.0-862.3.2.el7.x86_64
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Selection:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Top-level column                                    120            148          24          8.3         120.0       1.0X
-Nested column                                      2367           2415          43          0.4        2367.0       0.1X
+Top-level column                                    135            169          19          7.4         134.7       1.0X
+Nested column                                      2131           2216          95          0.5        2131.4       0.1X

 OpenJDK 64-Bit Server VM 1.8.0_201-b09 on Linux 3.10.0-862.3.2.el7.x86_64
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Limiting:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Top-level column                                    129            153          16          7.8         128.5       1.0X
-Nested column                                      2368           2400          32          0.4        2367.7       0.1X
+Top-level column                                    147            158          10          6.8         146.9       1.0X
+Nested column                                      2149           2204          50          0.5        2148.9       0.1X

 OpenJDK 64-Bit Server VM 1.8.0_201-b09 on Linux 3.10.0-862.3.2.el7.x86_64
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Repartitioning:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Top-level column                                    359            396          59          2.8         358.9       1.0X
-Nested column                                      4100           4147          59          0.2        4099.9       0.1X
+Top-level column                                    386            399          16          2.6         385.8       1.0X
+Nested column                                      2612           2666          57          0.4        2612.2       0.1X

Review comment:
Since this PR applies to all data sources, `ORC v2` also becomes faster: `4147ms` -> `2666ms` (average time, nested column, repartitioning).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
