viirya opened a new pull request #23943: [SPARK-27034][SQL] Nested schema pruning for ORC
URL: https://github.com/apache/spark/pull/23943

## What changes were proposed in this pull request?

We only supported nested schema pruning for Parquet previously. This proposes to support nested schema pruning for ORC too.

Note: This only covers ORC v1. We can deal with ORC v2 as a TODO item.

## Benchmark

Ran benchmark with `OrcNestedSchemaPruningBenchmark`.

Before:

```scala
[info] Running benchmark: Selection
[info]   Running case: Top-level column
[info]   Stopped after 27 iterations, 2054 ms
[info]   Running case: Nested column
[info]   Stopped after 10 iterations, 14384 ms
[info]
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.3
[info] Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
[info] Selection:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------
[info] Top-level column                     64 /   76         15.6          63.9       1.0X
[info] Nested column                      1300 / 1438          0.8        1299.7       0.0X
```

After:

```scala
[info] Running benchmark: Selection
[info]   Running case: Top-level column
[info]   Stopped after 24 iterations, 2051 ms
[info]   Running case: Nested column
[info]   Stopped after 10 iterations, 5005 ms
[info]
[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.3
[info] Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
[info] Selection:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------
[info] Top-level column                     71 /   85         14.2          70.6       1.0X
[info] Nested column                       480 /  501          2.1         479.5       0.1X
```

## How was this patch tested?

Added tests.
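
For readers unfamiliar with the term, the following is a minimal sketch of what nested schema pruning means, written in plain Python rather than taken from this patch. It is not Spark's implementation: the dict-based schema representation, the `prune_schema` helper, and the `contact_schema` field names are all hypothetical and exist only for illustration. The idea is that when a query selects only `name.first`, the reader should be handed a schema containing just that leaf instead of the whole `name` struct.

```python
from typing import Any, Dict, List

# Hypothetical schema representation (an assumption for this sketch):
# a leaf field maps to a type name, a struct field maps to a nested dict.
Schema = Dict[str, Any]

contact_schema: Schema = {
    "id": "int",
    "name": {"first": "string", "middle": "string", "last": "string"},
    "address": "string",
}


def prune_schema(schema: Schema, paths: List[str]) -> Schema:
    """Return a copy of `schema` restricted to the requested dotted paths."""
    # Group the remaining path suffixes by their first component.
    wanted: Dict[str, List[str]] = {}
    for path in paths:
        head, _, rest = path.partition(".")
        if head not in schema:
            raise KeyError(f"unknown field: {head}")
        wanted.setdefault(head, []).append(rest)

    pruned: Schema = {}
    # Iterate over the original schema so field order is preserved.
    for name, field in schema.items():
        if name not in wanted:
            continue
        suffixes = wanted[name]
        if not isinstance(field, dict):
            # A leaf has no sub-fields, so a non-empty suffix is an error.
            if any(suffixes):
                raise KeyError(f"{name} is not a struct")
            pruned[name] = field
        elif all(suffixes):
            # Only sub-fields were requested: recurse into the struct.
            pruned[name] = prune_schema(field, suffixes)
        else:
            # The whole struct was requested at least once: keep all of it.
            pruned[name] = field
    return pruned


# Selecting a single nested leaf keeps only that leaf of the struct.
print(prune_schema(contact_schema, ["name.first"]))
```

In this sketch, requesting `["name.first"]` yields `{'name': {'first': 'string'}}`, so a columnar reader given the pruned schema would not need to read `name.middle` or `name.last`. That reduction in data read is the kind of saving the "Nested column" benchmark case measures.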
