dongjoon-hyun commented on a change in pull request #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition
URL: https://github.com/apache/spark/pull/23964#discussion_r266103957
 
 

 ##########
 File path: sql/core/benchmarks/OrcV2NestedSchemaPruningBenchmark-results.txt
 ##########
 @@ -6,35 +6,42 @@ OpenJDK 64-Bit Server VM 1.8.0_201-b09 on Linux 3.10.0-862.3.2.el7.x86_64
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Selection:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Top-level column                                    120            148          24          8.3         120.0       1.0X
-Nested column                                      2367           2415          43          0.4        2367.0       0.1X
+Top-level column                                    135            169          19          7.4         134.7       1.0X
+Nested column                                      2131           2216          95          0.5        2131.4       0.1X
 
 OpenJDK 64-Bit Server VM 1.8.0_201-b09 on Linux 3.10.0-862.3.2.el7.x86_64
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Limiting:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Top-level column                                    129            153          16          7.8         128.5       1.0X
-Nested column                                      2368           2400          32          0.4        2367.7       0.1X
+Top-level column                                    147            158          10          6.8         146.9       1.0X
+Nested column                                      2149           2204          50          0.5        2148.9       0.1X
 
 OpenJDK 64-Bit Server VM 1.8.0_201-b09 on Linux 3.10.0-862.3.2.el7.x86_64
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Repartitioning:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Top-level column                                    359            396          59          2.8         358.9       1.0X
-Nested column                                      4100           4147          59          0.2        4099.9       0.1X
+Top-level column                                    386            399          16          2.6         385.8       1.0X
+Nested column                                      2612           2666          57          0.4        2612.2       0.1X
 
 Review comment:
   Since this PR applies to all data sources, `ORC v2` also becomes faster: in the Repartitioning benchmark, the nested-column Avg Time drops from `4147 ms` to `2666 ms`.
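
   For reference, a rough sketch of the arithmetic behind that comparison, using the two Avg Time(ms) values for "Nested column" from the Repartitioning block in the diff above. The helper names `speedup` and `reduction_pct` are illustrative only and are not part of Spark or its benchmark harness.

```python
# Avg Time(ms) for "Nested column" in the Repartitioning block of the diff above.
before_ms = 4147
after_ms = 2666


def speedup(before: float, after: float) -> float:
    """Return how many times faster the new run is (hypothetical helper)."""
    return before / after


def reduction_pct(before: float, after: float) -> float:
    """Return the percentage of wall-clock time saved (hypothetical helper)."""
    return (before - after) / before * 100.0


print(f"speedup: {speedup(before_ms, after_ms):.2f}x")
print(f"time saved: {reduction_pct(before_ms, after_ms):.1f}%")
```

   This is a single-run comparison of averages, so the ratio should be read with the reported Stdev(ms) in mind.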
