dongjoon-hyun commented on a change in pull request #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition
URL: https://github.com/apache/spark/pull/23964#discussion_r266104512
 
 

 ##########
 File path: sql/core/benchmarks/ParquetNestedSchemaPruningBenchmark-results.txt
 ##########
 @@ -6,35 +6,42 @@ OpenJDK 64-Bit Server VM 1.8.0_201-b09 on Linux 3.10.0-862.3.2.el7.x86_64
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Selection:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Top-level column                                    145            174          23          6.9         145.1       1.0X
-Nested column                                       325            346          19          3.1         324.8       0.4X
+Top-level column                                    128            166          24          7.8         128.0       1.0X
+Nested column                                       308            325          10          3.2         308.3       0.4X
 
 OpenJDK 64-Bit Server VM 1.8.0_201-b09 on Linux 3.10.0-862.3.2.el7.x86_64
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Limiting:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Top-level column                                    434            508         108          2.3         434.3       1.0X
-Nested column                                       625            647          23          1.6         624.8       0.7X
+Top-level column                                    447            496          91          2.2         447.0       1.0X
+Nested column                                       631            666          40          1.6         631.2       0.7X
 
 OpenJDK 64-Bit Server VM 1.8.0_201-b09 on Linux 3.10.0-862.3.2.el7.x86_64
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Repartitioning:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Top-level column                                    357            368           9          2.8         356.9       1.0X
-Nested column                                      2897           2976          88          0.3        2897.4       0.1X
+Top-level column                                    360            394          84          2.8         360.0       1.0X
+Nested column                                       553            586          65          1.8         553.5       0.7X
 
 OpenJDK 64-Bit Server VM 1.8.0_201-b09 on Linux 3.10.0-862.3.2.el7.x86_64
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Repartitioning by exprs:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Top-level column                                    365            413          77          2.7         364.9       1.0X
-Nested column                                      2902           2969          99          0.3        2902.4       0.1X
+Top-level column                                    368            393          50          2.7         368.3       1.0X
+Nested column                                      2942           3017          82          0.3        2942.0       0.1X
+
+OpenJDK 64-Bit Server VM 1.8.0_201-b09 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Sample:                                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Top-level column                                    124            143          10          8.1         124.1       1.0X
+Nested column                                       345            366          34          2.9         344.8       0.4X
 
 Review comment:
   I added a new benchmark case for `Sample`. These are the results after the improvement.
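
   For readers checking the numbers, here is a minimal Python sketch of how the `Rate(M/s)`, `Per Row(ns)`, and `Relative` columns appear to follow from `Best Time(ms)`. The row count of 1,000,000 is an assumption inferred from the table (each `Per Row(ns)` value matches `Best Time(ms)` numerically); the function names are hypothetical and are not part of Spark's benchmark code.

```python
# Assumption: each case processes 1,000,000 rows. This is inferred from the
# table (Per Row(ns) matches Best Time(ms)), not stated in the review comment.
N_ROWS = 1_000_000


def rate_m_per_s(best_ms, n_rows=N_ROWS):
    """Rows processed per second, in millions, based on the best time."""
    return n_rows / (best_ms / 1000.0) / 1e6


def per_row_ns(best_ms, n_rows=N_ROWS):
    """Nanoseconds spent per row, based on the best time."""
    return best_ms * 1e6 / n_rows


def relative(baseline_best_ms, case_best_ms):
    """Speed of a case relative to the first (baseline) row of its table."""
    return baseline_best_ms / case_best_ms


# "Repartitioning", nested column: best time before and after the change.
before_ms, after_ms = 2897, 553
speedup = before_ms / after_ms

print(f"Rate for 128 ms best time: {rate_m_per_s(128):.1f} M/s")
print(f"Relative after the change: {relative(360, after_ms):.1f}X")
print(f"Nested-column repartition speedup: {speedup:.1f}x")
```

   Read this way, the `Repartitioning` nested-column case moves from 0.1X to 0.7X of the top-level case, while `Repartitioning by exprs` stays at 0.1X in these results.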

