[GitHub] [spark] wangyum opened a new pull request #26589: [SPARK-29947][SQL] Improve ResolveTables and ResolveRelations performance

GitBox Mon, 18 Nov 2019 19:11:31 -0800

wangyum opened a new pull request #26589: [SPARK-29947][SQL] Improve ResolveTables and ResolveRelations performance
URL: https://github.com/apache/spark/pull/26589
 
 
   ### What changes were proposed in this pull request?
   
   It is very common for a SQL query to reference the same table more than once.
   This PR tries to improve the performance of `ResolveTables` and `ResolveRelations` in such cases by reducing the number of requests to the Hive Metastore Server.
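The PR description does not include the diff, so the following is only a minimal sketch of the idea, written in Python rather than Spark's Scala. It assumes the table lookups are memoized in a cache that lives for the analysis of a single query, so a table referenced several times costs one metastore request. `CountingMetastore`, `get_table`, and `resolve_relations` are hypothetical names, not Spark or Hive APIs.

```python
from typing import Dict, List, Tuple

# Hypothetical key for a table reference: (database, table).
TableId = Tuple[str, str]


class CountingMetastore:
    """Hypothetical stand-in for a Hive Metastore client that counts round trips."""

    def __init__(self) -> None:
        self.calls = 0

    def get_table(self, table_id: TableId) -> str:
        self.calls += 1  # each call models one request to the metastore
        db, name = table_id
        return f"Relation({db}.{name})"


def resolve_relations(refs: List[TableId], metastore: CountingMetastore) -> List[str]:
    """Resolve every table reference of one query, asking the metastore once per table."""
    cache: Dict[TableId, str] = {}  # scoped to this single query analysis
    resolved = []
    for ref in refs:
        if ref not in cache:
            cache[ref] = metastore.get_table(ref)
        resolved.append(cache[ref])
    return resolved


# A query that reads default.t1 three times and default.t2 once.
metastore = CountingMetastore()
plan = resolve_relations(
    [("default", "t1"), ("default", "t2"), ("default", "t1"), ("default", "t1")],
    metastore,
)
print(metastore.calls)  # 2 requests instead of 4
```

Keeping the cache per query rather than global means a table altered between two queries is still re-read from the metastore on the next query; whether the actual patch scopes its cache exactly this way is an assumption of this sketch.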
   
   
   ### Why are the changes needed?
   1. Reduce the number of requests to the Hive Metastore Server.
   2. Improve the performance of `ResolveTables` and `ResolveRelations`.
   
   
   ### Does this PR introduce any user-facing change?
   No.
   
   
   ### How was this patch tested?
   
   Manual test. In the analyzer/optimizer rule metrics below, per-rule times are in nanoseconds.
   After [SPARK-29606](https://issues.apache.org/jira/browse/SPARK-29606) and before this PR:
   ```
   === Metrics of Analyzer/Optimizer Rules ===
   Total number of runs: 9323
   Total time: 2.687441263 seconds

   Rule                                                                            Effective Time / Total Time    Effective Runs / Total Runs

   org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations                929173767 / 930133504          2 / 18
   org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTables                   0 / 383363402                  0 / 18
   org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin                      0 / 99433540                   0 / 4
   org.apache.spark.sql.catalyst.analysis.DecimalPrecision                         41809394 / 83727901            2 / 18
   org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions            71372977 / 71372977            1 / 1
   org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts           0 / 59071933                   0 / 18
   org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences               37858325 / 58471776            5 / 18
   org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings              20889892 / 53229016            1 / 18
   org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion  23428968 / 50890815            1 / 18
   org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion                23230666 / 49182607            1 / 18
   org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator                0 / 43638350                   0 / 18
   org.apache.spark.sql.catalyst.optimizer.ColumnPruning                           17194844 / 42530885            1 / 6
   ```
   After [SPARK-29606](https://issues.apache.org/jira/browse/SPARK-29606) and after this PR:
   ```
   === Metrics of Analyzer/Optimizer Rules ===
   Total number of runs: 9323
   Total time: 2.163765869 seconds

   Rule                                                                            Effective Time / Total Time    Effective Runs / Total Runs

   org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations                658905353 / 659829383          2 / 18
   org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTables                   0 / 220708715                  0 / 18
   org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin                      0 / 99606816                   0 / 4
   org.apache.spark.sql.catalyst.analysis.DecimalPrecision                         39616060 / 78215752            2 / 18
   org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences               36706549 / 54917789            5 / 18
   org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions            53561921 / 53561921            1 / 1
   org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts           0 / 52329678                   0 / 18
   org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings              20945755 / 49695998            1 / 18
   org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion  20872241 / 46740145            1 / 18
   org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion                19780298 / 44327227            1 / 18
   org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator                0 / 42312023                   0 / 18
   org.apache.spark.sql.catalyst.optimizer.ColumnPruning                           17197393 / 39501424            1 / 6
   ```
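As a quick check of the before/after numbers, the following sketch computes the relative reduction in total time for the two rules this PR targets and for the whole run, using values copied from the two metric dumps (per-rule times in nanoseconds, the overall total in seconds). The `reduction` helper is only for illustration.

```python
NANOS_PER_SECOND = 1_000_000_000

# "Total Time" column (ns), copied from the before/after metric dumps.
before = {"ResolveRelations": 930_133_504, "ResolveTables": 383_363_402}
after = {"ResolveRelations": 659_829_383, "ResolveTables": 220_708_715}

# "Total time" header line (seconds) covering all analyzer/optimizer rules.
total_before_s = 2.687441263
total_after_s = 2.163765869


def reduction(old: float, new: float) -> float:
    """Relative reduction from old to new, in percent."""
    return (old - new) / old * 100


for rule in before:
    old_s = before[rule] / NANOS_PER_SECOND
    new_s = after[rule] / NANOS_PER_SECOND
    pct = reduction(before[rule], after[rule])
    print(f"{rule}: {old_s:.3f} s -> {new_s:.3f} s ({pct:.1f}% less)")

print(f"All rules: {total_before_s:.3f} s -> {total_after_s:.3f} s "
      f"({reduction(total_before_s, total_after_s):.1f}% less)")
```

With these inputs it reports about 29% less time in `ResolveRelations`, about 42% less in `ResolveTables`, and about 19.5% less total rule time. The two targeted rules account for roughly 0.43 s of the roughly 0.52 s drop; the rest comes from other rules whose timings also differ between the two runs.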


