wangyum opened a new pull request #26257: [WIP][SPARK-29606][SQL] Improve EliminateOuterJoin performance
URL: https://github.com/apache/spark/pull/26257
 
 
   ### What changes were proposed in this pull request?
   
   This PR tries to improve the performance of the `EliminateOuterJoin` optimizer rule. The reproduction code is [here](https://issues.apache.org/jira/browse/SPARK-29606).
   
   **Before this PR**:
   ```
   === Metrics of Analyzer/Optimizer Rules ===
   Total number of runs: 1521
   Total time: 12.194145934 seconds
   
   Rule                                                                            Effective Time / Total Time    Effective Runs / Total Runs
   
   org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin                      0 / 8647734491                 0 / 4
   org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations                1514525263 / 1515162281        2 / 18
   org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTables                   0 / 882415286                  0 / 18
   org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions            163428048 / 163428048          1 / 1
   org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences               82775616 / 95273083            5 / 18
   org.apache.spark.sql.catalyst.analysis.DecimalPrecision                         30104935 / 62834058            2 / 18
   org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts           0 / 42241328                   0 / 18
   org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings              14374985 / 41282288            1 / 18
   org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion  17304314 / 39714534            1 / 18
   org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion                15565635 / 36120144            1 / 18
   org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator                0 / 35293365                   0 / 18
   ...
   ```
   
   **After this PR**:
   ```
   === Metrics of Analyzer/Optimizer Rules ===
   Total number of runs: 1521
   Total time: 4.226716432 seconds
   
   Rule                                                                            Effective Time / Total Time    Effective Runs / Total Runs
   
   org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations                1738065739 / 1738729067        2 / 18
   org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTables                   0 / 998958458                  0 / 18
   org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions            179616482 / 179616482          1 / 1
   org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin                      0 / 154345529                  0 / 4
   org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences               90911471 / 103380392           5 / 18
   org.apache.spark.sql.catalyst.analysis.DecimalPrecision                         29755452 / 69169476            2 / 18
   org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion  20563970 / 45868054            1 / 18
   org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings              16797535 / 45569505            1 / 18
   org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts           0 / 44890950                   0 / 18
   org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion                17563055 / 42319451            1 / 18
   ...
   ```
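The before/after numbers can be compared directly. Below is a minimal sketch in Python, assuming each rule row sits on a single line (`<rule> <effective ns> / <total ns> <effective runs> / <total runs>`). The `parse_rule_times` and `share_percent` helpers are hypothetical and not part of Spark; only two rows of each dump are copied in, and the totals come from the `Total time` lines.

```python
import re

# One rule row of the metrics dump, assuming the row sits on a single line:
# "<rule> <effective ns> / <total ns> <effective runs> / <total runs>".
ROW = re.compile(r"^\s*(\S+)\s+(\d+)\s*/\s*(\d+)\s+(\d+)\s*/\s*(\d+)\s*$")

RULE = "org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin"

# Two rows copied from each dump in this PR.
BEFORE = """
org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin        0 / 8647734491           0 / 4
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations  1514525263 / 1515162281  2 / 18
"""
AFTER = """
org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin        0 / 154345529            0 / 4
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations  1738065739 / 1738729067  2 / 18
"""
# Values of the "Total time" lines, in seconds.
BEFORE_TOTAL_S = 12.194145934
AFTER_TOTAL_S = 4.226716432


def parse_rule_times(dump):
    """Map each rule name to its total time in nanoseconds (hypothetical helper)."""
    times = {}
    for line in dump.splitlines():
        match = ROW.match(line)
        if match:  # header, summary and blank lines do not match ROW
            times[match.group(1)] = int(match.group(3))
    return times


def share_percent(rule_ns, total_s):
    """Percentage of the total rule-execution time spent in one rule."""
    return 100.0 * rule_ns / (total_s * 1e9)


before_ns = parse_rule_times(BEFORE)[RULE]
after_ns = parse_rule_times(AFTER)[RULE]
print(f"{RULE.rsplit('.', 1)[1]}: {before_ns / after_ns:.1f}x faster")
print(f"share before: {share_percent(before_ns, BEFORE_TOTAL_S):.1f}%")
print(f"share after:  {share_percent(after_ns, AFTER_TOTAL_S):.1f}%")
print(f"total speed-up: {BEFORE_TOTAL_S / AFTER_TOTAL_S:.1f}x")
```

On these rows the script reports roughly a 56x reduction for `EliminateOuterJoin` (about 71% of the total rule time before, under 4% after) and about a 2.9x reduction in total rule time.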
   
   ### Why are the changes needed?
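   `EliminateOuterJoin` dominates rule execution time for the reproduction query. Before this PR it takes about 8.6 s of the 12.2 s spent in analyzer and optimizer rules, even though it never changes the plan (0 of 4 runs effective). After this PR it takes about 0.15 s, and the total drops to about 4.2 s.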
   
   
   ### Does this PR introduce any user-facing change?
   
   
   
   ### How was this patch tested?
   
   Manual test.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to