Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/14044
  
    @dongjoon-hyun, my point is that analysis should not take 12 seconds at all.
    You can see how much time is spent in each rule by adding the following
    lines of code to your example:
    ```scala
    import org.apache.spark.sql.catalyst.rules.RuleExecutor
    // Print the cumulative time (in ns) spent in each rule, slowest first
    println(RuleExecutor.dumpTimeSpent())
    ```
    This yields the following result (timing in ns):
    ```
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences                 18784486408
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions         505619796
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$PropagateTypes                195027905
    org.apache.spark.sql.catalyst.analysis.Analyzer$FixNullability                    118882430
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveMissingReferences          74401505
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics          40068476
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer               32929965
    org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator                  30524660
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts             30453770
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions                  28383135
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame                26168955
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder                25736499
    org.apache.spark.sql.catalyst.analysis.TimeWindowing                              24807670
    org.apache.spark.sql.catalyst.analysis.DecimalPrecision                           24000260
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery                   21653219
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion                  20830229
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings                19183636
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion    17849664
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality               15186886
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$IfCoercion                    13994296
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$Division                      13929023
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$DateTimeOperations            13468710
    org.apache.spark.sql.catalyst.analysis.CleanupAliases                             13210810
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$StringToIntegralCasts         13191046
    org.apache.spark.sql.catalyst.analysis.Analyzer$PullOutNondeterministic           11310837
    org.apache.spark.sql.catalyst.analysis.Analyzer$HandleNullInputsForUDF            10712897
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion              10589030
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases                    7172334
    org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions          5994564
    org.apache.spark.sql.catalyst.analysis.Analyzer$CTESubstitution                   5914136
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOrdinalInOrderByAndGroupBy 5303578
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin        4060244
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot                      3174805
    org.apache.spark.sql.catalyst.analysis.EliminateUnions                            2787433
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate                   2731683
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations                  2624228
    org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates                  2417768
    org.apache.spark.sql.catalyst.analysis.Analyzer$WindowsSubstitution               2368503
    org.apache.spark.sql.execution.datasources.PreprocessTableInsertion               2126155
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNewInstance                2059795
    org.apache.spark.sql.execution.datasources.DataSourceAnalysis                     1944978
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast                     1912039
    org.apache.spark.sql.execution.datasources.ResolveDataSource                      1896232
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$WidenSetOperationTypes        1623414
    org.apache.spark.sql.execution.datasources.FindDataSourceTable                    1623004
    ```
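    As a rough check on how lopsided this is, here is a small sketch that sums the
    nanosecond values from the dump. Only the numbers come from the dump; the
    `RuleShare` object and its members are hypothetical names for illustration.
    It also assumes the dump reflects a single workload: `RuleExecutor`'s counters
    are global and keep accumulating until they are reset.

    ```scala
    // Hypothetical helper: what fraction of the recorded rule time is ResolveReferences?
    // The values are copied from the dump (ns); the names are illustrative only.
    object RuleShare {
      // First row of the dump: Analyzer$ResolveReferences.
      val resolveReferencesNs: Long = 18784486408L

      // The remaining 44 rows of the dump, in the order listed.
      val otherRulesNs: Seq[Long] = Seq(
        505619796L, 195027905L, 118882430L, 74401505L, 40068476L, 32929965L,
        30524660L, 30453770L, 28383135L, 26168955L, 25736499L, 24807670L,
        24000260L, 21653219L, 20830229L, 19183636L, 17849664L, 15186886L,
        13994296L, 13929023L, 13468710L, 13210810L, 13191046L, 11310837L,
        10712897L, 10589030L, 7172334L, 5994564L, 5914136L, 5303578L,
        4060244L, 3174805L, 2787433L, 2731683L, 2624228L, 2417768L,
        2368503L, 2126155L, 2059795L, 1944978L, 1912039L, 1896232L,
        1623414L, 1623004L)

      // Total time across all 45 rules in the dump (ns).
      def totalNs: Long = resolveReferencesNs + otherRulesNs.sum

      // Fraction of the total spent in ResolveReferences.
      def resolveReferencesShare: Double = resolveReferencesNs.toDouble / totalNs

      def main(args: Array[String]): Unit = {
        println(f"total rule time: ${totalNs / 1e9}%.2f s")
        println(f"ResolveReferences: ${resolveReferencesNs / 1e9}%.2f s " +
          f"(${resolveReferencesShare * 100}%.1f%% of total)")
      }
    }
    ```

    Run as-is, this reports roughly 20.19 s of total rule time, of which about
    18.78 s (93.0%) is `ResolveReferences`.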
    I think we should take a closer look at `ResolveReferences`, which accounts
    for almost all of the time in this dump. That said, I do think your PR has
    merit; we really shouldn't be analyzing the same plan twice.
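    To time a single query in isolation, the counters can be cleared right before
    it runs. This is a minimal sketch, assuming Spark 2.x, where `RuleExecutor`
    exposes `resetTime()` (newer releases rename it, e.g. to `resetMetrics()`).
    The `RuleTiming` object and the 50-column projection are hypothetical stand-ins
    for your example, not the actual reproduction from this PR.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.catalyst.rules.RuleExecutor

    // Hypothetical driver: report per-rule time for one query only.
    object RuleTiming {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("rule-timing")
          .getOrCreate()

        // The per-rule counters are global to the JVM and cumulative, so clear them first.
        // Spark 2.x name; newer releases use RuleExecutor.resetMetrics() instead.
        RuleExecutor.resetTime()

        // Hypothetical stand-in for the slow query: a projection of 50 derived columns.
        val df = spark.range(100).toDF("id")
          .selectExpr((1 to 50).map(i => s"id + $i AS c$i"): _*)

        // DataFrames are normally analyzed eagerly on creation; touching the
        // analyzed plan makes sure analysis has happened before we dump.
        df.queryExecution.analyzed

        // One line per rule: fully qualified rule name, then cumulative time in ns.
        println(RuleExecutor.dumpTimeSpent())

        spark.stop()
      }
    }
    ```

    Resetting first matters because, without it, the dump also includes every
    query analyzed or optimized earlier in the same JVM.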

