Github user hvanhovell commented on the issue:
https://github.com/apache/spark/pull/14044
@dongjoon-hyun my point is that analysis should not be taking 12 seconds at
all. You can see how much time is spent in each rule by adding the following
lines of code to your example:
```scala
import org.apache.spark.sql.catalyst.rules.RuleExecutor
println(RuleExecutor.dumpTimeSpent)
```
This yields the following result (timing in ns):
```
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences 18784486408
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions 505619796
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PropagateTypes 195027905
org.apache.spark.sql.catalyst.analysis.Analyzer$FixNullability 118882430
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveMissingReferences 74401505
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics 40068476
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer 32929965
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator 30524660
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts 30453770
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions 28383135
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame 26168955
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder 25736499
org.apache.spark.sql.catalyst.analysis.TimeWindowing 24807670
org.apache.spark.sql.catalyst.analysis.DecimalPrecision 24000260
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery 21653219
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion 20830229
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings 19183636
org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion 17849664
org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality 15186886
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IfCoercion 13994296
org.apache.spark.sql.catalyst.analysis.TypeCoercion$Division 13929023
org.apache.spark.sql.catalyst.analysis.TypeCoercion$DateTimeOperations 13468710
org.apache.spark.sql.catalyst.analysis.CleanupAliases 13210810
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StringToIntegralCasts 13191046
org.apache.spark.sql.catalyst.analysis.Analyzer$PullOutNondeterministic 11310837
org.apache.spark.sql.catalyst.analysis.Analyzer$HandleNullInputsForUDF 10712897
org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion 10589030
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases 7172334
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions 5994564
org.apache.spark.sql.catalyst.analysis.Analyzer$CTESubstitution 5914136
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOrdinalInOrderByAndGroupBy 5303578
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin 4060244
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot 3174805
org.apache.spark.sql.catalyst.analysis.EliminateUnions 2787433
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate 2731683
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations 2624228
org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates 2417768
org.apache.spark.sql.catalyst.analysis.Analyzer$WindowsSubstitution 2368503
org.apache.spark.sql.execution.datasources.PreprocessTableInsertion 2126155
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNewInstance 2059795
org.apache.spark.sql.execution.datasources.DataSourceAnalysis 1944978
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast 1912039
org.apache.spark.sql.execution.datasources.ResolveDataSource 1896232
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WidenSetOperationTypes 1623414
org.apache.spark.sql.execution.datasources.FindDataSourceTable 1623004
```
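To make a dump like this easier to read, the raw nanosecond figures can be turned into milliseconds and per-rule shares. The following is only a minimal sketch in plain Scala: `RuleTimings`, `RuleTime`, `parse` and `shares` are hypothetical names introduced here for illustration and are not part of Spark. It assumes the dump is a sequence of whitespace-separated rule-name/nanosecond pairs, as in the output above.

```scala
// Hypothetical helper, not part of Spark: summarises a rule-timing dump.
object RuleTimings {
  final case class RuleTime(rule: String, nanos: Long) {
    def millis: Double = nanos / 1e6
  }

  /** Parses whitespace-separated (rule name, nanoseconds) pairs, slowest rule first. */
  def parse(dump: String): Seq[RuleTime] =
    dump.trim.split("\\s+").toSeq
      .filter(_.nonEmpty)   // an empty dump yields a single empty token
      .grouped(2)           // pair each rule name with the number that follows it
      .collect { case Seq(rule, nanos) => RuleTime(rule, nanos.toLong) }
      .toSeq
      .sortBy(-_.nanos)

  /** Fraction of the total recorded time spent in each rule. */
  def shares(times: Seq[RuleTime]): Seq[(String, Double)] = {
    val total = times.map(_.nanos).sum.toDouble
    times.map(t => t.rule -> t.nanos / total)
  }

  def main(args: Array[String]): Unit = {
    // Two entries copied from the output above; a real run would pass the full dump.
    val sample =
      """org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences 18784486408
        |org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions 505619796
        |""".stripMargin
    parse(sample).foreach(t => println(f"${t.millis}%12.1f ms  ${t.rule}"))
  }
}
```

Because the parser only pairs consecutive tokens, it does not matter whether the name and the number sit on the same line or on separate lines.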
I think we should take a look at `ResolveReferences`, which at roughly 18.8
seconds dwarfs every other rule in the dump. I do think your PR has merit;
we really shouldn't be analyzing the same plan twice.