pan3793 opened a new pull request #35431:
URL: https://github.com/apache/spark/pull/35431
### What changes were proposed in this pull request?
This is a backport of #33664 to branch-3.1.
### Why are the changes needed?
We found a production query that spent a very long time in the optimization phase when DPP (dynamic partition pruning) was enabled. The SQL pattern looks like this:
```sql
select <cols...>
from a
left join b on a.<col> = b.<col>
left join c on b.<col> = c.<col>
left join d on c.<col> = d.<col>
left join e on d.<col> = e.<col>
left join f on e.<col> = f.<col>
left join g on f.<col> = g.<col>
left join h on g.<col> = h.<col>
...
```
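To reproduce this query shape with an arbitrary number of joins, here is a small Python sketch that generates the chained `left join` text. The table names (`a`, `b`, ...), the join column `id`, and the `left_join_chain` helper are all hypothetical placeholders. To actually observe the slowdown, the generated SQL would presumably need to run against partitioned tables, so that DPP (`spark.sql.optimizer.dynamicPartitionPruning.enabled`) has something to prune.

```python
from string import ascii_lowercase


def left_join_chain(n_tables: int, key: str = "id") -> str:
    """Build a query shaped like the pattern above: each table is
    left-joined to the previously joined table on a single column.

    Table names a, b, c, ... and the column `key` are placeholders.
    """
    if not 2 <= n_tables <= len(ascii_lowercase):
        raise ValueError("n_tables must be between 2 and 26")
    names = ascii_lowercase[:n_tables]
    lines = [f"select {names[0]}.{key}", f"from {names[0]}"]
    # Pair each table with its predecessor: (a, b), (b, c), (c, d), ...
    for prev, cur in zip(names, names[1:]):
        lines.append(f"left join {cur} on {prev}.{key} = {cur}.{key}")
    return "\n".join(lines)


print(left_join_chain(4))
```

Each additional table adds one more link to the chain. This makes it easy to check how compile time grows with the number of joins before and after the fix.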
<details>
<summary>Before this PR, the Analyzer/Optimizer took 15821.6 seconds</summary>
```
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 24763671
Total time: 15821.588159083 seconds
Rule    Effective Time / Total Time    Effective Runs / Total Runs
org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries
6325784430499 / 14213630904736 2047 / 342802
org.apache.spark.sql.catalyst.optimizer.ColumnPruning
28009746 / 267164777817 1 / 691736
org.apache.spark.sql.catalyst.analysis.UpdateAttributeNullability
0 / 118329246545 0 / 342804
org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAliases
8997600 / 71070686823 1 / 348934
org.apache.spark.sql.catalyst.optimizer.BooleanSimplification
923149600 / 59903945393 2036 / 348934
org.apache.spark.sql.catalyst.optimizer.NullPropagation
12343892 / 52895299683 1 / 348934
org.apache.spark.sql.execution.datasources.SchemaPruning
0 / 51638842573 0 / 171401
org.apache.spark.sql.catalyst.optimizer.UnwrapCastInBinaryComparison
0 / 42686293006 0 / 348934
org.apache.spark.sql.catalyst.optimizer.SimplifyConditionals
0 / 41190229736 0 / 348934
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints
1482075440 / 40899690529 2048 / 171401
org.apache.spark.sql.catalyst.optimizer.SimplifyBinaryComparison
0 / 40747831162 0 / 348934
org.apache.spark.sql.catalyst.optimizer.SimplifyCaseConversionExpressions
0 / 39946334062 0 / 348934
org.apache.spark.sql.catalyst.optimizer.FoldablePropagation
655084587 / 39426073462 2047 / 348934
org.apache.spark.sql.catalyst.optimizer.PruneFilters
489780 / 34731670364 1 / 520335
org.apache.spark.sql.catalyst.optimizer.OptimizeJsonExprs
0 / 33533545307 0 / 348934
org.apache.spark.sql.catalyst.optimizer.ReplaceNullWithFalseInPredicate
0 / 31910754610 0 / 348934
org.apache.spark.sql.catalyst.optimizer.ConstantFolding
22871375 / 31308048659 1 / 348934
org.apache.spark.sql.catalyst.optimizer.LikeSimplification
0 / 31276042884 0 / 348934
org.apache.spark.sql.catalyst.optimizer.OptimizeWindowFunctions
0 / 31100888868 0 / 348934
org.apache.spark.sql.catalyst.optimizer.ReorderAssociativeOperator
0 / 30855866688 0 / 348934
org.apache.spark.sql.catalyst.optimizer.SimplifyCasts
0 / 30414758547 0 / 348934
org.apache.spark.sql.catalyst.optimizer.OptimizeIn
0 / 30211101358 0 / 348934
org.apache.spark.sql.catalyst.optimizer.OptimizeUpdateFields
0 / 29986491532 0 / 348936
org.apache.spark.sql.catalyst.optimizer.RemoveDispensableExpressions
0 / 29740191900 0 / 348934
org.apache.spark.sql.catalyst.optimizer.SimplifyExtractValueOps
0 / 25399289000 0 / 348934
org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery
0 / 21381558331 0 / 348934
org.apache.spark.sql.catalyst.optimizer.PushDownPredicates
5831922974 / 18327095753 3072 / 691737
org.apache.spark.sql.catalyst.optimizer.ReplaceExpressions
5007709 / 17431480903 1 / 171401
org.apache.spark.sql.catalyst.optimizer.GetCurrentDatabaseAndCatalog
0 / 15321596322 0 / 171401
org.apache.spark.sql.catalyst.optimizer.ComputeCurrentTime
0 / 15060720651 0 / 171401
org.apache.spark.sql.catalyst.optimizer.DecimalAggregates
0 / 14874103589 0 / 171401
org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions
50420951 / 14798400219 1 / 171401
org.apache.spark.sql.catalyst.optimizer.ReplaceUpdateFieldsExpression
0 / 14791439186 0 / 171401
org.apache.spark.sql.catalyst.optimizer.EliminateMapObjects
0 / 14361213979 0 / 171401
org.apache.spark.sql.catalyst.optimizer.ReassignLambdaVariableID
0 / 14258459712 0 / 171401
org.apache.spark.sql.catalyst.optimizer.RewriteNonCorrelatedExists
0 / 14094898098 0 / 171401
org.apache.spark.sql.catalyst.optimizer.RemoveNoopOperators
1975195 / 12967136767 2 / 691736
org.apache.spark.sql.catalyst.optimizer.PullupCorrelatedPredicates
0 / 12573667353 0 / 171401
org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown
0 / 11355809540 0 / 171401
org.apache.spark.sql.execution.dynamicpruning.CleanupDynamicPruningFilters
350911193 / 9931582633 2048 / 171401
org.apache.spark.sql.catalyst.optimizer.ConstantPropagation
0 / 8268511133 0 / 348934
org.apache.spark.sql.catalyst.optimizer.CollapseProject
10135014 / 6764166524 1 / 520335
org.apache.spark.sql.catalyst.optimizer.EliminateResolvedHint
0 / 6244880486 0 / 171401
org.apache.spark.sql.catalyst.optimizer.CombineUnions
0 / 5811583863 0 / 520335
org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates
28926471 / 5423624109 1 / 171401
org.apache.spark.sql.catalyst.optimizer.CollapseRepartition
0 / 5245332974 0 / 348934
org.apache.spark.sql.catalyst.optimizer.EliminateSorts
0 / 5051398253 0 / 171401
org.apache.spark.sql.catalyst.optimizer.PropagateEmptyRelation
0 / 4975057807 0 / 342802
org.apache.spark.sql.catalyst.optimizer.NormalizeFloatingNumbers
0 / 4804919393 0 / 171401
org.apache.spark.sql.execution.python.ExtractPythonUDFFromAggregate
0 / 4408441952 0 / 171401
org.apache.spark.sql.catalyst.optimizer.CollapseWindow
0 / 4349408795 0 / 348934
org.apache.spark.sql.catalyst.optimizer.EliminateSerialization
0 / 4170274724 0 / 348934
org.apache.spark.sql.catalyst.optimizer.TransposeWindow
0 / 4113327220 0 / 348934
org.apache.spark.sql.catalyst.optimizer.ReorderJoin
0 / 4107443950 0 / 348934
org.apache.spark.sql.catalyst.optimizer.CombineFilters
0 / 4061601886 0 / 348934
org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion
0 / 4042615936 0 / 348934
org.apache.spark.sql.catalyst.optimizer.PushDownLeftSemiAntiJoin
0 / 4024293929 0 / 348934
org.apache.spark.sql.catalyst.optimizer.EliminateLimits
0 / 3908219408 0 / 348934
org.apache.spark.sql.catalyst.optimizer.RewritePredicateSubquery
135506567 / 3895045754 2047 / 171401
org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin
0 / 3875669986 0 / 348934
org.apache.spark.sql.catalyst.optimizer.PushLeftSemiLeftAntiThroughJoin
0 / 3875663441 0 / 348934
org.apache.spark.sql.catalyst.optimizer.LimitPushDown
0 / 3859481346 0 / 348934
org.apache.spark.sql.catalyst.optimizer.ReplaceDeduplicateWithAggregate
0 / 3404517096 0 / 171401
org.apache.spark.sql.catalyst.optimizer.PushExtraPredicateThroughJoin
0 / 2876802099 0 / 171401
org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughNonJoin
0 / 2546037832 0 / 171401
org.apache.spark.sql.catalyst.optimizer.ExtractPythonUDFFromJoinCondition
0 / 2392444200 0 / 171401
org.apache.spark.sql.catalyst.optimizer.RemoveLiteralFromGroupExpressions
0 / 2297492778 0 / 171401
org.apache.spark.sql.catalyst.optimizer.RewriteExceptAll
0 / 2239751163 0 / 171401
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromGenerate
0 / 2189398940 0 / 171401
org.apache.spark.sql.catalyst.optimizer.RemoveRepetitionFromGroupExpressions
0 / 2187645769 0 / 171401
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithFilter
0 / 2135169574 0 / 171401
org.apache.spark.sql.execution.python.ExtractGroupingPythonUDFFromAggregate
0 / 2058019085 0 / 171401
org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
602948 / 2030110600 1 / 171401
org.apache.spark.sql.catalyst.optimizer.OptimizeLimitZero
0 / 2028208284 0 / 171401
org.apache.spark.sql.catalyst.optimizer.CombineTypedFilters
0 / 1985665817 0 / 171401
org.apache.spark.sql.catalyst.analysis.EliminateView
0 / 1965869470 0 / 171401
org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning
0 / 1892132711 0 / 171401
org.apache.spark.sql.catalyst.optimizer.RewriteIntersectAll
0 / 1891839429 0 / 171401
org.apache.spark.sql.catalyst.optimizer.ReplaceIntersectWithSemiJoin
0 / 1873028976 0 / 171401
org.apache.spark.sql.catalyst.optimizer.ReplaceDistinctWithAggregate
0 / 1872377392 0 / 171401
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithAntiJoin
0 / 1860128741 0 / 171401
org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts
0 / 405470117 0 / 342802
org.apache.spark.sql.execution.OptimizeMetadataOnlyQuery
0 / 242854215 0 / 171401
org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder
0 / 226833963 0 / 171401
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences
117151873 / 123533385 7 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion
102066025 / 121689146 5 / 13
org.apache.spark.sql.catalyst.optimizer.CombineConcats
0 / 112256648 0 / 348934
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts
76183548 / 108546928 4 / 13
org.apache.spark.sql.catalyst.optimizer.EliminateAggregateFilter
0 / 107905167 0 / 348934
org.apache.spark.sql.catalyst.optimizer.EliminateDistinct
0 / 103804760 0 / 171401
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions
85607389 / 94162213 8 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$Division
76446295 / 93129469 5 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings
53450988 / 87954605 1 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator
0 / 85268434 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IfCoercion
50348094 / 80404327 4 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion
39786113 / 59089899 3 / 13
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone
56110133 / 57619849 10 / 13
org.apache.spark.sql.execution.dynamicpruning.PartitionPruning
22888739 / 54329716 1 / 171401
org.apache.spark.sql.catalyst.analysis.DecimalPrecision
0 / 49832160 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion
0 / 46158859 0 / 13
org.apache.spark.sql.execution.datasources.FindDataSourceTable
43323608 / 44575170 1 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ConcatCoercion
14900959 / 39466022 1 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveBinaryArithmetic
0 / 38815701 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality
0 / 36470552 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$DateTimeOperations
7216638 / 35530574 1 / 13
org.apache.spark.sql.execution.python.ExtractPythonUDFs
0 / 34271447 0 / 171401
org.apache.spark.sql.catalyst.analysis.ResolveLambdaVariables
0 / 32166544 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IntegralDivision
0 / 30547791 0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveExpressionsWithNamePlaceholders
0 / 30506602 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StringLiteralCoercion
0 / 30116261 0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveHigherOrderFunctions
0 / 27859446 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$EltCoercion
0 / 24218064 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WindowFrameCoercion
0 / 23413444 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRandomSeed
0 / 22036715 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StackCoercion
0 / 21957835 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder
0 / 21463552 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$MapZipWithCoercion
0 / 20489026 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame
0 / 20481142 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates
0 / 19715779 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveMissingReferences
0 / 19693693 0 / 13
org.apache.spark.sql.catalyst.analysis.TimeWindowing
0 / 19530333 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations
16526728 / 18814733 1 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery
0 / 16649474 0 / 13
org.apache.spark.sql.execution.aggregate.ResolveEncodersInScalaAgg
0 / 15275758 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions
0 / 11329936 0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer
4281027 / 10765657 1 / 13
org.apache.spark.sql.catalyst.analysis.CTESubstitution
0 / 9577794 0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases
0 / 9217208 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns
0 / 8289279 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions
0 / 8192539 0 / 13
org.apache.spark.sql.catalyst.analysis.ApplyCharTypePadding
0 / 7991101 0 / 2
org.apache.spark.sql.catalyst.analysis.CleanupAliases
5307130 / 7435920 1 / 3
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WidenSetOperationTypes
0 / 6506053 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast
0 / 6446727 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNewInstance
0 / 6075593 0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog
0 / 4092936 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOrdinalInOrderByAndGroupBy
0 / 3430531 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTables
0 / 2966446 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics
0 / 2864876 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$HandleNullInputsForUDF
0 / 2818668 0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggAliasInGroupBy
0 / 2816430 0 / 13
org.apache.spark.sql.execution.analysis.DetectAmbiguousSelfJoin
0 / 2491546 0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveUnion
0 / 2477483 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation
0 / 2463866 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF
0 / 2376711 0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate
0 / 2133804 0 / 13
org.apache.spark.sql.catalyst.analysis.ResolvePartitionSpec
0 / 1730246 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubqueryColumnAliases
0 / 1712259 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions
0 / 1698918 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot
0 / 1618569 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin
0 / 1589833 0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveCatalogs
0 / 1560151 0 / 13
org.apache.spark.sql.catalyst.expressions.codegen.package$ExpressionCanonicalizer$CleanExpressions
0 / 1500410 0 / 1049
org.apache.spark.sql.catalyst.analysis.ResolveInlineTables
0 / 1463204 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUserSpecifiedColumns
0 / 1355593 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNamespace
0 / 1339813 0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveTableValuedFunctions
0 / 1338443 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveInsertInto
0 / 1323062 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$PullOutNondeterministic
0 / 1318399 0 / 2
org.apache.spark.sql.execution.datasources.ResolveSQLOnFile
0 / 1302134 0 / 13
org.apache.spark.sql.execution.datasources.FallBackFileSourceV2
0 / 1074454 0 / 13
org.apache.spark.sql.catalyst.analysis.SubstituteUnresolvedOrdinals
0 / 698629 0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveJoinStrategyHints
0 / 452536 0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveCoalesceHints
0 / 306642 0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$WindowsSubstitution
0 / 300756 0 / 2
org.apache.spark.sql.execution.datasources.PreprocessTableCreation
0 / 276685 0 / 2
org.apache.spark.sql.catalyst.analysis.EliminateUnions
0 / 214857 0 / 2
org.apache.spark.sql.catalyst.analysis.UpdateOuterReferences
0 / 203514 0 / 2
org.apache.spark.sql.execution.datasources.PreprocessTableInsertion
0 / 164163 0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveNoopDropTable
0 / 108345 0 / 2
org.apache.spark.sql.execution.datasources.DataSourceAnalysis
0 / 90574 0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAlterTableChanges
0 / 85175 0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveHints$RemoveAllHints
0 / 82212 0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveHints$DisableHints
0 / 7780 0 / 2
```
</details>
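These dumps appear to be the output of Catalyst's `RuleExecutor.dumpTimeSpent()`. Each row gives a rule's effective and total time (apparently in nanoseconds, which is consistent with the reported totals) followed by its effective and total run counts. Below is a minimal Python sketch, assuming exactly this layout and tolerating the line wrapping seen in the paste, that ranks rules by total time. `parse_metrics` and `RuleMetric` are hypothetical helpers, not Spark APIs.

```python
import re
from typing import List, NamedTuple, Tuple

# "<rule> <effective> / <total> <effectiveRuns> / <totalRuns>"; \s also matches
# newlines, so rows wrapped across several lines are still recognised.
ROW = re.compile(r"(\S+)\s+(\d+)\s*/\s*(\d+)\s+(\d+)\s*/\s*(\d+)")
TOTAL = re.compile(r"Total time:\s*([\d.]+)\s*seconds")


class RuleMetric(NamedTuple):
    rule: str
    effective_ns: int
    total_ns: int
    effective_runs: int
    total_runs: int


def parse_metrics(dump: str) -> Tuple[float, List[RuleMetric]]:
    """Return (total seconds, rows sorted by total time, descending)."""
    m = TOTAL.search(dump)
    total_seconds = float(m.group(1)) if m else 0.0
    rows = [RuleMetric(g[0], *map(int, g[1:])) for g in ROW.findall(dump)]
    rows.sort(key=lambda r: r.total_ns, reverse=True)
    return total_seconds, rows


# Excerpt of the "before" dump, wrapped as in the original paste.
sample = """=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 24763671
Total time: 15821.588159083 seconds
Rule
Effective Time / Total Time Effective
Runs / Total Runs
org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries
6325784430499 / 14213630904736 2047 /
342802
org.apache.spark.sql.catalyst.optimizer.ColumnPruning
28009746 / 267164777817 1 / 691736
"""

total, rows = parse_metrics(sample)
top = rows[0]
share = top.total_ns / 1e9 / total
print(f"{top.rule.rsplit('.', 1)[-1]}: {top.total_ns / 1e9:.1f} s ({share:.1%} of total)")
# prints: Optimizer$OptimizeSubqueries: 14213.6 s (89.8% of total)
```

Run over the full "before" dump, this ranking puts `Optimizer$OptimizeSubqueries` first, at roughly 90% of the total time.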
<details>
<summary>After this PR, the Analyzer/Optimizer took 2.4 seconds</summary>
```
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 2140
Total time: 2.407325128 seconds
Rule    Effective Time / Total Time    Effective Runs / Total Runs
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts
79087019 / 116017648 4 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion
88569423 / 112854377 5 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences
89163583 / 95807127 7 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$Division
73197512 / 91715660 5 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions
73390555 / 81845702 8 / 13
org.apache.spark.sql.catalyst.optimizer.ColumnPruning
24099474 / 80271976 1 / 6
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator
0 / 79928019 0 / 13
org.apache.spark.sql.catalyst.optimizer.PushDownPredicates
73617831 / 79035882 2 / 7
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IfCoercion
52077624 / 78992522 4 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings
39724587 / 73183778 1 / 13
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone
52807613 / 54633834 10 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$DateTimeOperations
21340706 / 53040407 1 / 13
org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions
51595369 / 51595369 1 / 1
org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion
32842401 / 49095324 3 / 13
org.apache.spark.sql.catalyst.analysis.DecimalPrecision
0 / 48574229 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion
0 / 43384882 0 / 13
org.apache.spark.sql.execution.datasources.FindDataSourceTable
40348276 / 41709977 1 / 13
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints
40235278 / 40235278 1 / 1
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ConcatCoercion
13542681 / 38418852 1 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality
0 / 35130781 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IntegralDivision
0 / 35129152 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StringLiteralCoercion
0 / 32116493 0 / 13
org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates
32065069 / 32065069 1 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveBinaryArithmetic
0 / 30166678 0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveLambdaVariables
0 / 30082254 0 / 13
org.apache.spark.sql.catalyst.optimizer.ConstantFolding
23910538 / 29049259 1 / 4
org.apache.spark.sql.catalyst.analysis.ResolveExpressionsWithNamePlaceholders
0 / 26220500 0 / 13
org.apache.spark.sql.execution.dynamicpruning.PartitionPruning
25857276 / 25857276 1 / 1
org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAliases
11800935 / 25453815 1 / 4
org.apache.spark.sql.catalyst.analysis.TypeCoercion$EltCoercion
0 / 24882479 0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveHigherOrderFunctions
0 / 24773800 0 / 13
org.apache.spark.sql.catalyst.optimizer.BooleanSimplification
0 / 23687102 0 / 4
org.apache.spark.sql.catalyst.optimizer.OptimizeUpdateFields
0 / 22496833 0 / 6
org.apache.spark.sql.catalyst.analysis.UpdateAttributeNullability
0 / 22086610 0 / 4
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder
0 / 21776763 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WindowFrameCoercion
0 / 21668319 0 / 13
org.apache.spark.sql.execution.datasources.SchemaPruning
0 / 21119650 0 / 1
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StackCoercion
0 / 20995370 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$MapZipWithCoercion
0 / 20852224 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame
0 / 20482582 0 / 13
org.apache.spark.sql.catalyst.optimizer.NullPropagation
8449073 / 20310580 1 / 4
org.apache.spark.sql.catalyst.optimizer.SimplifyConditionals
0 / 19685107 0 / 4
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveMissingReferences
0 / 18005681 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRandomSeed
0 / 17873443 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates
0 / 16909722 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations
14231820 / 16732791 1 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery
0 / 16280559 0 / 13
org.apache.spark.sql.catalyst.optimizer.FoldablePropagation
0 / 16121492 0 / 4
org.apache.spark.sql.catalyst.analysis.TimeWindowing
0 / 15969419 0 / 13
org.apache.spark.sql.execution.aggregate.ResolveEncodersInScalaAgg
0 / 15441100 0 / 13
org.apache.spark.sql.catalyst.optimizer.SimplifyBinaryComparison
0 / 14334553 0 / 4
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer
5534984 / 12617237 1 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions
0 / 10014043 0 / 2
org.apache.spark.sql.catalyst.optimizer.SimplifyCaseConversionExpressions
0 / 9761592 0 / 4
org.apache.spark.sql.catalyst.analysis.CTESubstitution
0 / 9747538 0 / 2
org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery
0 / 9694808 0 / 4
org.apache.spark.sql.catalyst.optimizer.UnwrapCastInBinaryComparison
0 / 9665154 0 / 4
org.apache.spark.sql.catalyst.optimizer.OptimizeJsonExprs
0 / 9430804 0 / 4
org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns
0 / 9322119 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases
0 / 9113696 0 / 13
org.apache.spark.sql.catalyst.optimizer.SimplifyExtractValueOps
0 / 8938695 0 / 4
org.apache.spark.sql.catalyst.optimizer.OptimizeIn
0 / 8921298 0 / 4
org.apache.spark.sql.catalyst.optimizer.ReplaceNullWithFalseInPredicate
0 / 8695066 0 / 4
org.apache.spark.sql.catalyst.optimizer.RemoveDispensableExpressions
0 / 8438285 0 / 4
org.apache.spark.sql.catalyst.analysis.ApplyCharTypePadding
0 / 8406029 0 / 2
org.apache.spark.sql.catalyst.optimizer.ReorderAssociativeOperator
0 / 8196064 0 / 4
org.apache.spark.sql.catalyst.optimizer.CollapseProject
6878987 / 8000632 1 / 5
org.apache.spark.sql.catalyst.optimizer.SimplifyCasts
0 / 7996708 0 / 4
org.apache.spark.sql.catalyst.optimizer.LikeSimplification
0 / 7983621 0 / 4
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions
0 / 7948113 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast
0 / 7930266 0 / 13
org.apache.spark.sql.catalyst.optimizer.OptimizeWindowFunctions
0 / 7569548 0 / 4
org.apache.spark.sql.catalyst.optimizer.PruneFilters
1298537 / 7511669 1 / 5
org.apache.spark.sql.execution.python.ExtractPythonUDFs
0 / 7419094 0 / 1
org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown
0 / 7129007 0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNewInstance
0 / 6953704 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WidenSetOperationTypes
0 / 6409068 0 / 13
org.apache.spark.sql.catalyst.optimizer.RemoveNoopOperators
2948625 / 6073057 2 / 6
org.apache.spark.sql.catalyst.analysis.CleanupAliases
3304603 / 5512866 1 / 3
org.apache.spark.sql.catalyst.optimizer.ReplaceExpressions
5247113 / 5247113 1 / 1
org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog
0 / 4331593 0 / 13
org.apache.spark.sql.catalyst.optimizer.NormalizeFloatingNumbers
0 / 3939580 0 / 1
org.apache.spark.sql.catalyst.optimizer.ConstantPropagation
0 / 3872746 0 / 4
org.apache.spark.sql.execution.dynamicpruning.CleanupDynamicPruningFilters
3285689 / 3285689 1 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTables
0 / 2954363 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics
0 / 2772122 0 / 13
org.apache.spark.sql.catalyst.optimizer.DecimalAggregates
0 / 2751645 0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$HandleNullInputsForUDF
0 / 2718652 0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOrdinalInOrderByAndGroupBy
0 / 2682978 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggAliasInGroupBy
0 / 2598472 0 / 13
org.apache.spark.sql.execution.python.ExtractPythonUDFFromAggregate
0 / 2542252 0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF
0 / 2451105 0 / 2
org.apache.spark.sql.catalyst.optimizer.EliminateMapObjects
0 / 2438919 0 / 1
org.apache.spark.sql.execution.analysis.DetectAmbiguousSelfJoin
0 / 2428491 0 / 2
org.apache.spark.sql.catalyst.expressions.codegen.package$ExpressionCanonicalizer$CleanExpressions
0 / 2277468 0 / 1049
org.apache.spark.sql.catalyst.optimizer.EliminateAggregateFilter
0 / 2273198 0 / 4
org.apache.spark.sql.catalyst.optimizer.EliminateSorts
0 / 2256059 0 / 1
org.apache.spark.sql.catalyst.optimizer.ReplaceUpdateFieldsExpression
0 / 2227798 0 / 1
org.apache.spark.sql.catalyst.analysis.ResolveUnion
0 / 2220071 0 / 13
org.apache.spark.sql.catalyst.optimizer.PullupCorrelatedPredicates
0 / 2216158 0 / 1
org.apache.spark.sql.catalyst.optimizer.ReassignLambdaVariableID
0 / 2190345 0 / 1
org.apache.spark.sql.catalyst.optimizer.RewriteNonCorrelatedExists
0 / 2106499 0 / 1
org.apache.spark.sql.catalyst.optimizer.RemoveRepetitionFromGroupExpressions
0 / 2088289 0 / 1
org.apache.spark.sql.catalyst.optimizer.CombineConcats
0 / 2073285 0 / 4
org.apache.spark.sql.catalyst.optimizer.EliminateResolvedHint
0 / 1922211 0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubqueryColumnAliases
0 / 1900401 0 / 13
org.apache.spark.sql.catalyst.optimizer.ComputeCurrentTime
0 / 1895150 0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate
0 / 1863956 0 / 13
org.apache.spark.sql.catalyst.optimizer.GetCurrentDatabaseAndCatalog
0 / 1816019 0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions
0 / 1787068 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin
0 / 1769046 0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveCatalogs
0 / 1605576 0 / 13
org.apache.spark.sql.catalyst.optimizer.EliminateDistinct
0 / 1591594 0 / 1
org.apache.spark.sql.catalyst.analysis.ResolveInlineTables
0 / 1547587 0 / 13
org.apache.spark.sql.catalyst.optimizer.RewritePredicateSubquery
0 / 1526146 0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot
0 / 1506795 0 / 13
org.apache.spark.sql.catalyst.analysis.ResolvePartitionSpec
0 / 1502020 0 / 13
org.apache.spark.sql.catalyst.optimizer.PushExtraPredicateThroughJoin
0 / 1442567 0 / 1
org.apache.spark.sql.catalyst.analysis.ResolveTableValuedFunctions
0 / 1420037 0 / 13
org.apache.spark.sql.execution.datasources.ResolveSQLOnFile
0 / 1399535 0 / 13
org.apache.spark.sql.execution.python.ExtractGroupingPythonUDFFromAggregate
0 / 1385780 0 / 1
org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries
0 / 1359249 0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNamespace
0 / 1356344 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation
0 / 1355910 0 / 13
org.apache.spark.sql.execution.datasources.FallBackFileSourceV2
0 / 1325995 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUserSpecifiedColumns
0 / 1325072 0 / 13
org.apache.spark.sql.catalyst.optimizer.CollapseRepartition
0 / 1322557 0 / 4
org.apache.spark.sql.catalyst.optimizer.PushDownLeftSemiAntiJoin
0 / 1300710 0 / 4
org.apache.spark.sql.catalyst.optimizer.EliminateSerialization
0 / 1227217 0 / 4
org.apache.spark.sql.catalyst.analysis.Analyzer$PullOutNondeterministic
0 / 1226182 0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveInsertInto
0 / 1225119 0 / 13
org.apache.spark.sql.catalyst.optimizer.PushLeftSemiLeftAntiThroughJoin
0 / 1172415 0 / 4
org.apache.spark.sql.catalyst.optimizer.ReorderJoin
0 / 1158346 0 / 4
org.apache.spark.sql.catalyst.optimizer.PropagateEmptyRelation
0 / 1156508 0 / 2
org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion
0 / 1082244 0 / 4
org.apache.spark.sql.catalyst.optimizer.CombineUnions
0 / 1050254 0 / 5
org.apache.spark.sql.catalyst.optimizer.CollapseWindow
0 / 1003136 0 / 4
org.apache.spark.sql.catalyst.optimizer.TransposeWindow
0 / 1001447 0 / 4
org.apache.spark.sql.catalyst.optimizer.ExtractPythonUDFFromJoinCondition
0 / 1001440 0 / 1
org.apache.spark.sql.catalyst.optimizer.LimitPushDown
0 / 974802 0 / 4
org.apache.spark.sql.catalyst.optimizer.EliminateLimits
0 / 927602 0 / 4
org.apache.spark.sql.catalyst.optimizer.CombineFilters
0 / 924219 0 / 4
org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin
0 / 768081 0 / 4
org.apache.spark.sql.catalyst.analysis.SubstituteUnresolvedOrdinals
0 / 733641 0 / 2
org.apache.spark.sql.catalyst.optimizer.RemoveLiteralFromGroupExpressions
0 / 449017 0 / 1
org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
434360 / 434360 1 / 1
org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughNonJoin
0 / 379178 0 / 1
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromGenerate
0 / 295206 0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$WindowsSubstitution
0 / 292907 0 / 2
org.apache.spark.sql.catalyst.optimizer.CombineTypedFilters
0 / 256204 0 / 1
org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning
0 / 249912 0 / 1
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveCoalesceHints
0 / 249513 0 / 2
org.apache.spark.sql.catalyst.analysis.EliminateUnions
0 / 240773 0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveJoinStrategyHints
0 / 230937 0 / 2
org.apache.spark.sql.catalyst.optimizer.ReplaceDeduplicateWithAggregate
0 / 222074 0 / 1
org.apache.spark.sql.catalyst.analysis.EliminateView
0 / 216679 0 / 1
org.apache.spark.sql.catalyst.analysis.UpdateOuterReferences
0 / 137675 0 / 2
org.apache.spark.sql.execution.datasources.PreprocessTableCreation
0 / 124279 0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveNoopDropTable
0 / 111526 0 / 2
org.apache.spark.sql.execution.datasources.DataSourceAnalysis
0 / 100454 0 / 2
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithFilter
0 / 99833 0 / 1
org.apache.spark.sql.catalyst.optimizer.OptimizeLimitZero
0 / 98320 0 / 1
org.apache.spark.sql.execution.datasources.PreprocessTableInsertion
0 / 96634 0 / 2
org.apache.spark.sql.catalyst.optimizer.RewriteExceptAll
0 / 96403 0 / 1
org.apache.spark.sql.catalyst.optimizer.RewriteIntersectAll
0 / 94322 0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAlterTableChanges
0 / 94092 0 / 2
org.apache.spark.sql.catalyst.optimizer.ReplaceIntersectWithSemiJoin
0 / 92092 0 / 1
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithAntiJoin
0 / 91821 0 / 1
org.apache.spark.sql.catalyst.analysis.ResolveHints$RemoveAllHints
0 / 91637 0 / 2
org.apache.spark.sql.catalyst.optimizer.ReplaceDistinctWithAggregate
0 / 90672 0 / 1
org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts
0 / 42624 0 / 2
org.apache.spark.sql.execution.OptimizeMetadataOnlyQuery
0 / 25905 0 / 1
org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder
0 / 9020 0 / 1
org.apache.spark.sql.catalyst.analysis.ResolveHints$DisableHints
0 / 8111 0 / 2
```
</details>
The original description of SPARK-36444 did not mention this improvement, but the change does significantly improve SQL compile performance for queries like this one.
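The size of the improvement can be checked directly from the `Total time` and `Total number of runs` lines of the two dumps. Here is a short Python sketch of that arithmetic, using only the numbers reported above:

```python
# Totals copied from the "before" and "after" metric dumps above.
before_seconds, before_runs = 15821.588159083, 24763671
after_seconds, after_runs = 2.407325128, 2140

speedup = before_seconds / after_seconds
run_reduction = before_runs / after_runs
print(f"~{speedup:,.0f}x faster analysis/optimization, ~{run_reduction:,.0f}x fewer rule runs")
# prints: ~6,572x faster analysis/optimization, ~11,572x fewer rule runs
```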
### Does this PR introduce _any_ user-facing change?
Yes. SQL compilation (analysis and optimization) is significantly faster for some queries, such as the chained left-join pattern above.
### How was this patch tested?
Added a unit test.