GitHub user gatorsmile opened a pull request:
https://github.com/apache/spark/pull/20342
[SPARK-23170] Dump the statistics of effective runs of analyzer and optimizer rules
## What changes were proposed in this pull request?
Dump the statistics of effective runs of analyzer and optimizer rules.
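For readers unfamiliar with the metric, here is a minimal sketch of the idea in Python. It assumes a run counts as "effective" when the rule returns a plan that differs from its input. The `RuleMetering` class, the rule names, and the toy string "plans" are hypothetical illustrations, not code from this patch.

```python
import time
from collections import defaultdict


class RuleMetering:
    """Hypothetical per-rule counters mirroring the four columns of the dump."""

    def __init__(self):
        self.total_ns = defaultdict(int)
        self.effective_ns = defaultdict(int)
        self.total_runs = defaultdict(int)
        self.effective_runs = defaultdict(int)

    def apply(self, name, rule, plan):
        """Run one rule on a plan, timing it and recording whether it changed the plan."""
        start = time.perf_counter_ns()
        result = rule(plan)
        elapsed = time.perf_counter_ns() - start

        self.total_ns[name] += elapsed
        self.total_runs[name] += 1
        # Assumption: a run is "effective" only if the plan actually changed.
        if result != plan:
            self.effective_ns[name] += elapsed
            self.effective_runs[name] += 1
        return result


# Toy rules operating on a string that stands in for a query plan.
def remove_double_negation(plan):
    return plan.replace("NOT NOT ", "")


def no_op(plan):
    return plan


metering = RuleMetering()
plan = "NOT NOT a > 1"
for _ in range(2):  # two passes, as a fixed-point batch might do
    plan = metering.apply("RemoveDoubleNegation", remove_double_negation, plan)
    plan = metering.apply("NoOp", no_op, plan)

for name in metering.total_runs:
    print(name, metering.total_runs[name], metering.effective_runs[name])
```

In this sketch the first rule is effective only on the first pass, and the second rule is never effective, which is the kind of gap the dumped statistics are meant to expose.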
## How was this patch tested?
Ran TPCDSQuerySuite manually. The dumped metrics:
```
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs = 175827
Total time: 20.699042877 seconds
Rule  Total Time  Effective Time  Total Runs  Effective Runs
org.apache.spark.sql.catalyst.optimizer.ColumnPruning  2340563794  1338268224  1875  761
org.apache.spark.sql.catalyst.analysis.Analyzer$CTESubstitution  1632672623  1625071881  788  37
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions  1395087131  347339931  1982  38
org.apache.spark.sql.catalyst.optimizer.PruneFilters  1177711364  21344174  1590  3
org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries  1145135465  1131417128  285  39
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences  1008347217  663112062  1982  616
org.apache.spark.sql.catalyst.optimizer.ReorderJoin  767024424  693001699  1590  132
org.apache.spark.sql.catalyst.analysis.Analyzer$FixNullability  598524650  40802876  742  12
org.apache.spark.sql.catalyst.analysis.DecimalPrecision  595384169  436153128  1982  211
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery  548178270  459695885  1982  49
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts  423002864  139869503  1982  86
org.apache.spark.sql.catalyst.optimizer.BooleanSimplification  405544962  17250184  1590  7
org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin  383837603  284174662  1590  708
org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAliases  372901885  3362332  1590  9
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints  364628214  343815519  285  192
org.apache.spark.sql.execution.datasources.FindDataSourceTable  303293296  285344766  1982  233
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions  233195019  92648171  1982  294
org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion  220568919  73932736  1982  38
org.apache.spark.sql.catalyst.optimizer.NullPropagation  207976072  9072305  1590  26
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings  207027618  37834145  1982  40
org.apache.spark.sql.catalyst.optimizer.PushDownPredicate  203382836  176482044  1590  783
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion  192152216  15738573  1982  1
org.apache.spark.sql.catalyst.optimizer.ConstantFolding  191624610  58857553  1590  126
org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion  183008262  78280172  1982  29
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator  176935299  0  1982  0
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone  170161002  74354990  1982  417
org.apache.spark.sql.catalyst.optimizer.ReorderAssociativeOperator  166173174  0  1590  0
org.apache.spark.sql.catalyst.optimizer.OptimizeIn  155410763  8197045  1590  16
org.apache.spark.sql.catalyst.optimizer.SimplifyCaseConversionExpressions  153726565  0  1590  0
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IfCoercion  153013269  0  1982  0
org.apache.spark.sql.catalyst.optimizer.SimplifyCasts  146693495  13537077  1590  69
org.apache.spark.sql.catalyst.optimizer.SimplifyBinaryComparison  144818581  0  1590  0
org.apache.spark.sql.catalyst.analysis.TypeCoercion$DateTimeOperations  143943308  6889302  1982  27
org.apache.spark.sql.catalyst.analysis.TypeCoercion$Division  142925142  12653147  1982  8
org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality  142775965  0  1982  0
org.apache.spark.sql.catalyst.optimizer.SimplifyConditionals  141509150  0  1590  0
org.apache.spark.sql.catalyst.optimizer.LikeSimplification  132387762  636851  1590  1
org.apache.spark.sql.catalyst.optimizer.RemoveDispensableExpressions  127412361  0  1590  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame  126772671  9317887  1982  21
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ConcatCoercion  116484407  0  1982  0
org.apache.spark.sql.catalyst.analysis.TypeCoercion$EltCoercion  115402736  0  1982  0
org.apache.spark.sql.catalyst.analysis.ResolveCreateNamedStruct  115071447  0  1982  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder  113115366  4563584  1982  14
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WindowFrameCoercion  107747140  0  1982  0
org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin  105020607  13907906  1590  11
org.apache.spark.sql.catalyst.analysis.TimeWindowing  101018029  0  1982  0
org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery  98043747  7044358  1590  7
org.apache.spark.sql.catalyst.optimizer.ConstantPropagation  95173536  0  1590  0
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StackCoercion  94134701  0  1982  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics  84419135  33892351  1982  11
org.apache.spark.sql.execution.datasources.DataSourceAnalysis  83297816  77023484  742  24
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer  77880196  36980636  1982  148
org.apache.spark.sql.execution.datasources.PreprocessTableCreation  74091407  0  742  0
org.apache.spark.sql.catalyst.analysis.CleanupAliases  73837147  37105855  1086  344
org.apache.spark.sql.catalyst.optimizer.RemoveRedundantProject  73534618  31752937  1875  344
org.apache.spark.sql.execution.datasources.v2.PushDownOperatorsToDataSource  70120541  0  285  0
org.apache.spark.sql.catalyst.optimizer.FoldablePropagation  67941776  0  1590  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions  62917712  22092402  1982  23
org.apache.spark.sql.catalyst.optimizer.CombineFilters  61116313  41021442  1590  449
org.apache.spark.sql.catalyst.optimizer.CollapseProject  60872313  30994661  1875  279
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases  58453489  12511798  1982  47
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions  58154315  0  750  0
org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions  54678669  0  285  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveMissingReferences  53518211  7209138  1982  8
org.apache.spark.sql.catalyst.optimizer.PullupCorrelatedPredicates  45840637  29436271  285  23
org.apache.spark.sql.catalyst.optimizer.CollapseRepartition  43321502  0  1590  0
org.apache.spark.sql.catalyst.analysis.Analyzer$PullOutNondeterministic  42117785  0  742  0
org.apache.spark.sql.catalyst.optimizer.ComputeCurrentTime  40843184  0  285  0
org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion  39997563  5899863  1590  10
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations  39412748  22359409  1990  233
org.apache.spark.sql.catalyst.optimizer.CombineUnions  38823264  1534424  1875  17
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WidenSetOperationTypes  38712372  7912192  1982  9
org.apache.spark.sql.catalyst.analysis.Analyzer$HandleNullInputsForUDF  38281659  0  742  0
org.apache.spark.sql.catalyst.optimizer.DecimalAggregates  38277381  17245272  385  100
org.apache.spark.sql.execution.datasources.ResolveSQLOnFile  37342019  0  1982  0
org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates  36958378  1207331  1982  46
org.apache.spark.sql.catalyst.optimizer.CombineLimits  36794793  0  1590  0
org.apache.spark.sql.catalyst.optimizer.LimitPushDown  36378469  0  1590  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNewInstance  34611065  0  1982  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast  33734785  0  1982  0
org.apache.spark.sql.catalyst.optimizer.EliminateSorts  33731370  0  1590  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOrdinalInOrderByAndGroupBy  33251765  1395920  1982  4
org.apache.spark.sql.catalyst.optimizer.EliminateSerialization  30890996  0  1590  0
org.apache.spark.sql.catalyst.optimizer.CollapseWindow  29512740  0  1590  0
org.apache.spark.sql.catalyst.optimizer.ReplaceIntersectWithSemiJoin  29396498  1492235  300  7
org.apache.spark.sql.catalyst.optimizer.RewritePredicateSubquery  29301037  21706110  285  148
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggAliasInGroupBy  23819074  0  1982  0
org.apache.spark.sql.catalyst.analysis.SubstituteUnresolvedOrdinals  23136089  10062248  788  4
org.apache.spark.sql.execution.datasources.PreprocessTableInsertion  20886216  0  742  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot  20639329  0  1982  0
org.apache.spark.sql.catalyst.analysis.ResolveTableValuedFunctions  20293829  0  1990  0
org.apache.spark.sql.catalyst.analysis.ResolveInlineTables  20255898  0  1982  0
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveBroadcastHints  20250460  0  750  0
org.apache.spark.sql.catalyst.expressions.codegen.package$ExpressionCanonicalizer$CleanExpressions  19990727  39271  8280  26
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate  19578333  0  1982  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubqueryColumnAliases  19414993  0  1982  0
org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts  19291402  0  285  0
org.apache.spark.sql.catalyst.optimizer.ReplaceExpressions  18790135  0  285  0
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin  18535762  0  1982  0
org.apache.spark.sql.catalyst.optimizer.EliminateMapObjects  17835919  0  285  0
org.apache.spark.sql.catalyst.optimizer.PropagateEmptyRelation  15200130  1525030  288  3
org.apache.spark.sql.catalyst.optimizer.GetCurrentDatabase  14490778  0  285  0
org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases  14021504  12790020  285  215
org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates  13439887  0  285  0
org.apache.spark.sql.catalyst.analysis.EliminateBarriers  12336513  0  1086  0
org.apache.spark.sql.execution.OptimizeMetadataOnlyQuery  12082986  0  285  0
org.apache.spark.sql.catalyst.analysis.UpdateOuterReferences  10792280  0  742  0
org.apache.spark.sql.execution.python.ExtractPythonUDFFromAggregate  8978897  0  285  0
org.apache.spark.sql.catalyst.analysis.EliminateUnions  8886439  0  788  0
org.apache.spark.sql.catalyst.analysis.AliasViewChild  8317231  0  742  0
org.apache.spark.sql.catalyst.optimizer.RemoveRepetitionFromGroupExpressions  7964788  184237  286  1
org.apache.spark.sql.catalyst.analysis.Analyzer$WindowsSubstitution  7396593  0  788  0
org.apache.spark.sql.catalyst.analysis.ResolveHints$RemoveAllHints  6986385  0  750  0
org.apache.spark.sql.catalyst.analysis.EliminateView  6518436  0  285  0
org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation  6452598  0  288  0
org.apache.spark.sql.catalyst.optimizer.RemoveLiteralFromGroupExpressions  5510866  0  286  0
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithFilter  5393429  0  300  0
org.apache.spark.sql.catalyst.optimizer.SimplifyCreateArrayOps  5296187  0  1590  0
org.apache.spark.sql.catalyst.optimizer.SimplifyCreateStructOps  5261249  0  1590  0
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithAntiJoin  5152594  925260  300  1
org.apache.spark.sql.catalyst.optimizer.CombineConcats  4916416  0  1590  0
org.apache.spark.sql.catalyst.optimizer.CombineTypedFilters  4810314  0  285  0
org.apache.spark.sql.catalyst.optimizer.SimplifyCreateMapOps  4674195  0  1590  0
org.apache.spark.sql.catalyst.optimizer.ReplaceDistinctWithAggregate  4406136  727433  300  15
org.apache.spark.sql.catalyst.optimizer.ReplaceDeduplicateWithAggregate  4252456  0  285  0
org.apache.spark.sql.catalyst.optimizer.EliminateDistinct  1920392  0  285  0
org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder  1855658  0  285  0
```
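One way to read a dump like this is to rank rules by the time spent on runs that changed nothing. The following Python sketch parses a few rows copied from the output above and sorts them by that difference. It assumes the time columns are in nanoseconds and that every rule name starts with `org.apache.`; `RuleMetrics`, `parse_dump`, and `wasted_ns` are hypothetical helper names, not part of Spark.

```python
from typing import List, NamedTuple


class RuleMetrics(NamedTuple):
    rule: str
    total_ns: int        # assumed unit: nanoseconds
    effective_ns: int    # assumed unit: nanoseconds
    total_runs: int
    effective_runs: int


def parse_dump(text: str) -> List[RuleMetrics]:
    """Parse rows from the dump, tolerating rows wrapped across several lines."""
    tokens = text.split()
    rows = []
    i = 0
    while i < len(tokens):
        values = tokens[i + 1:i + 5]
        # A row is a rule name followed by exactly four integer columns.
        if (tokens[i].startswith("org.apache.")
                and len(values) == 4
                and all(v.isdigit() for v in values)):
            rows.append(RuleMetrics(tokens[i], *map(int, values)))
            i += 5
        else:
            i += 1
    return rows


def wasted_ns(m: RuleMetrics) -> int:
    """Time spent in runs that left the plan unchanged."""
    return m.total_ns - m.effective_ns


# Three rows copied from the dump above, in the same wrapped layout.
sample = """
org.apache.spark.sql.catalyst.optimizer.ColumnPruning
2340563794 1338268224 1875
761
org.apache.spark.sql.catalyst.optimizer.PruneFilters
1177711364 21344174 1590
3
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator
176935299 0 1982
0
"""

ranked = sorted(parse_dump(sample), key=wasted_ns, reverse=True)
for m in ranked:
    short_name = m.rule.rsplit(".", 1)[-1]
    print(f"{short_name}: {wasted_ns(m)} ns in ineffective runs, "
          f"{m.effective_runs}/{m.total_runs} runs effective")
```

Among these three rows, `PruneFilters` ranks first: it was effective in only 3 of 1590 runs, yet almost all of its time went to the other runs.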
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gatorsmile/spark reportExecution
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20342.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20342
----
commit e790ab9950aa3ed9a0662e4d10f9d8611ff8f1ee
Author: gatorsmile <gatorsmile@...>
Date: 2018-01-21T14:04:28Z
fix.
----
