GitHub user bogdanrdc commented on a diff in the pull request:
https://github.com/apache/spark/pull/22205#discussion_r212918379
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -130,6 +130,10 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
// since the other rules might make two separate Unions operators adjacent.
Batch("Union", Once,
CombineUnions) ::
+ // run this once earlier. this might simplify the plan and reduce cost of optimizer
--- End diff --
It makes the optimizer faster for short queries; see the code above. Without
this change, a query such as `Filter(LocalRelation)` would go through all the
heavy optimizer rules. With this change, the query is reduced to just
`LocalRelation` early on, and that plan doesn't trigger many rules.
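
To illustrate the idea, here is a minimal, self-contained sketch. It is a toy model, not Spark's Catalyst code: `Plan`, `Scan`, `convertToLocalRelation`, and `heavyRuleWork` are hypothetical names made up for this example, and the toy `Filter` and `LocalRelation` only borrow their names from the real operators. It assumes the "heavy" rules cost roughly one unit per non-local operator they have to visit.

```scala
// Toy model only: these are NOT Spark's Catalyst classes.
sealed trait Plan
case class LocalRelation(rows: Seq[Int]) extends Plan
case class Scan(table: String) extends Plan // hypothetical non-local source
case class Filter(pred: Int => Boolean, child: Plan) extends Plan

object EarlyLocalRelationSketch {
  // Hypothetical stand-in for an early "convert to local relation" pass:
  // a Filter whose (simplified) child is local data is evaluated eagerly.
  def convertToLocalRelation(plan: Plan): Plan = plan match {
    case Filter(pred, child) =>
      convertToLocalRelation(child) match {
        case LocalRelation(rows) => LocalRelation(rows.filter(pred))
        case other               => Filter(pred, other)
      }
    case other => other
  }

  // Hypothetical stand-in for the heavy rules: counts the operators
  // they would still have to visit (assumed cost model).
  def heavyRuleWork(plan: Plan): Int = plan match {
    case Filter(_, child) => 1 + heavyRuleWork(child)
    case Scan(_)          => 1
    case LocalRelation(_) => 0
  }

  def main(args: Array[String]): Unit = {
    val query      = Filter(_ % 2 == 0, LocalRelation(Seq(1, 2, 3, 4)))
    val simplified = convertToLocalRelation(query)
    println(heavyRuleWork(query))      // 1: the Filter is still in the plan
    println(heavyRuleWork(simplified)) // 0: only a LocalRelation remains
  }
}
```

A plan that reads from a non-local source (the toy `Scan`) is left unchanged by the early pass, so in this model the extra batch only pays off for short, local queries like the one described above.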