Github user ptkool commented on a diff in the pull request:
https://github.com/apache/spark/pull/17708#discussion_r112549462
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
---
@@ -387,6 +387,13 @@ case class BroadcastHint(child: LogicalPlan) extends
UnaryNode {
}
/**
+ * A hint for the optimizer that we should not merge two projections.
+ */
+case class NoCollapseHint(child: LogicalPlan) extends UnaryNode {
--- End diff --
I originally thought about putting it at the expression level, but
ultimately decided it made more sense at the LogicalPlan node level, since the
purpose was in fact to `disrupt` the optimizer. In some respects, it's meant to
have the same effect as `df.cache()`, but without the caching. There may, in
fact, be situations where predicate pushdown is not desired because the
resulting condition would become complex and expensive to evaluate.
In Spark SQL, I think it also makes more sense to specify the hint at the
derived table level, as opposed to a single expression. For instance,
SELECT SNO, PNO, C1 +1, C1 + 2
FROM ( SELECT /*+ NO_COLLAPSE */ SNO, PNO, QTY * 10 AS C1 FROM T ) T
This is similar to the NO_MERGE query hint in Oracle, which prevents the
query from being flattened.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]