JoshRosen commented on code in PR #36441:
URL: https://github.com/apache/spark/pull/36441#discussion_r921495229
##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala:
##########
@@ -839,6 +841,52 @@ class HigherOrderFunctionsSuite extends SparkFunSuite with
ExpressionEvalHelper
Seq(1d, 2d, Double.NaN, null))
}
+ test("SPARK-39081: compatibility of combo of HigherOrderFunctions" +
+ " with other Expression subclasses") {
+ // Dummy example given in JIRA, this is to test a compile time issue only,
has no real usage
+ case class MyExploder(
+ arrays: Expression, // Array[AnyDataType]
+ asOfDate: Expression, //
LambdaFunction[AnyDataType -> TimestampType]
+ extractor: Expression) // TimestampType
+ extends HigherOrderFunction with Generator with TimeZoneAwareExpression {
+
+ override def children: Seq[Expression] = Seq[Expression](arrays,
asOfDate)
+
+ override def withNewChildrenInternal(newChildren:
IndexedSeq[Expression]): Expression = null
+
+ override def eval(input: InternalRow): TraversableOnce[InternalRow] =
null
+
+ override def nodePatternsInternal(): Seq[TreePattern] = Seq()
+
+ override def elementSchema: StructType = null
+
+ override def arguments: Seq[Expression] =
+ Seq(arrays, asOfDate)
+
+ override def argumentTypes: Seq[AbstractDataType] =
+ Seq(ArrayType, TimestampType)
+
+ override def doGenCode(ctx: CodegenContext, exp: ExprCode): ExprCode =
null
+
+ override def functions: Seq[Expression] =
+ Seq(extractor)
+
+ override def functionTypes: Seq[TimestampType] =
+ Seq(TimestampType)
+
+ override def bind(f: (Expression,
+ Seq[(DataType, Boolean)]) => LambdaFunction): HigherOrderFunction =
null
+
+ override def timeZoneId: Option[String] = None
+
+ override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
null
+ }
+
+ /* Should not get the error - value nodePatterns
+ ... cannot override final member, or conflict in nodePatterns between two
types
Review Comment:
What is the final effective value of `nodePatterns` for this new expression?
Ideally it would be the _union_ of its superclasses' `nodePatterns`, but in
practice I think it will pick up the value from the last trait mixed in and
would only have the `TIME_ZONE_AWARE_EXPRESSION` pattern but not `GENERATOR` or
`HIGH_ORDER_FUNCTION`.
I think this will lead to optimizer correctness bugs because it violates an
existing invariant: today, all subclasses of the expressions / traits with
`final` node patterns are _guaranteed_ to have those patterns set, but that
invariant no longer holds after your change. The optimizer uses these node
patterns to prune tree traversals and skip optimizer rule invocation in cases
where it knows that rules cannot apply because subtrees do not contain the node
types used by the rule; see https://issues.apache.org/jira/browse/SPARK-35042
and
https://docs.google.com/document/d/1SEUhkbo8X-0cYAJFYFDQhxUnKJBz4lLn3u4xR2qfWqk/edit#heading=h.4cp1yxp4c1fh
for more background on this.
For example, if you had a `HigherOrderFunction` subclass where the
`HIGH_ORDER_FUNCTION` node pattern wasn't set then the `ResolveLambdaVariables`
optimizer rule would end up [skipping that expression via pattern
pruning](https://github.com/apache/spark/blob/35d00df9bba7238ad4f409999617fae4d04ddbfd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/higherOrderFunctions.scala#L52-L58).
(This is primarily a drive-by comment: I haven't done much recent work in
the optimizer, so I'm not in the best position to suggest an alternative design
or to lead the review of this PR. I just wanted to voice the top-of-my-head
concern and explain (what I think is) the rationale for these variables having
been final in the first place).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]