JoshRosen commented on code in PR #36441:
URL: https://github.com/apache/spark/pull/36441#discussion_r921495229


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala:
##########
@@ -839,6 +841,52 @@ class HigherOrderFunctionsSuite extends SparkFunSuite with 
ExpressionEvalHelper
       Seq(1d, 2d, Double.NaN, null))
   }
 
+  test("SPARK-39081: compatibility of combo of HigherOrderFunctions" +
+    " with other Expression subclasses") {
+    // Dummy example given in JIRA, this is to test a compile time issue only, 
has no real usage
+    case class MyExploder(
+                           arrays: Expression,    // Array[AnyDataType]
+                           asOfDate: Expression,  // 
LambdaFunction[AnyDataType -> TimestampType]
+                           extractor: Expression) // TimestampType
+      extends HigherOrderFunction with Generator with TimeZoneAwareExpression {
+
+      override def children: Seq[Expression] = Seq[Expression](arrays, 
asOfDate)
+
+      override def withNewChildrenInternal(newChildren: 
IndexedSeq[Expression]): Expression = null
+
+      override def eval(input: InternalRow): TraversableOnce[InternalRow] = 
null
+
+      override def nodePatternsInternal(): Seq[TreePattern] = Seq()
+
+      override def elementSchema: StructType = null
+
+      override def arguments: Seq[Expression] =
+        Seq(arrays, asOfDate)
+
+      override def argumentTypes: Seq[AbstractDataType] =
+        Seq(ArrayType, TimestampType)
+
+      override def doGenCode(ctx: CodegenContext, exp: ExprCode): ExprCode = 
null
+
+      override def functions: Seq[Expression] =
+        Seq(extractor)
+
+      override def functionTypes: Seq[TimestampType] =
+        Seq(TimestampType)
+
+      override def bind(f: (Expression,
+        Seq[(DataType, Boolean)]) => LambdaFunction): HigherOrderFunction = 
null
+
+      override def timeZoneId: Option[String] = None
+
+      override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = 
null
+    }
+
+    /* Should not get the error - value nodePatterns
+     ... cannot override final member, or conflict in nodePatterns between two 
types

Review Comment:
   What is the final effective value of `nodePatterns` for this new expression?
   
   Ideally it would be the _union_ of its superclasses, but in practice I think 
it will pick up the value from the last trait mixed in and would only have the 
`TIME_ZONE_AWARE_EXPRESSION` pattern but not `GENERATOR` or 
`HIGH_ORDER_FUNCTION`.
   
   I think this will lead to optimizer correctness bugs because it violates an 
existing invariant: today, all subclasses of the expressions / traits with 
`final` node patterns are _guaranteed_ to have those patterns set, but that 
invariant no longer holds after your change. The optimizer uses these node 
patterns to prune tree traversals and skip optimizer rule invocation in cases 
where it knows that rules cannot apply because subtrees do not contain the node 
types used by the rule; see https://issues.apache.org/jira/browse/SPARK-35042 
and 
https://docs.google.com/document/d/1SEUhkbo8X-0cYAJFYFDQhxUnKJBz4lLn3u4xR2qfWqk/edit#heading=h.4cp1yxp4c1fh
 for more background on this.
   
   For example, if you had a `HigherOrderFunction` subclass where the 
`HIGH_ORDER_FUNCTION` node pattern wasn't set then the `ResolveLambdaVariables` 
optimizer rule would end up [skipping that expression via pattern 
pruning](https://github.com/apache/spark/blob/35d00df9bba7238ad4f409999617fae4d04ddbfd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/higherOrderFunctions.scala#L52-L58).
   
   (This is primarily a drive-by comment: I haven't done much recent work in 
the optimizer, so I'm not in the best position to suggest an alternative design 
or to lead the review of this PR. I just wanted to voice the top-of-my-head 
concern and explain (what I think is) the rationale for these variables having 
been final in the first place).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to