sunchao commented on code in PR #55885:
URL: https://github.com/apache/spark/pull/55885#discussion_r3470467090
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala:
##########
@@ -641,19 +640,15 @@ object KeyedPartitioning {
def supportsExpressions(expressions: Seq[Expression]): Boolean = {
def isSupportedTransform(transform: TransformExpression): Boolean = {
- transform.children.size == 1 && isReference(transform.children.head)
- }
-
- @tailrec
- def isReference(e: Expression): Boolean = e match {
- case _: Attribute => true
- case g: GetStructField => isReference(g.child)
- case _ => false
+ // Should only consider column references, not literals.
+ val nonLiteralChildren =
transform.children.filterNot(_.isInstanceOf[Literal])
+ // We need exactly one column reference per transform.
+ nonLiteralChildren.size == 1 &&
TransformExpression.isColumnRef(nonLiteralChildren.head)
Review Comment:
[P1] Apply bound-function input casts before evaluating parameterized
transforms through SPJ
This gate now admits parameterized transforms whose bound function requests
legal implicit casts. `BoundFunction.inputTypes()` explicitly allows types to
differ from those passed to `bind`, with Spark responsible for casting.
However, the scan path stores the raw Catalyst children in
`TransformExpression`, and the identity-vs-transform reducer later binds and
directly evaluates that expression without running Analyzer type coercion.
For a concrete case, a connector can report `truncate(str_col,
Expressions.literal(2.toShort))`; binding `(StringType, ShortType)` to a scalar
function that declares `(StringType, IntegerType)` is legal. This transform
passes the new gate, but the synthetic reducer evaluates the raw `Short`
literal through `ApplyFunctionExpression`. Its
`SpecificInternalRow(IntegerType)` then attempts to cast the boxed `Short` to
`Int`, raising `ClassCastException`. Before parameterized transforms were
admitted here, this query would fall back to a shuffle.
Please coerce the transform children to `function.inputTypes()` before
direct reducer evaluation (or reject mismatched types from this SPJ path), and
add an identity-vs-parameterized-transform regression covering an implicitly
cast literal.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]