cloud-fan commented on code in PR #56575:
URL: https://github.com/apache/spark/pull/56575#discussion_r3489052013


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:
##########
@@ -2359,29 +2359,24 @@ case class Substring(str: Expression, pos: Expression, 
len: Expression)
   since = "2.3.0",
   group = "string_funcs")
 // scalastyle:on line.size.limit
-case class Right(str: Expression, len: Expression) extends RuntimeReplaceable
-  with ImplicitCastInputTypes with BinaryLike[Expression] {
-
-  override lazy val replacement: Expression = If(
-    IsNull(str),
-    Literal(null, str.dataType),
-    If(
-      LessThanOrEqual(len, Literal(0)),
-      Literal(UTF8String.EMPTY_UTF8, str.dataType),
-      new Substring(str, UnaryMinus(len, failOnError = false))
-    )
-  )
+object Right extends DelegateFunction {
+  override val name: String = "right"
 
   override def inputTypes: Seq[AbstractDataType] =
-    Seq(
-      StringTypeWithCollation(supportsTrimCollation = true),
-      IntegerType
-    )
-  override def left: Expression = str
-  override def right: Expression = len
-  override protected def withNewChildrenInternal(
-      newLeft: Expression, newRight: Expression): Expression = {
-    copy(str = newLeft, len = newRight)
+    Seq(StringTypeWithCollation(supportsTrimCollation = true), IntegerType)
+
+  // NOTE: runs at parse time on unresolved args, so it must not read an 
input's `.dataType`.
+  // The `If` branch types are unified later by type coercion.
+  override def lower(args: Seq[Expression]): Expression = {
+    val str = args(0)
+    val len = args(1)
+    If(
+      IsNull(str),
+      Literal(null, StringType),

Review Comment:
   Thanks — the finding is right (the CHAR/VARCHAR result type was regressing), 
and it's fixed in bffecc826d6. But using `str.dataType` directly isn't safe 
here, for a reason worth recording:
   
   At `build`/`lower` time the argument is the not-yet-coerced 
`ImplicitCastInput` marker, and the marker delegates `dataType` to its 
(uncoerced) child. So `str.dataType` is the *input* type, which isn't 
necessarily a string yet — e.g. `right(12345, 2)` has an `IntegerType` child 
that the implicit cast turns into a string only later. Typing the empty-string 
branch as `Literal(UTF8String.EMPTY_UTF8, IntegerType)` then fails `Literal` 
validation (`Literal must have a corresponding value to int, but class 
UTF8String found`). Confirmed: a naive `str.dataType` crashes 
`DelegateExpressionQuerySuite."right() implicit-casts a non-string arg ..."`.
   
   So the fix types the null/empty branch literals with `str.dataType` only 
when it is already a string-family type (`StringType` / `CharType` / 
`VarcharType`) — preserving CHAR(N)/VARCHAR(N) under 
`preserveCharVarcharTypeInfo` and non-default collations — and falls back to 
plain `StringType` otherwise (the type the implicit cast produces). New test 
`right() preserves the input CHAR/VARCHAR type with 
preserveCharVarcharTypeInfo` asserts `typeof(right(CAST('abc' AS CHAR(5)), 2)) 
= char(5)`; the collation case was already covered.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to