[GitHub] [spark] cloud-fan commented on a change in pull request #35352: [SPARK-38063][SQL] Support split_part Function

GitBox Fri, 18 Mar 2022 07:34:30 -0700


cloud-fan commented on a change in pull request #35352:
URL: https://github.com/apache/spark/pull/35352#discussion_r829720483




##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -2095,10 +2095,12 @@ case class ArrayPosition(left: Expression, right: Expression)
 case class ElementAt(
     left: Expression,
     right: Expression,
+    // The value to return if index is out of bound
+    defaultValueOutOfBound: Expression = null,

Review comment:
       The default value for this parameter should be `Literal(null, NullType)`.

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -2179,7 +2181,7 @@ case class ElementAt(
           if (failOnError) {
             throw QueryExecutionErrors.invalidElementAtIndexError(index, array.numElements())
           } else {
-            null
+            defaultValueOutOfBound

Review comment:
       ```suggestion
               defaultValueOutOfBound.eval()
   ```
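
To illustrate why the suggestion calls `eval()`, here is a minimal sketch in plain Scala. `ToyExpr`, `ToyLit` and `elementAtOrDefault` are hypothetical stand-ins, not the real Catalyst classes, and only positive 1-based indexes are handled. The point is that the interpreted path has to return a value, not the expression node that produces it.

```scala
// Toy stand-ins for Catalyst's Expression and Literal (assumptions for illustration only).
trait ToyExpr { def eval(): Any }
final case class ToyLit(value: Any) extends ToyExpr { def eval(): Any = value }

object EvalDefaultSketch {
  // Returns the element at a 1-based index, or the evaluated default when out of bounds.
  def elementAtOrDefault(arr: IndexedSeq[Any], index: Int, default: ToyExpr): Any =
    if (index >= 1 && index <= arr.length) arr(index - 1)
    else default.eval() // returning `default` itself would leak an expression node into the result
}
```

Returning `default` without evaluating it would hand the caller a `ToyLit("")` rather than the empty string it wraps.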

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -2095,10 +2095,12 @@ case class ArrayPosition(left: Expression, right: Expression)
 case class ElementAt(
     left: Expression,
     right: Expression,
+    // The value to return if index is out of bound
+    defaultValueOutOfBound: Expression = null,

Review comment:
       The type can be `Literal` to be more type-safe

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -2217,8 +2219,10 @@ case class ElementAt(
 
           val indexOutOfBoundBranch = if (failOnError) {
             s"throw QueryExecutionErrors.invalidElementAtIndexError($index, $eval1.numElements());"
-          } else {
+          } else if (defaultValueOutOfBound == null) {

Review comment:
       This is not how to evaluate an expression in codegen. Generate the code
       for the default value with `genCode`, then assign its result to `ev`:
   ```
   val defaultValueEval = defaultValueOutOfBound.genCode(ctx)
   s"""
     ...
     ${defaultValueEval.code}
     ${ev.isNull} = ${defaultValueEval.isNull};
     ${ev.value} = ${defaultValueEval.value};
   """
   ```
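
The pattern above can be tried outside Spark with a simplified model. In this sketch `ToyExprCode` is a hypothetical stand-in for Catalyst's `ExprCode` (the real class uses richer types than plain strings), and `outOfBoundBranch` is an illustrative helper name. It emits the child's code first and then copies the child's null flag and value into the parent's result variables.

```scala
// Hypothetical, simplified model of ExprCode: a Java snippet plus the names of
// the variables (or constants) that hold its null flag and its value.
final case class ToyExprCode(code: String, isNull: String, value: String)

object CodegenSketch {
  // Emits Java source for the out-of-bound branch: run the default's code,
  // then assign its null flag and value to the parent's variables.
  def outOfBoundBranch(ev: ToyExprCode, defaultEval: ToyExprCode): String =
    s"""
       |${defaultEval.code}
       |${ev.isNull} = ${defaultEval.isNull};
       |${ev.value} = ${defaultEval.value};
     """.stripMargin.trim
}
```

Each generated assignment ends with a semicolon because the output is Java source, not Scala.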

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -2095,10 +2095,12 @@ case class ArrayPosition(left: Expression, right: Expression)
 case class ElementAt(
     left: Expression,
     right: Expression,
+    // The value to return if index is out of bound
+    defaultValueOutOfBound: Expression = null,

Review comment:
       Actually, it's better to use `Option[Literal]`, where `None` means no default value and we should keep the old code path.
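
A rough sketch of the `Option`-based design, in plain Scala. `ToyLiteral` and `elementAt` here are hypothetical stand-ins (the real `Literal` also carries a `DataType`), and only positive 1-based indexes are handled. `None` keeps the old behaviour of returning null, while `Some(lit)` supplies the replacement value.

```scala
// Toy stand-in for Catalyst's Literal (assumption for illustration only).
final case class ToyLiteral(value: Any)

object OptionalDefaultSketch {
  // None: old code path, null on an out-of-bound index.
  // Some(lit): return the literal's value instead.
  def elementAt(
      arr: IndexedSeq[Any],
      index: Int,
      defaultValueOutOfBound: Option[ToyLiteral] = None): Any =
    if (index >= 1 && index <= arr.length) arr(index - 1)
    else defaultValueOutOfBound match {
      case Some(lit) => lit.value
      case None => null
    }
}
```

Compared with a nullable `Expression` parameter, the `Option` makes the "no default" case explicit in the type, so callers do not need a `== null` check.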

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala
##########
@@ -661,4 +661,53 @@ class StringFunctionsSuite extends QueryTest with SharedSparkSession {
     }.getMessage
     assert(m.contains("data type mismatch: argument 1 requires string type"))
   }
+
+  test("SPARK-38063: string split_part function") {
+    checkAnswer(
+      sql("select split_part('11,12,13', ',', 1)"),
+      Row("11"))
+
+    checkAnswer(
+      sql("select split_part('11.12.13', '.', 2)"),
+      Row("12"))
+
+    checkAnswer(
+      sql("select split_part('11.12.13', '.', -1)"),
+      Row("13"))
+
+    checkAnswer(
+      sql("select split_part('11.12.13', '.', -3)"),
+      Row("11"))
+
+    checkAnswer(
+      sql("select split_part('11.12.13', '.', 4)"),
+      Row(""))
+
+    checkAnswer(
+      sql("select split_part('11.12.13', '.', 5)"),
+      Row(""))
+
+    checkAnswer(
+      sql("select split_part('11.12.13', '.', -5)"),
+      Row(""))
+
+    checkAnswer(
+      sql("select split_part('11.12.13', '', 1)"),
+      Row("11.12.13"))
+
+    checkAnswer(
+      sql("select split_part('11ab12ab13', 'ab', 1)"),
+      Row("11"))
+
+    val m = intercept[ArrayIndexOutOfBoundsException] {
+      checkAnswer(
+        sql("select split_part('11.12.13', '.', 0)"),
+        Row("11"))
+    }.getMessage
+    assert(m.contains("SQL array indices start at 1"))

Review comment:
       Yes, we can improve the existing error message for `element_at`.
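
For reference, the behaviour exercised by the test in this hunk can be approximated in plain Scala. This is only a sketch inferred from the expected rows above, not the PR's implementation: `splitPart` is a hypothetical helper, and its handling of an empty delimiter with a part number other than 1 is an assumption.

```scala
import java.util.regex.Pattern

object SplitPartSketch {
  // 1-based part lookup; negative numbers count from the end;
  // out-of-range parts yield the empty string; 0 is rejected.
  def splitPart(str: String, delimiter: String, partNum: Int): String = {
    if (partNum == 0) {
      throw new ArrayIndexOutOfBoundsException("SQL array indices start at 1")
    }
    // Empty delimiter: treat the whole string as a single part (per the test above).
    val parts: Array[String] =
      if (delimiter.isEmpty) Array(str)
      else str.split(Pattern.quote(delimiter), -1) // quote: the delimiter is literal, not a regex
    val idx = if (partNum > 0) partNum - 1 else parts.length + partNum
    if (idx >= 0 && idx < parts.length) parts(idx) else ""
  }
}
```

The message thrown for part number 0 is the one the test currently asserts on, which is the `element_at` wording this comment proposes to improve.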



