cloud-fan commented on code in PR #48203:
URL: https://github.com/apache/spark/pull/48203#discussion_r1770898280


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala:
##########
@@ -2076,4 +2077,14 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
       )
     )
   }
+
+  test("SPARK-48712: Check whether input is valid utf-8 string or not before entering fast path") {
+    val str = UTF8String.fromBytes(Array[Byte](-1, -2, -3, -4))
+    assert(!str.isValid, "please use a string that is not valid UTF-8 for testing")
+    val expected = Array[Byte](-17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67)
+    val bytes = Encode.encode(str, UTF8String.fromString("UTF-8"), false, false)
+    assert(bytes === expected)
+    checkEvaluation(Encode(Literal(str), Literal("UTF-8")), expected)
+    checkEvaluation(Encode(Literal(UTF8String.EMPTY_UTF8), Literal("UTF-8")), Array.emptyByteArray)

Review Comment:
   `EMPTY_UTF8` is different from an empty byte array; can we test `UTF8String.fromBytes(Array[Byte]())`?
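A side note on the expected values in the test: each of the four input bytes (0xFF, 0xFE, 0xFD, 0xFC, none of which can appear in well-formed UTF-8) becomes U+FFFD, whose UTF-8 encoding is EF BF BD, i.e. `-17, -65, -67` as signed Scala bytes. The sketch below reproduces the expected array with the JDK's own UTF-8 decoder rather than Spark's `UTF8String`. The object and method names are hypothetical, this is not how `Encode` is implemented, and Spark's replacement logic may differ from the JDK's for other malformed inputs.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical illustration using only the JDK; not Spark's UTF8String or Encode.
object ReplacementCharDemo {
  // Decode the bytes as UTF-8 (the JDK substitutes U+FFFD for each malformed
  // sequence), then re-encode the resulting String as UTF-8 bytes.
  def sanitize(bytes: Array[Byte]): Array[Byte] =
    new String(bytes, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8)

  // UTF-8 encoding of U+FFFD (REPLACEMENT CHARACTER): 0xEF 0xBF 0xBD.
  val replacement: Array[Byte] = "\uFFFD".getBytes(StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit = {
    // 0xFF 0xFE 0xFD 0xFC: each is an invalid UTF-8 lead byte.
    val invalid = Array[Byte](-1, -2, -3, -4)
    // Same literal as `expected` in the test: four copies of EF BF BD.
    val expected = Array[Byte](-17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67)
    println(sanitize(invalid).sameElements(expected)) // true
    println(replacement.mkString(", "))               // -17, -65, -67
    println(sanitize(Array[Byte]()).isEmpty)          // true
  }
}
```

The empty-input line only checks byte-level behavior; it says nothing about the `EMPTY_UTF8` versus `UTF8String.fromBytes(Array[Byte]())` distinction raised in the review, which concerns Spark's own `UTF8String` instances.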



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

