Re: [PR] [SPARK-48318][SQL] Enable hash join support for all collations (complex types) [spark]

via GitHub Thu, 30 May 2024 05:18:08 -0700


uros-db commented on code in PR #46722:
URL: https://github.com/apache/spark/pull/46722#discussion_r1620601656



##########
sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala:
##########
@@ -1046,6 +1046,211 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlanHelper {
     })
   }
 
+  test("hash join should be used for arrays of collated strings") {
+    val t1 = "T_1"
+    val t2 = "T_2"
+
+    case class HashJoinTestCase[R](collation: String, result: R)
+    val testCases = Seq(
+      HashJoinTestCase("UTF8_BINARY",
+        Seq(Row(Seq("aa"), 1, Seq("aa"), 2))),
+      HashJoinTestCase("UTF8_BINARY_LCASE",
+        Seq(Row(Seq("aa"), 1, Seq("AA"), 2), Row(Seq("aa"), 1, Seq("aa"), 2))),
+      HashJoinTestCase("UNICODE",
+        Seq(Row(Seq("aa"), 1, Seq("aa"), 2))),
+      HashJoinTestCase("UNICODE_CI",

Review Comment:
   I agree that we should start adding these tests. However, since this PR is 
essentially an extension of the previous one 
(https://github.com/apache/spark/pull/46599), I think it would be better to 
proceed now and add tests for both later. We could dedicate a separate PR to 
improving CollationSuite test coverage for the new collations. Let me know if 
that sounds good.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

