uros-db commented on PR #46206:
URL: https://github.com/apache/spark/pull/46206#issuecomment-2100191098
@cloud-fan some tests are stale here, but overall I think the changes in
this PR are good
however, changes are indeed large and merge conflicts arise daily - could
you please review
davidm-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1584647718
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java:
##
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1584638773
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java:
##
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software
davidm-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1580662930
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -306,6 +308,258 @@ public static int execICU(final UTF8String string,
mihailom-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1580524717
##
common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java:
##
@@ -528,6 +528,235 @@ public void testFindInSet() throws SparkException
mihailom-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1580521078
##
sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala:
##
@@ -608,6 +610,181 @@ class CollationStringExpressionsSuite
mihailom-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1580518571
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -306,6 +308,258 @@ public static int execICU(final UTF8String
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1580486060
##
common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java:
##
@@ -528,6 +528,235 @@ public void testFindInSet() throws SparkException {
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1580485739
##
common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java:
##
@@ -528,6 +528,235 @@ public void testFindInSet() throws SparkException {
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1580483101
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -403,6 +657,289 @@ private static int indexOf(final UTF8String target,
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1580480968
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -403,6 +657,289 @@ private static int indexOf(final UTF8String target,
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1580478491
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -403,6 +657,289 @@ private static int indexOf(final UTF8String target,
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1580477853
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -403,6 +657,289 @@ private static int indexOf(final UTF8String target,
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1580477853
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -403,6 +657,289 @@ private static int indexOf(final UTF8String target,
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1580475098
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -403,6 +657,289 @@ private static int indexOf(final UTF8String target,
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1580473800
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -306,6 +308,258 @@ public static int execICU(final UTF8String string,
davidm-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1579855561
##
sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala:
##
@@ -536,6 +538,134 @@ class CollationStringExpressionsSuite
}
}
+
davidm-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1579854712
##
sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala:
##
@@ -536,6 +538,134 @@ class CollationStringExpressionsSuite
}
}
+
davidm-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1579853324
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
Review Comment:
Decided on the approach offline and implemented it.
davidm-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1579852005
##
common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java:
##
@@ -528,6 +528,235 @@ public void testFindInSet() throws SparkException {
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1579049723
##
sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala:
##
@@ -536,6 +538,134 @@ class CollationStringExpressionsSuite
}
}
+
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1579045485
##
sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala:
##
@@ -536,6 +538,134 @@ class CollationStringExpressionsSuite
}
}
+
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1579041862
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:
##
@@ -1150,12 +1166,11 @@ case class StringTrim(srcStr: Expression,
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1579040264
##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
Review Comment:
I agree, these don't appear to be too dangerous, and I also think
davidm-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1579039004
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -214,6 +214,207 @@ public static int execICU(final UTF8String string,
davidm-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1579037345
##
sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala:
##
@@ -536,6 +538,134 @@ class CollationStringExpressionsSuite
}
}
+
davidm-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1579033342
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:
##
@@ -1258,12 +1273,11 @@ case class StringTrimLeft(srcStr:
davidm-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1579032798
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:
##
@@ -1150,12 +1166,11 @@ case class StringTrim(srcStr:
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1579031297
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
Review Comment:
hmm you are correct, this expression is a bit
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1578919376
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:
##
@@ -1150,12 +1166,11 @@ case class StringTrim(srcStr: Expression,
davidm-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1579027113
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
Review Comment:
Per my understanding, logic here is the same as in
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1579023601
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -214,6 +214,207 @@ public static int execICU(final UTF8String string,
davidm-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1579018663
##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
Review Comment:
Well, other methods like `numBytes` and `getBytes` are already
davidm-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1579012569
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -214,6 +214,207 @@ public static int execICU(final UTF8String string,
davidm-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1579011842
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -214,6 +214,207 @@ public static int execICU(final UTF8String string,
davidm-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1579007810
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala:
##
@@ -18,11 +18,9 @@
package org.apache.spark.sql.catalyst.analysis
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1578920586
##
sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala:
##
@@ -536,6 +538,134 @@ class CollationStringExpressionsSuite
}
}
+
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1578920396
##
sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala:
##
@@ -536,6 +538,134 @@ class CollationStringExpressionsSuite
}
}
+
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1578919376
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:
##
@@ -1150,12 +1166,11 @@ case class StringTrim(srcStr: Expression,
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1578916556
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
Review Comment:
essentially, we only want 1 `exec` (and 1 `genCode`),
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1578913462
##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
Review Comment:
I know that we're trying to push all collation aware changes away
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1578910474
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
Review Comment:
this doesn't look good
please follow the guides
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1578909387
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -214,6 +214,207 @@ public static int execICU(final UTF8String string,
mihailom-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1578909067
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala:
##
@@ -54,7 +52,8 @@ object CollationTypeCasts extends
uros-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1578908647
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -214,6 +214,207 @@ public static int execICU(final UTF8String string,
uros-db commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1578907085
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java:
##
@@ -175,12 +175,19 @@ public Collation(
* Auxiliary methods for
uros-db commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1578907442
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java:
##
@@ -175,12 +175,19 @@ public Collation(
* Auxiliary methods for
uros-db commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1578907085
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java:
##
@@ -175,12 +175,19 @@ public Collation(
* Auxiliary methods for
mihailom-db commented on code in PR #46206:
URL: https://github.com/apache/spark/pull/46206#discussion_r1578906758
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala:
##
@@ -18,11 +18,9 @@
package
davidm-db closed pull request #45749: [SPARK-47409][SQL] Add support for
collation for StringTrim type of functions/expressions
URL: https://github.com/apache/spark/pull/45749
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and
mihailom-db commented on PR #45749:
URL: https://github.com/apache/spark/pull/45749#issuecomment-2055969457
Please add these expressions to CollationTypeCasts rules.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
uros-db commented on PR #45749:
URL: https://github.com/apache/spark/pull/45749#issuecomment-2049814327
heads up: we’ve done some major code restructuring in
https://github.com/apache/spark/pull/45978, so please sync these changes before
moving on
@davidm-db you’ll likely need to
uros-db commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1546516509
##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -628,6 +646,27 @@ public UTF8String trim(UTF8String trimString) {
}
}
+
uros-db commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1546539089
##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -767,6 +947,139 @@ public UTF8String trimRight(UTF8String trimString) {
return
uros-db commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1546516509
##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -628,6 +646,27 @@ public UTF8String trim(UTF8String trimString) {
}
}
+
uros-db commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1546509675
##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -686,6 +743,111 @@ public UTF8String trimLeft(UTF8String trimString) {
return
uros-db commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1546505391
##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -686,6 +743,111 @@ public UTF8String trimLeft(UTF8String trimString) {
return
uros-db commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1546505391
##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -686,6 +743,111 @@ public UTF8String trimLeft(UTF8String trimString) {
return
uros-db commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1546489340
##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -686,6 +743,111 @@ public UTF8String trimLeft(UTF8String trimString) {
return
uros-db commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1546489340
##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -686,6 +743,111 @@ public UTF8String trimLeft(UTF8String trimString) {
return
stefankandic commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1546432009
##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -767,6 +947,139 @@ public UTF8String trimRight(UTF8String trimString) {
stefankandic commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1546428567
##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -686,6 +743,111 @@ public UTF8String trimLeft(UTF8String trimString) {
stefankandic commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1546421174
##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -686,6 +743,111 @@ public UTF8String trimLeft(UTF8String trimString) {
stefankandic commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1546417209
##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -767,6 +947,139 @@ public UTF8String trimRight(UTF8String trimString) {
uros-db commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1546329905
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:
##
@@ -1278,12 +1340,25 @@ case class StringTrimLeft(srcStr:
uros-db commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1546328445
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:
##
@@ -1170,12 +1219,25 @@ case class StringTrim(srcStr: Expression,
uros-db commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1546326805
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:
##
@@ -1054,32 +1071,64 @@ trait String2TrimExpression extends
uros-db commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1546323015
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:
##
@@ -1028,15 +1028,32 @@ trait String2TrimExpression extends
uros-db commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1546312983
##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -686,6 +743,111 @@ public UTF8String trimLeft(UTF8String trimString) {
return
davidm-db commented on PR #45749:
URL: https://github.com/apache/spark/pull/45749#issuecomment-2027101103
@MaxGekk, @cloud-fan could someone please take a look at this PR?
Tagging the rest of Belgrade collation crew if anyone would like to review
additionally: @dbatomic,
davidm-db commented on PR #45749:
URL: https://github.com/apache/spark/pull/45749#issuecomment-2024875386
@mihailom-db I'm adding tests to both `CollationStringExpressionSuite` and
`CollationSuite` - I just pushed the changes so they can be reviewed while I'm
adding the tests :)
--
This
davidm-db commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1542675474
##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -733,27 +873,30 @@ public UTF8String trimRight(UTF8String trimString) {
mihailom-db commented on PR #45749:
URL: https://github.com/apache/spark/pull/45749#issuecomment-2024866802
Also, please add tests to `CollationStringExpressionSuite`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
mihailom-db commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1542665840
##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -733,27 +873,30 @@ public UTF8String trimRight(UTF8String trimString) {
mihailom-db commented on code in PR #45749:
URL: https://github.com/apache/spark/pull/45749#discussion_r1542648461
##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java:
##
@@ -176,11 +176,11 @@ public Collation(
*/
public static
75 matches
Mail list logo