Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-05-29 Thread via GitHub


nikolamand-db closed pull request #45963: [SPARK-47415][SQL] Collation support: 
Levenshtein
URL: https://github.com/apache/spark/pull/45963


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-05-29 Thread via GitHub


nikolamand-db commented on PR #45963:
URL: https://github.com/apache/spark/pull/45963#issuecomment-2137018880

   After a lengthy discussion we decided to stop Levenshtein collations effort. 
The well defined definition of Levenshtein as _the minimum number of 
single-character edits to make the strings compare equal_ would be interpreted 
under collations as _the minimum number of single-character edits to make the 
strings equal under given collation_ is very hard to achieve under 
non-exponential time.
   
   Reason for this is that Levenshtein distance algorithm assumes we can 
compare single character in both strings for equality whereas under collations 
we cannot safely assume it is enough to compare single character. For example, 
sequences `İ` and `i\u0307` compare equal under `UTF8_BINARY_LCASE` collation 
but have different number of characters (1 vs 2 in order).
   
   Thanks @nebojsa-db and reviewers for your effort. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-05-14 Thread via GitHub


nebojsa-db commented on code in PR #45963:
URL: https://github.com/apache/spark/pull/45963#discussion_r1599656074


##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -1509,12 +1515,62 @@ public boolean semanticEquals(final UTF8String other, 
int collationId) {
 return 
CollationFactory.fetchCollation(collationId).equalsFunction.apply(this, other);
   }
 
+  private interface SubstringEquals {
+boolean equals(UTF8String left, UTF8String right, int posLeft, int 
posRight,
+  int lenLeft, int lenRight);
+  }
+
+  private static class ByteSubstringEquals implements SubstringEquals {
+@Override
+public boolean equals(UTF8String left, UTF8String right, int posLeft, int 
posRight,
+  int lenLeft, int lenRight) {
+  if (lenLeft != lenRight || left.getByte(posLeft) != 
right.getByte(posRight)) {
+return false;
+  }
+  return (ByteArrayMethods.arrayEquals(left.base, left.offset + posLeft, 
right.base,
+right.offset + posRight, lenLeft));
+}
+  }
+
+  private static final ByteSubstringEquals byteSubstringEquals = new 
ByteSubstringEquals();
+
+  private static class CollationSubstringEquals implements SubstringEquals {
+private final int collationId;
+private final UTF8String left, right;
+
+CollationSubstringEquals(int collationId) {
+  this.collationId = collationId;
+  this.left = new UTF8String();
+  this.right = new UTF8String();
+}
+
+@Override
+public boolean equals(UTF8String left, UTF8String right, int posLeft, int 
posRight,
+  int lenLeft, int lenRight) {
+  moveAddress(this.left, left.base, left.offset + posLeft, lenLeft);

Review Comment:
   Done, in addition with baseString which is also needed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-05-13 Thread via GitHub


nebojsa-db commented on code in PR #45963:
URL: https://github.com/apache/spark/pull/45963#discussion_r1598135262


##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -722,6 +722,65 @@ public static UTF8String execLowercase(
 }
   }
 
+  /**
+   * Utility class for collation aware Levenshtein function.
+   */
+  public static class Levenshtein{
+
+/**
+ * Implementation of SubstringEquals interface for collation aware 
comparison of two substrings.
+ */
+private static class CollationSubstringEquals implements 
UTF8String.SubstringEquals {
+  private final int collationId;
+  private final UTF8String left, right;
+
+  CollationSubstringEquals(int collationId) {
+this.collationId = collationId;
+this.left = new UTF8String();
+this.right = new UTF8String();
+  }
+
+  @Override
+  public boolean equals(UTF8String left, UTF8String right, int posLeft, 
int posRight,
+int lenLeft, int lenRight) {
+this.left.moveAddress(left, posLeft, lenLeft);
+this.right.moveAddress(right, posRight, lenRight);
+return CollationFactory.fetchCollation(collationId).equalsFunction
+.apply(this.left, this.right);
+  }
+}
+
+public static Integer exec(final UTF8String left, final UTF8String right, 
final int collationId){
+  CollationFactory.Collation collation = 
CollationFactory.fetchCollation(collationId);
+
+  if (collation.supportsBinaryEquality){
+return left.levenshteinDistance(right);
+  }
+  else{
+return left.levenshteinDistance(right, new 
CollationSubstringEquals(collationId));
+  }
+}
+
+public static Integer execWithThreshold(final UTF8String left, final 
UTF8String right, final int threshold, final int collationId){
+  CollationFactory.Collation collation = 
CollationFactory.fetchCollation(collationId);
+
+  if (collation.supportsBinaryEquality){
+return left.levenshteinDistance(right, threshold);
+  }
+  else{
+return left.levenshteinDistance(right, threshold, new 
CollationSubstringEquals(collationId));
+  }

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-05-13 Thread via GitHub


mihailom-db commented on code in PR #45963:
URL: https://github.com/apache/spark/pull/45963#discussion_r1598132264


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala:
##
@@ -71,6 +71,9 @@ object CollationTypeCasts extends TypeCoercionRule {
   val Seq(newStr, newPad) = collateToSingleType(Seq(str, pad))
   stringPadExpr.withNewChildren(Seq(newStr, len, newPad))
 
+case lev: Levenshtein =>

Review Comment:
   nit: please use levExpr to follow other patterns



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-05-13 Thread via GitHub


nebojsa-db commented on code in PR #45963:
URL: https://github.com/apache/spark/pull/45963#discussion_r1598094663


##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -722,6 +722,65 @@ public static UTF8String execLowercase(
 }
   }
 
+  /**
+   * Utility class for collation aware Levenshtein function.
+   */
+  public static class Levenshtein{

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-05-10 Thread via GitHub


nikolamand-db commented on code in PR #45963:
URL: https://github.com/apache/spark/pull/45963#discussion_r1596897091


##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -722,6 +722,65 @@ public static UTF8String execLowercase(
 }
   }
 
+  /**
+   * Utility class for collation aware Levenshtein function.
+   */
+  public static class Levenshtein{
+
+/**
+ * Implementation of SubstringEquals interface for collation aware 
comparison of two substrings.
+ */
+private static class CollationSubstringEquals implements 
UTF8String.SubstringEquals {
+  private final int collationId;
+  private final UTF8String left, right;
+
+  CollationSubstringEquals(int collationId) {
+this.collationId = collationId;
+this.left = new UTF8String();
+this.right = new UTF8String();
+  }
+
+  @Override
+  public boolean equals(UTF8String left, UTF8String right, int posLeft, 
int posRight,
+int lenLeft, int lenRight) {
+this.left.moveAddress(left, posLeft, lenLeft);
+this.right.moveAddress(right, posRight, lenRight);
+return CollationFactory.fetchCollation(collationId).equalsFunction
+.apply(this.left, this.right);
+  }
+}
+
+public static Integer exec(final UTF8String left, final UTF8String right, 
final int collationId){
+  CollationFactory.Collation collation = 
CollationFactory.fetchCollation(collationId);
+
+  if (collation.supportsBinaryEquality){
+return left.levenshteinDistance(right);
+  }
+  else{
+return left.levenshteinDistance(right, new 
CollationSubstringEquals(collationId));
+  }
+}
+
+public static Integer execWithThreshold(final UTF8String left, final 
UTF8String right, final int threshold, final int collationId){
+  CollationFactory.Collation collation = 
CollationFactory.fetchCollation(collationId);
+
+  if (collation.supportsBinaryEquality){
+return left.levenshteinDistance(right, threshold);
+  }
+  else{
+return left.levenshteinDistance(right, threshold, new 
CollationSubstringEquals(collationId));
+  }

Review Comment:
   This should be done in other places as well (both threshold & non-threshold 
Levenshtein).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-05-10 Thread via GitHub


nikolamand-db commented on code in PR #45963:
URL: https://github.com/apache/spark/pull/45963#discussion_r1596887611


##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -722,6 +722,65 @@ public static UTF8String execLowercase(
 }
   }
 
+  /**
+   * Utility class for collation aware Levenshtein function.
+   */
+  public static class Levenshtein{

Review Comment:
   ```suggestion
 public static class Levenshtein {
   ```



##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -722,6 +722,65 @@ public static UTF8String execLowercase(
 }
   }
 
+  /**
+   * Utility class for collation aware Levenshtein function.
+   */
+  public static class Levenshtein{
+
+/**
+ * Implementation of SubstringEquals interface for collation aware 
comparison of two substrings.
+ */
+private static class CollationSubstringEquals implements 
UTF8String.SubstringEquals {
+  private final int collationId;
+  private final UTF8String left, right;
+
+  CollationSubstringEquals(int collationId) {
+this.collationId = collationId;
+this.left = new UTF8String();
+this.right = new UTF8String();
+  }
+
+  @Override
+  public boolean equals(UTF8String left, UTF8String right, int posLeft, 
int posRight,
+int lenLeft, int lenRight) {
+this.left.moveAddress(left, posLeft, lenLeft);
+this.right.moveAddress(right, posRight, lenRight);
+return CollationFactory.fetchCollation(collationId).equalsFunction
+.apply(this.left, this.right);
+  }
+}
+
+public static Integer exec(final UTF8String left, final UTF8String right, 
final int collationId){
+  CollationFactory.Collation collation = 
CollationFactory.fetchCollation(collationId);
+
+  if (collation.supportsBinaryEquality){
+return left.levenshteinDistance(right);
+  }
+  else{

Review Comment:
   ```suggestion
 else {
   ```



##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -722,6 +722,65 @@ public static UTF8String execLowercase(
 }
   }
 
+  /**
+   * Utility class for collation aware Levenshtein function.
+   */
+  public static class Levenshtein{
+
+/**
+ * Implementation of SubstringEquals interface for collation aware 
comparison of two substrings.
+ */
+private static class CollationSubstringEquals implements 
UTF8String.SubstringEquals {
+  private final int collationId;
+  private final UTF8String left, right;
+
+  CollationSubstringEquals(int collationId) {
+this.collationId = collationId;
+this.left = new UTF8String();
+this.right = new UTF8String();
+  }
+
+  @Override
+  public boolean equals(UTF8String left, UTF8String right, int posLeft, 
int posRight,
+int lenLeft, int lenRight) {
+this.left.moveAddress(left, posLeft, lenLeft);
+this.right.moveAddress(right, posRight, lenRight);
+return CollationFactory.fetchCollation(collationId).equalsFunction
+.apply(this.left, this.right);
+  }
+}
+
+public static Integer exec(final UTF8String left, final UTF8String right, 
final int collationId){
+  CollationFactory.Collation collation = 
CollationFactory.fetchCollation(collationId);
+
+  if (collation.supportsBinaryEquality){

Review Comment:
   ```suggestion
 if (collation.supportsBinaryEquality) {
   ```



##
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java:
##
@@ -722,6 +722,65 @@ public static UTF8String execLowercase(
 }
   }
 
+  /**
+   * Utility class for collation aware Levenshtein function.
+   */
+  public static class Levenshtein{
+
+/**
+ * Implementation of SubstringEquals interface for collation aware 
comparison of two substrings.
+ */
+private static class CollationSubstringEquals implements 
UTF8String.SubstringEquals {
+  private final int collationId;
+  private final UTF8String left, right;
+
+  CollationSubstringEquals(int collationId) {
+this.collationId = collationId;
+this.left = new UTF8String();
+this.right = new UTF8String();
+  }
+
+  @Override
+  public boolean equals(UTF8String left, UTF8String right, int posLeft, 
int posRight,
+int lenLeft, int lenRight) {
+this.left.moveAddress(left, posLeft, lenLeft);
+this.right.moveAddress(right, posRight, lenRight);
+return CollationFactory.fetchCollation(collationId).equalsFunction
+.apply(this.left, this.right);
+  }
+}
+
+public static Integer exec(final UTF8String left, final UTF8String right, 
final int collationId){
+  CollationFactory.Collation collation = 

Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-05-10 Thread via GitHub


nebojsa-db commented on PR #45963:
URL: https://github.com/apache/spark/pull/45963#issuecomment-2104752689

   Please review latest iteration @dbatomic @stefankandic @uros-db @mihailom-db 
@stevomitric 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-05-10 Thread via GitHub


nebojsa-db commented on code in PR #45963:
URL: https://github.com/apache/spark/pull/45963#discussion_r1596860093


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:
##
@@ -2210,7 +2210,7 @@ case class Levenshtein(
 threshold: Option[Expression] = None)
   extends Expression
   with ImplicitCastInputTypes
-  with NullIntolerant{
+  with NullIntolerant {

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-05-09 Thread via GitHub


mihailom-db commented on code in PR #45963:
URL: https://github.com/apache/spark/pull/45963#discussion_r1595392035


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:
##
@@ -2210,7 +2210,7 @@ case class Levenshtein(
 threshold: Option[Expression] = None)
   extends Expression
   with ImplicitCastInputTypes
-  with NullIntolerant{
+  with NullIntolerant {

Review Comment:
   nit: Remove unnecessary changes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub


cloud-fan commented on code in PR #45963:
URL: https://github.com/apache/spark/pull/45963#discussion_r1560358232


##
sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala:
##
@@ -89,6 +89,73 @@ class CollationStringExpressionsSuite
 testRepeat("UNICODE_CI", 3, "abc", 2)
   }
 
+  test("Levenshtein expressions with collation") {
+def prepareLevenshtein(
+left: String,
+right: String,
+collation: String): Levenshtein = {
+  val collationId = CollationFactory.collationNameToId(collation)
+  val leftLit = Literal.create(left, StringType(collationId))
+  val rightLit = Literal.create(right, StringType(collationId))
+  Levenshtein(leftLit, rightLit)
+}
+
+Seq(
+  // scalastyle:off nonascii
+  CollationTestCase("", "", "UTF8_BINARY", 0),
+  CollationTestCase("", "something", "UTF8_BINARY", 9),
+  CollationTestCase("a", "a", "UTF8_BINARY", 0),
+  CollationTestCase("a", "A", "UTF8_BINARY", 1),
+  CollationTestCase("a", "a", "UTF8_BINARY_LCASE", 0),
+  CollationTestCase("a", "A", "UTF8_BINARY_LCASE", 0),
+  CollationTestCase("bd", "ABc", "UTF8_BINARY_LCASE", 2),
+  CollationTestCase("Xü", "Ü", "UTF8_BINARY_LCASE", 1),
+  CollationTestCase("Xũ", "Üx", "UTF8_BINARY_LCASE", 2),
+  CollationTestCase("", "something", "UTF8_BINARY_LCASE", 9),
+  CollationTestCase("sOmeThINg", "SOMETHING", "UTF8_BINARY_LCASE", 0),
+  CollationTestCase("sOmeThINg", "SOMETHING", "UNICODE", 5),
+  CollationTestCase("sOmeThINg", "SOMETHING", "UNICODE_CI", 0)
+  // scalastyle:on nonascii
+).foreach(c => checkEvaluation(prepareLevenshtein(c.s1, c.s2, 
c.collation), c.expectedResult))
+  }
+

Review Comment:
   Yea we need both unit tests and end-to-end tests



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub


cloud-fan commented on code in PR #45963:
URL: https://github.com/apache/spark/pull/45963#discussion_r1560355740


##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -1636,14 +1699,13 @@ public int levenshteinDistance(UTF8String other, int 
threshold) {
 } else if (i == min - 1) {
   num_bytes_i = 0;
 } else {
-  if (ByteArrayMethods.arrayEquals(t.base, t.offset + j_bytes,
-  s.base, s.offset + i_bytes, num_bytes_j)) {
+  num_bytes_i = numBytesForFirstByte(s.getByte(i_bytes));

Review Comment:
   I don't quite get the details here. Why do we need to pre-evalute 
`numBytesForFirstByte` now? It was evaluated at the end of the loop body before.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub


cloud-fan commented on code in PR #45963:
URL: https://github.com/apache/spark/pull/45963#discussion_r1560352992


##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -1509,12 +1515,62 @@ public boolean semanticEquals(final UTF8String other, 
int collationId) {
 return 
CollationFactory.fetchCollation(collationId).equalsFunction.apply(this, other);
   }
 
+  private interface SubstringEquals {
+boolean equals(UTF8String left, UTF8String right, int posLeft, int 
posRight,
+  int lenLeft, int lenRight);
+  }
+
+  private static class ByteSubstringEquals implements SubstringEquals {
+@Override
+public boolean equals(UTF8String left, UTF8String right, int posLeft, int 
posRight,
+  int lenLeft, int lenRight) {
+  if (lenLeft != lenRight || left.getByte(posLeft) != 
right.getByte(posRight)) {
+return false;
+  }
+  return (ByteArrayMethods.arrayEquals(left.base, left.offset + posLeft, 
right.base,
+right.offset + posRight, lenLeft));
+}
+  }
+
+  private static final ByteSubstringEquals byteSubstringEquals = new 
ByteSubstringEquals();
+
+  private static class CollationSubstringEquals implements SubstringEquals {
+private final int collationId;
+private final UTF8String left, right;
+
+CollationSubstringEquals(int collationId) {
+  this.collationId = collationId;
+  this.left = new UTF8String();
+  this.right = new UTF8String();
+}
+
+@Override
+public boolean equals(UTF8String left, UTF8String right, int posLeft, int 
posRight,
+  int lenLeft, int lenRight) {
+  moveAddress(this.left, left.base, left.offset + posLeft, lenLeft);

Review Comment:
   Should `moveAddress` be an instance method that takes two int parameters 
(posLeft and lenLeft)?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub


nikolamand-db commented on code in PR #45963:
URL: https://github.com/apache/spark/pull/45963#discussion_r1559431415


##
sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala:
##
@@ -89,6 +89,73 @@ class CollationStringExpressionsSuite
 testRepeat("UNICODE_CI", 3, "abc", 2)
   }
 
+  test("Levenshtein expressions with collation") {
+def prepareLevenshtein(
+left: String,
+right: String,
+collation: String): Levenshtein = {
+  val collationId = CollationFactory.collationNameToId(collation)
+  val leftLit = Literal.create(left, StringType(collationId))
+  val rightLit = Literal.create(right, StringType(collationId))
+  Levenshtein(leftLit, rightLit)
+}
+
+Seq(
+  // scalastyle:off nonascii

Review Comment:
   Wrapped the class with `scalastyle:off nonascii`.



##
sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala:
##
@@ -89,6 +89,73 @@ class CollationStringExpressionsSuite
 testRepeat("UNICODE_CI", 3, "abc", 2)
   }
 
+  test("Levenshtein expressions with collation") {
+def prepareLevenshtein(
+left: String,
+right: String,
+collation: String): Levenshtein = {
+  val collationId = CollationFactory.collationNameToId(collation)
+  val leftLit = Literal.create(left, StringType(collationId))
+  val rightLit = Literal.create(right, StringType(collationId))
+  Levenshtein(leftLit, rightLit)
+}
+
+Seq(
+  // scalastyle:off nonascii
+  CollationTestCase("", "", "UTF8_BINARY", 0),
+  CollationTestCase("", "something", "UTF8_BINARY", 9),
+  CollationTestCase("a", "a", "UTF8_BINARY", 0),
+  CollationTestCase("a", "A", "UTF8_BINARY", 1),
+  CollationTestCase("a", "a", "UTF8_BINARY_LCASE", 0),
+  CollationTestCase("a", "A", "UTF8_BINARY_LCASE", 0),
+  CollationTestCase("bd", "ABc", "UTF8_BINARY_LCASE", 2),
+  CollationTestCase("Xü", "Ü", "UTF8_BINARY_LCASE", 1),
+  CollationTestCase("Xũ", "Üx", "UTF8_BINARY_LCASE", 2),
+  CollationTestCase("", "something", "UTF8_BINARY_LCASE", 9),
+  CollationTestCase("sOmeThINg", "SOMETHING", "UTF8_BINARY_LCASE", 0),
+  CollationTestCase("sOmeThINg", "SOMETHING", "UNICODE", 5),
+  CollationTestCase("sOmeThINg", "SOMETHING", "UNICODE_CI", 0)
+  // scalastyle:on nonascii
+).foreach(c => checkEvaluation(prepareLevenshtein(c.s1, c.s2, 
c.collation), c.expectedResult))
+  }
+

Review Comment:
   Added the tests to CollationSuite, please check.



##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:
##
@@ -2213,8 +2213,8 @@ case class Levenshtein(
   }
 
   override def inputTypes: Seq[AbstractDataType] = threshold match {
-case Some(_) => Seq(StringType, StringType, IntegerType)
-case _ => Seq(StringType, StringType)
+case Some(_) => Seq(StringTypeAnyCollation, StringTypeAnyCollation, 
IntegerType)
+case _ => Seq(StringTypeAnyCollation, StringTypeAnyCollation)

Review Comment:
   Added special case for Levenshtein to `CollationTypeCasts`, please check.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub


nikolamand-db commented on code in PR #45963:
URL: https://github.com/apache/spark/pull/45963#discussion_r1559340404


##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -1509,12 +1509,66 @@ public boolean semanticEquals(final UTF8String other, 
int collationId) {
 return 
CollationFactory.fetchCollation(collationId).equalsFunction.apply(this, other);
   }
 
+  private interface SubstringEquals {
+boolean equals(UTF8String left, UTF8String right, int posLeft, int 
posRight,
+  int lenLeft, int lenRight);
+  }
+
+  private static class ByteSubstringEquals implements SubstringEquals {
+@Override
+public boolean equals(UTF8String left, UTF8String right, int posLeft, int 
posRight,
+  int lenLeft, int lenRight) {
+  if (lenLeft != lenRight || left.getByte(posLeft) != 
right.getByte(posRight)) {
+return false;
+  }
+  return (ByteArrayMethods.arrayEquals(left.base, left.offset + posLeft, 
right.base,
+right.offset + posRight, lenLeft));
+}
+  }
+
+  private static final ByteSubstringEquals byteSubstringEquals = new 
ByteSubstringEquals();
+
+  private static class CollationSubstringEquals implements SubstringEquals {

Review Comment:
   It's true that `UTF8String` is a small object (base, offset and length), but 
the issue with Levenshtein distance is that the algorithm has quadratic 
complexity. So when comparing two strings of length 100, you would allocate 10k 
objects which is a waste.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub


stefankandic commented on code in PR #45963:
URL: https://github.com/apache/spark/pull/45963#discussion_r1559333644


##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -1509,12 +1509,66 @@ public boolean semanticEquals(final UTF8String other, 
int collationId) {
 return 
CollationFactory.fetchCollation(collationId).equalsFunction.apply(this, other);
   }
 
+  private interface SubstringEquals {
+boolean equals(UTF8String left, UTF8String right, int posLeft, int 
posRight,
+  int lenLeft, int lenRight);
+  }
+
+  private static class ByteSubstringEquals implements SubstringEquals {
+@Override
+public boolean equals(UTF8String left, UTF8String right, int posLeft, int 
posRight,
+  int lenLeft, int lenRight) {
+  if (lenLeft != lenRight || left.getByte(posLeft) != 
right.getByte(posRight)) {
+return false;
+  }
+  return (ByteArrayMethods.arrayEquals(left.base, left.offset + posLeft, 
right.base,
+right.offset + posRight, lenLeft));
+}
+  }
+
+  private static final ByteSubstringEquals byteSubstringEquals = new 
ByteSubstringEquals();
+
+  private static class CollationSubstringEquals implements SubstringEquals {

Review Comment:
   have you actually tested the impact of having a smaller number of 
allocations? Because creating a UTF8String might not be that expensive, 
especially compared to converting from UTF8String to String which will happen 
anyway



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub


stefankandic commented on code in PR #45963:
URL: https://github.com/apache/spark/pull/45963#discussion_r1559333644


##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -1509,12 +1509,66 @@ public boolean semanticEquals(final UTF8String other, 
int collationId) {
 return 
CollationFactory.fetchCollation(collationId).equalsFunction.apply(this, other);
   }
 
+  private interface SubstringEquals {
+boolean equals(UTF8String left, UTF8String right, int posLeft, int 
posRight,
+  int lenLeft, int lenRight);
+  }
+
+  private static class ByteSubstringEquals implements SubstringEquals {
+@Override
+public boolean equals(UTF8String left, UTF8String right, int posLeft, int 
posRight,
+  int lenLeft, int lenRight) {
+  if (lenLeft != lenRight || left.getByte(posLeft) != 
right.getByte(posRight)) {
+return false;
+  }
+  return (ByteArrayMethods.arrayEquals(left.base, left.offset + posLeft, 
right.base,
+right.offset + posRight, lenLeft));
+}
+  }
+
+  private static final ByteSubstringEquals byteSubstringEquals = new 
ByteSubstringEquals();
+
+  private static class CollationSubstringEquals implements SubstringEquals {

Review Comment:
   have you actually tested the impact of having a smaller number of 
allocations? Because creating a UTF8 string might not be that expensive, 
especially compared to converting from UTF8String to String which will happen 
anyway



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub


stefankandic commented on code in PR #45963:
URL: https://github.com/apache/spark/pull/45963#discussion_r1559327591


##
sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala:
##
@@ -89,6 +89,73 @@ class CollationStringExpressionsSuite
 testRepeat("UNICODE_CI", 3, "abc", 2)
   }
 
+  test("Levenshtein expressions with collation") {
+def prepareLevenshtein(
+left: String,
+right: String,
+collation: String): Levenshtein = {
+  val collationId = CollationFactory.collationNameToId(collation)
+  val leftLit = Literal.create(left, StringType(collationId))
+  val rightLit = Literal.create(right, StringType(collationId))
+  Levenshtein(leftLit, rightLit)
+}
+
+Seq(
+  // scalastyle:off nonascii

Review Comment:
   there's a good chance we will use nonascii characters a lot in this file, 
what about disabling this for the entire class/file?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub


stefankandic commented on code in PR #45963:
URL: https://github.com/apache/spark/pull/45963#discussion_r1559326485


##
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##
@@ -1509,12 +1509,66 @@ public boolean semanticEquals(final UTF8String other, 
int collationId) {
 return 
CollationFactory.fetchCollation(collationId).equalsFunction.apply(this, other);
   }
 
+  private interface SubstringEquals {
+boolean equals(UTF8String left, UTF8String right, int posLeft, int 
posRight,
+  int lenLeft, int lenRight);
+  }
+
+  private static class ByteSubstringEquals implements SubstringEquals {
+@Override
+public boolean equals(UTF8String left, UTF8String right, int posLeft, int 
posRight,
+  int lenLeft, int lenRight) {
+  if (lenLeft != lenRight || left.getByte(posLeft) != 
right.getByte(posRight)) {
+return false;
+  }
+  return (ByteArrayMethods.arrayEquals(left.base, left.offset + posLeft, 
right.base,
+right.offset + posRight, lenLeft));
+}
+  }
+
+  private static final ByteSubstringEquals byteSubstringEquals = new 
ByteSubstringEquals();
+
+  private static class CollationSubstringEquals implements SubstringEquals {
+private final int collationId;
+private final UTF8String left, right;
+
+CollationSubstringEquals(int collationId) {
+  this.collationId = collationId;
+  this.left = new UTF8String();
+  this.right = new UTF8String();
+}
+
+@Override
+public boolean equals(UTF8String left, UTF8String right, int posLeft, int 
posRight,
+  int lenLeft, int lenRight) {
+  this.left.base = left.base;

Review Comment:
   can we maybe extract this logic into a method?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub


uros-db commented on code in PR #45963:
URL: https://github.com/apache/spark/pull/45963#discussion_r1559311139


##
sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala:
##
@@ -89,6 +89,73 @@ class CollationStringExpressionsSuite
 testRepeat("UNICODE_CI", 3, "abc", 2)
   }
 
+  test("Levenshtein expressions with collation") {
+def prepareLevenshtein(
+left: String,
+right: String,
+collation: String): Levenshtein = {
+  val collationId = CollationFactory.collationNameToId(collation)
+  val leftLit = Literal.create(left, StringType(collationId))
+  val rightLit = Literal.create(right, StringType(collationId))
+  Levenshtein(leftLit, rightLit)
+}
+
+Seq(
+  // scalastyle:off nonascii
+  CollationTestCase("", "", "UTF8_BINARY", 0),
+  CollationTestCase("", "something", "UTF8_BINARY", 9),
+  CollationTestCase("a", "a", "UTF8_BINARY", 0),
+  CollationTestCase("a", "A", "UTF8_BINARY", 1),
+  CollationTestCase("a", "a", "UTF8_BINARY_LCASE", 0),
+  CollationTestCase("a", "A", "UTF8_BINARY_LCASE", 0),
+  CollationTestCase("bd", "ABc", "UTF8_BINARY_LCASE", 2),
+  CollationTestCase("Xü", "Ü", "UTF8_BINARY_LCASE", 1),
+  CollationTestCase("Xũ", "Üx", "UTF8_BINARY_LCASE", 2),
+  CollationTestCase("", "something", "UTF8_BINARY_LCASE", 9),
+  CollationTestCase("sOmeThINg", "SOMETHING", "UTF8_BINARY_LCASE", 0),
+  CollationTestCase("sOmeThINg", "SOMETHING", "UNICODE", 5),
+  CollationTestCase("sOmeThINg", "SOMETHING", "UNICODE_CI", 0)
+  // scalastyle:on nonascii
+).foreach(c => checkEvaluation(prepareLevenshtein(c.s1, c.s2, 
c.collation), c.expectedResult))
+  }
+

Review Comment:
   my thoughts are: yes
   
   @dbatomic and myself will be finishing some refactoring PRs soon, and will 
likely request all currently open expression collation awareness PRs to comply 
with some new design before further reivew



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub


uros-db commented on code in PR #45963:
URL: https://github.com/apache/spark/pull/45963#discussion_r1559311139


##
sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala:
##
@@ -89,6 +89,73 @@ class CollationStringExpressionsSuite
 testRepeat("UNICODE_CI", 3, "abc", 2)
   }
 
+  test("Levenshtein expressions with collation") {
+def prepareLevenshtein(
+left: String,
+right: String,
+collation: String): Levenshtein = {
+  val collationId = CollationFactory.collationNameToId(collation)
+  val leftLit = Literal.create(left, StringType(collationId))
+  val rightLit = Literal.create(right, StringType(collationId))
+  Levenshtein(leftLit, rightLit)
+}
+
+Seq(
+  // scalastyle:off nonascii
+  CollationTestCase("", "", "UTF8_BINARY", 0),
+  CollationTestCase("", "something", "UTF8_BINARY", 9),
+  CollationTestCase("a", "a", "UTF8_BINARY", 0),
+  CollationTestCase("a", "A", "UTF8_BINARY", 1),
+  CollationTestCase("a", "a", "UTF8_BINARY_LCASE", 0),
+  CollationTestCase("a", "A", "UTF8_BINARY_LCASE", 0),
+  CollationTestCase("bd", "ABc", "UTF8_BINARY_LCASE", 2),
+  CollationTestCase("Xü", "Ü", "UTF8_BINARY_LCASE", 1),
+  CollationTestCase("Xũ", "Üx", "UTF8_BINARY_LCASE", 2),
+  CollationTestCase("", "something", "UTF8_BINARY_LCASE", 9),
+  CollationTestCase("sOmeThINg", "SOMETHING", "UTF8_BINARY_LCASE", 0),
+  CollationTestCase("sOmeThINg", "SOMETHING", "UNICODE", 5),
+  CollationTestCase("sOmeThINg", "SOMETHING", "UNICODE_CI", 0)
+  // scalastyle:on nonascii
+).foreach(c => checkEvaluation(prepareLevenshtein(c.s1, c.s2, 
c.collation), c.expectedResult))
+  }
+

Review Comment:
   my thoughts are: yes
   
   @dbatomic and myself will be finishing some refactoring PRs soon, and will 
likely request all PRs to comply with some new design before further reivew



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub


mihailom-db commented on code in PR #45963:
URL: https://github.com/apache/spark/pull/45963#discussion_r1559297852


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:
##
@@ -2213,8 +2213,8 @@ case class Levenshtein(
   }
 
   override def inputTypes: Seq[AbstractDataType] = threshold match {
-case Some(_) => Seq(StringType, StringType, IntegerType)
-case _ => Seq(StringType, StringType)
+case Some(_) => Seq(StringTypeAnyCollation, StringTypeAnyCollation, 
IntegerType)
+case _ => Seq(StringTypeAnyCollation, StringTypeAnyCollation)

Review Comment:
   Maybe we should special case this in CollationTypeCasts. We do not want to 
have an expression on the place of third parameter with StringType and take it 
into account, as it will explicitly be casted to Integer, and technically it's 
collation should not be counted. Something similar to If expression that is 
special cased.  



##
sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala:
##
@@ -89,6 +89,73 @@ class CollationStringExpressionsSuite
 testRepeat("UNICODE_CI", 3, "abc", 2)
   }
 
+  test("Levenshtein expressions with collation") {
+def prepareLevenshtein(
+left: String,
+right: String,
+collation: String): Levenshtein = {
+  val collationId = CollationFactory.collationNameToId(collation)
+  val leftLit = Literal.create(left, StringType(collationId))
+  val rightLit = Literal.create(right, StringType(collationId))
+  Levenshtein(leftLit, rightLit)
+}
+
+Seq(
+  // scalastyle:off nonascii
+  CollationTestCase("", "", "UTF8_BINARY", 0),
+  CollationTestCase("", "something", "UTF8_BINARY", 9),
+  CollationTestCase("a", "a", "UTF8_BINARY", 0),
+  CollationTestCase("a", "A", "UTF8_BINARY", 1),
+  CollationTestCase("a", "a", "UTF8_BINARY_LCASE", 0),
+  CollationTestCase("a", "A", "UTF8_BINARY_LCASE", 0),
+  CollationTestCase("bd", "ABc", "UTF8_BINARY_LCASE", 2),
+  CollationTestCase("Xü", "Ü", "UTF8_BINARY_LCASE", 1),
+  CollationTestCase("Xũ", "Üx", "UTF8_BINARY_LCASE", 2),
+  CollationTestCase("", "something", "UTF8_BINARY_LCASE", 9),
+  CollationTestCase("sOmeThINg", "SOMETHING", "UTF8_BINARY_LCASE", 0),
+  CollationTestCase("sOmeThINg", "SOMETHING", "UNICODE", 5),
+  CollationTestCase("sOmeThINg", "SOMETHING", "UNICODE_CI", 0)
+  // scalastyle:on nonascii
+).foreach(c => checkEvaluation(prepareLevenshtein(c.s1, c.s2, 
c.collation), c.expectedResult))
+  }
+

Review Comment:
   @cloud-fan Should we also add a test to CollationSuite Implicit casting to 
check if casting actually works for this expression?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-04-10 Thread via GitHub


nikolamand-db commented on PR #45963:
URL: https://github.com/apache/spark/pull/45963#issuecomment-2047209932

   Please review collation team @dbatomic @stefankandic @uros-db @mihailom-db 
@stevomitric.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org