stefankandic commented on code in PR #45963:
URL: https://github.com/apache/spark/pull/45963#discussion_r1559333644


##########
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##########
@@ -1509,12 +1509,66 @@ public boolean semanticEquals(final UTF8String other, int collationId) {
     return CollationFactory.fetchCollation(collationId).equalsFunction.apply(this, other);
   }
 
+  private interface SubstringEquals {
+    boolean equals(UTF8String left, UTF8String right, int posLeft, int posRight,
+      int lenLeft, int lenRight);
+  }
+
+  private static class ByteSubstringEquals implements SubstringEquals {
+    @Override
+    public boolean equals(UTF8String left, UTF8String right, int posLeft, int posRight,
+      int lenLeft, int lenRight) {
+      if (lenLeft != lenRight || left.getByte(posLeft) != right.getByte(posRight)) {
+        return false;
+      }
+      return (ByteArrayMethods.arrayEquals(left.base, left.offset + posLeft, right.base,
+        right.offset + posRight, lenLeft));
+    }
+  }
+
+  private static final ByteSubstringEquals byteSubstringEquals = new ByteSubstringEquals();
+
+  private static class CollationSubstringEquals implements SubstringEquals {

Review Comment:
   Have you actually measured the impact of making fewer allocations?
   Creating a UTF8String might not be that expensive, especially compared
   to the UTF8String-to-String conversion, which will happen anyway.
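
   A rough way to check this could look like the sketch below. It is a
   stand-alone illustration only: it works on plain byte arrays rather than
   Spark's UTF8String, and the class and method names are hypothetical. It
   compares an in-place range comparison against "copy the range, then
   compare" and against "decode to java.lang.String, then compare".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.BooleanSupplier;

// Hypothetical stand-alone sketch; it does not use Spark's UTF8String.
final class SubstringAllocationSketch {

  // Compares two byte ranges in place, without creating intermediate objects.
  static boolean equalsInPlace(byte[] left, int posLeft, int lenLeft,
      byte[] right, int posRight, int lenRight) {
    if (lenLeft != lenRight) {
      return false;
    }
    return Arrays.equals(left, posLeft, posLeft + lenLeft,
      right, posRight, posRight + lenRight);
  }

  // Copies both ranges first, standing in for "allocate a substring, then compare".
  static boolean equalsWithCopy(byte[] left, int posLeft, int lenLeft,
      byte[] right, int posRight, int lenRight) {
    byte[] l = Arrays.copyOfRange(left, posLeft, posLeft + lenLeft);
    byte[] r = Arrays.copyOfRange(right, posRight, posRight + lenRight);
    return Arrays.equals(l, r);
  }

  // Decodes both ranges to java.lang.String, standing in for the
  // UTF8String-to-String conversion mentioned above.
  static boolean equalsViaString(byte[] left, int posLeft, int lenLeft,
      byte[] right, int posRight, int lenRight) {
    String l = new String(left, posLeft, lenLeft, StandardCharsets.UTF_8);
    String r = new String(right, posRight, lenRight, StandardCharsets.UTF_8);
    return l.equals(r);
  }

  // Crude timing loop; counting the results keeps the JIT from discarding the work.
  static long timeNanos(BooleanSupplier op, int iterations) {
    int hits = 0;
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      if (op.getAsBoolean()) {
        hits++;
      }
    }
    long elapsed = System.nanoTime() - start;
    if (hits != iterations) {
      throw new IllegalStateException("unexpected mismatch");
    }
    return elapsed;
  }

  public static void main(String[] args) {
    byte[] a = "hello collation world".getBytes(StandardCharsets.UTF_8);
    byte[] b = "a collation b".getBytes(StandardCharsets.UTF_8);
    int iterations = 5_000_000;

    // Warm up each variant once before timing it.
    timeNanos(() -> equalsInPlace(a, 6, 9, b, 2, 9), iterations);
    timeNanos(() -> equalsWithCopy(a, 6, 9, b, 2, 9), iterations);
    timeNanos(() -> equalsViaString(a, 6, 9, b, 2, 9), iterations);

    System.out.println("in place:   "
      + timeNanos(() -> equalsInPlace(a, 6, 9, b, 2, 9), iterations) + " ns");
    System.out.println("with copy:  "
      + timeNanos(() -> equalsWithCopy(a, 6, 9, b, 2, 9), iterations) + " ns");
    System.out.println("via String: "
      + timeNanos(() -> equalsViaString(a, 6, 9, b, 2, 9), iterations) + " ns");
  }
}
```

   A hand-rolled loop like this only gives a first impression; numbers worth
   acting on would need a proper harness such as JMH, run against the real
   UTF8String code paths.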



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

