dbatomic commented on code in PR #44968:
URL: https://github.com/apache/spark/pull/44968#discussion_r1474593790
##########
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java:
##########
@@ -1388,28 +1388,43 @@ public UTF8String copy() {
return fromBytes(bytes);
}
+ /**
+ * Collation aware comparison between two UTF8 strings.
Review Comment:
Sure, I will remove this part from the PR.
Also, if we decide to keep UTF8String as a pure data class, I think we
should also remove the `Comparable<UTF8String>` interface from its declaration,
given that its comparison methods would no longer be valid.
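To illustrate what dropping `Comparable` could look like, here is a minimal Java sketch, assuming the string type stays a pure byte holder and ordering is supplied from outside, one `Comparator` per collation. `Utf8Data`, `Collation`, and `CollationComparators` are hypothetical names for this sketch only, not Spark APIs, and the case-insensitive rule is a deliberate simplification.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.Locale;

// Hypothetical pure data holder: it only wraps UTF-8 bytes and deliberately
// does NOT implement Comparable, because ordering depends on a collation.
final class Utf8Data {
  private final byte[] bytes;

  private Utf8Data(byte[] bytes) {
    this.bytes = bytes;
  }

  static Utf8Data fromString(String s) {
    return new Utf8Data(s.getBytes(StandardCharsets.UTF_8));
  }

  // Returns a defensive copy so callers cannot mutate the wrapped bytes.
  byte[] getBytes() {
    return bytes.clone();
  }

  @Override
  public String toString() {
    return new String(bytes, StandardCharsets.UTF_8);
  }
}

// Hypothetical collation identifiers used only in this sketch.
enum Collation { BINARY, CASE_INSENSITIVE }

// Ordering lives outside the data class: one Comparator per collation.
final class CollationComparators {
  private CollationComparators() {}

  // Lexicographic comparison of unsigned bytes; for valid UTF-8 this
  // matches code point order.
  private static int compareBinary(byte[] x, byte[] y) {
    int n = Math.min(x.length, y.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(x[i] & 0xFF, y[i] & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(x.length, y.length);
  }

  static Comparator<Utf8Data> forCollation(Collation collation) {
    switch (collation) {
      case BINARY:
        return (a, b) -> compareBinary(a.getBytes(), b.getBytes());
      case CASE_INSENSITIVE:
        // Simplification: lowercase with Locale.ROOT, then compare bytes.
        // A real case-insensitive collation needs full Unicode case folding.
        return (a, b) -> compareBinary(
            a.toString().toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8),
            b.toString().toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
      default:
        throw new IllegalArgumentException("Unknown collation: " + collation);
    }
  }
}
```

With this split, sorting under a given collation would be something like `list.sort(CollationComparators.forCollation(c))`, and nothing on the data class itself suggests a single "natural" order.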
Note, too, that other functions will have to find a new place
(e.g. `contains`, `matchAt`, `startsWith`, `endsWith`, `findInSet`, `indexOf`,
`find`, `rfind`, `split`, `replace`), since all of them need to work with both
the data and the collation.
In other words, pretty much anything that represents a relation between two or
more UTF8String objects can no longer live in the UTF8String class if we don't
push collationInfo into UTF8String.
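One possible shape for the relational functions listed above is a static helper that receives the collation as an explicit argument, so the string objects never need to carry it. This is a minimal Java sketch under that assumption: `CollationId` and `CollationAwareOps` are hypothetical names, the inputs are assumed to be valid UTF-8, and only `contains`, `startsWith`, and `endsWith` are shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical collation identifiers, for illustration only.
enum CollationId { BINARY, CASE_INSENSITIVE }

// Hypothetical home for relational string operations once they leave the
// data class: each method receives the collation explicitly instead of
// reading it from the string objects.
final class CollationAwareOps {
  private CollationAwareOps() {}

  // Decodes UTF-8 (assumed valid) and applies the collation's normalization.
  // Simplification: CASE_INSENSITIVE lowercases with Locale.ROOT, which does
  // not cover every Unicode case-folding rule.
  private static String normalize(byte[] utf8, CollationId collation) {
    String s = new String(utf8, StandardCharsets.UTF_8);
    switch (collation) {
      case BINARY:
        return s;
      case CASE_INSENSITIVE:
        return s.toLowerCase(Locale.ROOT);
      default:
        throw new IllegalArgumentException("Unknown collation: " + collation);
    }
  }

  static boolean contains(byte[] s, byte[] sub, CollationId collation) {
    return normalize(s, collation).contains(normalize(sub, collation));
  }

  static boolean startsWith(byte[] s, byte[] prefix, CollationId collation) {
    return normalize(s, collation).startsWith(normalize(prefix, collation));
  }

  static boolean endsWith(byte[] s, byte[] suffix, CollationId collation) {
    return normalize(s, collation).endsWith(normalize(suffix, collation));
  }
}
```

Decoding to `String` keeps the sketch short but allocates on every call; a production version would more likely operate on the bytes directly and might identify collations differently (for example by an integer id).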
Anyhow, let's talk about this tomorrow. For now, I will remove all changes to
UTF8String.