GitHub user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22975#discussion_r231977392
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -411,7 +412,7 @@ public UTF8String toUpperCase() {
}
private UTF8String toUpperCaseSlow() {
- return fromString(toString().toUpperCase());
+ return fromString(toString().toUpperCase(Locale.ROOT));
--- End diff --
I think we deliberately left this unchanged. The point of fixing
Locale.ROOT is to make sure that strings that aren't really user data don't
vary with the locale, for example internal identifiers for compression types
or impurity types. User data, by contrast, could well be locale-dependent.
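
To make the distinction concrete, here is a minimal sketch of how the two cases can differ under a Turkish locale. The class and method names (`LocaleCaseDemo`, `normalizeIdentifier`, `upperUserData`) are hypothetical and are not part of Spark; only `String.toUpperCase(Locale)`, `Locale.ROOT`, and `Locale.forLanguageTag` are real JDK APIs.

```java
import java.util.Locale;

public class LocaleCaseDemo {
    // Hypothetical helper: internal identifiers (e.g. an impurity name such
    // as "gini") should upper-case the same way on every machine, so the
    // locale is pinned to Locale.ROOT.
    static String normalizeIdentifier(String name) {
        return name.toUpperCase(Locale.ROOT);
    }

    // Hypothetical helper: user data may legitimately follow the rules of a
    // specific locale, so the locale is passed in rather than pinned.
    static String upperUserData(String value, Locale locale) {
        return value.toUpperCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Locale.ROOT maps 'i' to the ASCII 'I'.
        System.out.println(normalizeIdentifier("gini"));

        // Turkish casing maps 'i' to the dotted capital I (U+0130), so the
        // result is no longer the ASCII string "GINI".
        System.out.println(upperUserData("gini", turkish));
    }
}
```

With a Turkish locale, a lookup that compares the second result against an ASCII constant such as "GINI" would fail, which is why internal identifiers are the case where pinning Locale.ROOT matters.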