GitHub user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22975#discussion_r231977392

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -411,7 +412,7 @@ public UTF8String toUpperCase() {
   }

   private UTF8String toUpperCaseSlow() {
-    return fromString(toString().toUpperCase());
+    return fromString(toString().toUpperCase(Locale.ROOT));
--- End diff --

I think we deliberately left this one unchanged. The point of pinning Locale.ROOT is to make sure that strings that aren't really user data don't vary with the JVM's default locale, for example internal identifiers for compression types or impurity types. Actual user data, which is what this method handles, could well be legitimately locale-dependent.
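
To illustrate the distinction, here is a minimal sketch of how the two overloads can diverge. The class and method names (`LocaleUpperCaseDemo`, `normalizeIdentifier`, `upperForLocale`) are hypothetical and are not part of Spark; the identifier "gini" is only an example of an internal name, and Turkish is used because its dotted/dotless "i" is the usual case where upper-casing differs.

```java
import java.util.Locale;

// Hypothetical demo class, not Spark code.
public class LocaleUpperCaseDemo {

    // Internal identifiers (e.g. a compression or impurity type name) should
    // upper-case the same way regardless of the JVM's default locale.
    static String normalizeIdentifier(String name) {
        return name.toUpperCase(Locale.ROOT);
    }

    // User data may legitimately follow the rules of a specific locale.
    static String upperForLocale(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Locale.ROOT maps 'i' to the ASCII 'I'.
        System.out.println(normalizeIdentifier("gini"));

        // Turkish rules map 'i' to U+0130 (capital I with dot above),
        // so the result no longer equals "GINI".
        String tr = upperForLocale("gini", turkish);
        System.out.println(tr.equals("GINI"));   // prints "false"
    }
}
```

A no-argument `toUpperCase()` behaves like `upperForLocale(s, Locale.getDefault())`, which is why matching an internal identifier against it can break on a machine with a Turkish default locale.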