Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22048#discussion_r211105897
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -294,23 +294,24 @@ class Dataset[T] private[sql](
// We set a minimum column width at '3'
val minimumColWidth = 3
+ val regex = """[^\x00-\u2e39]""".r
--- End diff --
This could use a comment and slightly better name for the variable? I also
wonder if a regex is a little bit slow for scanning every character.
However it's not clear this definition is accurate enough. According to
things like
https://stackoverflow.com/questions/13505075/analyzing-full-width-or-half-width-character-in-java
we're really looking for the concept of "fullwidth" East Asian characters. The
answer there provides a somewhat more precise definition, though others say you
need something like icu4j for a proper definition.
Maybe at least adopt the alternative proposed in the SO answer? It's not
necessary to be perfect here, as it's a cosmetic issue.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]