Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r211105897
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -294,23 +294,24 @@ class Dataset[T] private[sql](
         // We set a minimum column width at '3'
         val minimumColWidth = 3
     
    +    val regex = """[^\x00-\u2e39]""".r
    --- End diff --
    
    This could use a comment and a more descriptive name for the variable. I 
also wonder whether a regex is a bit slow for scanning every character.
    
    However, it's not clear that this definition is accurate enough. According 
to sources such as 
https://stackoverflow.com/questions/13505075/analyzing-full-width-or-half-width-character-in-java
, what we're really looking for is the concept of "fullwidth" East Asian 
characters. The answer there gives a somewhat more precise definition, though 
others say you need something like icu4j for a proper one.
    
    Maybe at least adopt the alternative proposed in the SO answer? It doesn't 
need to be perfect here, since this is a cosmetic issue.
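
    To make the suggestion concrete, here is a minimal sketch of a range-based 
check that avoids a regex. The object name `DisplayWidth`, the helper names, 
and the specific code point ranges are assumptions chosen for illustration. 
They are not the exact definition from the SO answer, and they only 
approximate the Unicode East Asian Width property:

```scala
object DisplayWidth {
  // Hypothetical, approximate ranges of characters that most terminals
  // render two columns wide. Illustrative only, not a complete list.
  private val wideRanges: Array[(Int, Int)] = Array(
    (0x1100, 0x115F), // Hangul Jamo
    (0x2E80, 0xA4CF), // CJK radicals through Yi
    (0xAC00, 0xD7A3), // Hangul syllables
    (0xF900, 0xFAFF), // CJK compatibility ideographs
    (0xFF00, 0xFF60), // fullwidth ASCII variants
    (0xFFE0, 0xFFE6)  // fullwidth currency and sign variants
  )

  // True if the character falls in one of the assumed wide ranges.
  def isWide(c: Char): Boolean = {
    val cp = c.toInt
    wideRanges.exists { case (lo, hi) => cp >= lo && cp <= hi }
  }

  // Number of terminal columns the string is assumed to occupy:
  // one per character, plus one extra for each wide character.
  def displayWidth(s: String): Int =
    if (s == null) 0 else s.length + s.count(isWide)
}
```

    Under these assumptions, `DisplayWidth.displayWidth` could replace 
`String.length` when computing column widths, so padding would account for 
wide characters. Supplementary characters outside the BMP are not handled.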


---
