Github user xuejianbest commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213525240
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -294,23 +294,25 @@ class Dataset[T] private[sql](
         // We set a minimum column width at '3'
         val minimumColWidth = 3
     
    +    //Regular expression matching full width characters
    +    val fullWidthRegex = 
"""[\u1100-\u115F\u2E80-\uA4CF\uAC00-\uD7A3\uF900-\uFAFF\uFE10-\uFE19\uFE30-\uFE6F\uFF00-\uFF60\uFFE0-\uFFE6]""".r
    --- End diff --
    
    I generated 1000 strings, each consisting of 1000 characters drawn at random 
from the Unicode range 0x0000-0xFFFF (1 million characters in total).
    Then I used this regular expression to find the full-width characters in those 
strings.
    I ran 100 rounds and averaged the results: it takes 49 milliseconds to 
complete matching all 1000 strings.
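
    A minimal sketch of that benchmark (the string-generation and timing harness here is my own assumption, not the exact code used; the regex is the one from the diff). Lone surrogates are skipped when generating characters so every string is valid UTF-16:

    ```scala
    import scala.util.Random

    object FullWidthBench {
      // Regular expression from the diff: matches full-width characters
      val fullWidthRegex =
        """[\u1100-\u115F\u2E80-\uA4CF\uAC00-\uD7A3\uF900-\uFAFF\uFE10-\uFE19\uFE30-\uFE6F\uFF00-\uFF60\uFFE0-\uFFE6]""".r

      // Random BMP character, skipping surrogate code points
      def randomBmpChar(rnd: Random): Char = {
        var c = rnd.nextInt(0x10000)
        while (c >= 0xD800 && c <= 0xDFFF) c = rnd.nextInt(0x10000)
        c.toChar
      }

      def main(args: Array[String]): Unit = {
        val rnd = new Random(42)
        // 1000 strings of 1000 random characters each (1 million total)
        val strings = Seq.fill(1000)(
          new String(Array.fill(1000)(randomBmpChar(rnd))))

        // Time one pass of finding all full-width characters
        val start = System.nanoTime()
        val hits = strings.map(s => fullWidthRegex.findAllIn(s).length).sum
        val elapsedMs = (System.nanoTime() - start) / 1e6
        println(s"matched $hits full-width chars in $elapsedMs ms")
      }
    }
    ```

    The reported 49 ms is the average over 100 such rounds; actual numbers will vary with JVM warm-up and hardware.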


---
