spark git commit: [SPARK-25108][SQL] Fix the show method to display the wide character alignment problem
Repository: spark
Updated Branches:
  refs/heads/branch-2.4 3682d29f4 -> a7cfe5158

[SPARK-25108][SQL] Fix the show method to display the wide character alignment problem

This is not a perfect solution. It is designed to minimize complexity on the basis of solving problems.

It is effective for English, Chinese characters, Japanese, Korean and so on.

```scala
before:
+---+---+-+
|id |中国 |s2 |
+---+---+-+
|1 |ab |[a] |
|2 |null |[中国, abc]|
|3 |ab1|[hello world]|
|4 |か行 きゃ(kya) きゅ(kyu) きょ(kyo) |[“中国]|
|5 |中国（你好）a|[“中（国）, 312] |
|6 |中国山(东)服务区 |[“中(国）] |
|7 |中国山东服务区|[中(国)] |
|8 | |[中国] |
+---+---+-+

after:
+---+---++
|id |中国 |s2 |
+---+---++
|1 |ab |[a] |
|2 |null |[中国, abc] |
|3 |ab1|[hello world] |
|4 |か行 きゃ(kya) きゅ(kyu) きょ(kyo) |[“中国] |
|5 |中国（你好）a |[“中（国）, 312]|
|6 |中国山(东)服务区 |[“中(国）] |
|7 |中国山东服务区 |[中(国)]|
|8 | |[中国] |
+---+---++
```

## What changes were proposed in this pull request?

When there are wide characters such as Chinese characters or Japanese characters in the data, the show method has an alignment problem.
Try to fix this problem.

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

![image](https://user-images.githubusercontent.com/13044869/44250564-69f6b400-a227-11e8-88b2-6cf6960377ff.png)

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes #22048 from xuejianbest/master.
Authored-by: xuejianbest <384329...@qq.com>
Signed-off-by: Sean Owen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a7cfe515
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a7cfe515
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a7cfe515

Branch: refs/heads/branch-2.4
Commit: a7cfe5158f5c25ae5f774e1fb45d63a67a4bb89c
Parents: 3682d29
Author: xuejianbest <384329...@qq.com>
Authored: Thu Sep 6 07:17:37 2018 -0700
Committer: Sean Owen
Committed: Thu Sep 6 10:48:22 2018 -0700

--
 .../scala/org/apache/spark/util/Utils.scala  | 30
 .../org/apache/spark/util/UtilsSuite.scala   | 21 +
 .../scala/org/apache/spark/sql/Dataset.scala | 18 +++
 .../org/apache/spark/sql/DatasetSuite.scala  | 49
 4 files changed, 109 insertions(+), 9 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/a7cfe515/core/src/main/scala/org/apache/spark/util/Utils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala
index 15c958d..4593b05 100644
--- a/core/src/main/scala/org/apache/spark/util/Utils.scala
+++ b/core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -2795,6 +2795,36 @@ private[spark] object Utils extends Logging {
       }
     }
   }
+
+  /**
+   * Regular expression matching full width characters.
+   *
+   * Looked at all the 0x-0x characters (unicode) and showed them under Xshell.
+   * Found all the full width characters, then get the regular expression.
+   */
+  private val fullWidthRegex = ("""[""" +
+    // scalastyle:off nonascii
+    """\u1100-\u115F""" +
+    """\u2E80-\uA4CF""" +
+    """\uAC00-\uD7A3""" +
+    """\uF900-\uFAFF""" +
+    """\uFE10-\uFE19""" +
+    """\uFE30-\uFE6F""" +
+    """\uFF00-\uFF60""" +
+    """\uFFE0-\uFFE6""" +
+    // scalastyle:on nonascii
+    """]""").r
+
+  /**
+   * Return the number of half widths in a given string. Note that a full width character
+   * occupies two half widths.
+   *
+   * For a string consisting of 1 million characters, the execution of this method requires
+   * about 50ms.
+   */
+  def stringHalfWidth(str: String): Int = {
+    if (str == null) 0 else str.length + fullWidthRegex.findAllIn(str).size
+  }
 }
 
 private[util] object CallerContext extends
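
The `stringHalfWidth` helper added in the diff above can be tried outside Spark. What follows is a minimal sketch under stated assumptions, not the code Spark ships: `HalfWidthSketch` and `padRight` are hypothetical names introduced here for illustration, and the character class is written with `\uXXXX` escapes passed through to the Java regex engine so that the source stays ASCII-only. It shows how counting each full width character as two half widths yields the padding needed to line up a column.

```scala
object HalfWidthSketch {
  // Same character ranges as fullWidthRegex in the diff. The doubled backslashes
  // hand the \uXXXX escapes to java.util.regex instead of the Scala compiler.
  private val fullWidthRegex = (
    "[" +
      "\\u1100-\\u115F" +
      "\\u2E80-\\uA4CF" +
      "\\uAC00-\\uD7A3" +
      "\\uF900-\\uFAFF" +
      "\\uFE10-\\uFE19" +
      "\\uFE30-\\uFE6F" +
      "\\uFF00-\\uFF60" +
      "\\uFFE0-\\uFFE6" +
      "]").r

  // Number of half-width columns: every char counts once, and each full width
  // char counts one extra time.
  def stringHalfWidth(str: String): Int =
    if (str == null) 0 else str.length + fullWidthRegex.findAllIn(str).size

  // Hypothetical helper (not part of the commit): right-pad a cell with spaces
  // until its display width reaches `width` half-width columns.
  def padRight(cell: String, width: Int): String = {
    val shown = if (cell == null) "null" else cell
    shown + " " * math.max(0, width - stringHalfWidth(shown))
  }

  def main(args: Array[String]): Unit = {
    // "\u4E2D\u56FD" is the two-character string used as a column name in the
    // commit message's example table.
    val cells = Seq("ab", "\u4E2D\u56FD", "ab1")
    val width = cells.map(stringHalfWidth).max
    cells.foreach(c => println("|" + padRight(c, width) + "|"))
  }
}
```

Padding by `stringHalfWidth` rather than by `String.length` is the design point: a cell holding two full width characters has length 2 but occupies four terminal columns, so it needs two fewer padding spaces than a two-letter ASCII cell to reach the same right edge.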
spark git commit: [SPARK-25108][SQL] Fix the show method to display the wide character alignment problem
Repository: spark
Updated Branches:
  refs/heads/master 64c314e22 -> f5817d8bb

[SPARK-25108][SQL] Fix the show method to display the wide character alignment problem

This is not a perfect solution. It is designed to minimize complexity on the basis of solving problems.

It is effective for English, Chinese characters, Japanese, Korean and so on.

```scala
before:
+---+---+-+
|id |中国 |s2 |
+---+---+-+
|1 |ab |[a] |
|2 |null |[中国, abc]|
|3 |ab1|[hello world]|
|4 |か行 きゃ(kya) きゅ(kyu) きょ(kyo) |[“中国]|
|5 |中国（你好）a|[“中（国）, 312] |
|6 |中国山(东)服务区 |[“中(国）] |
|7 |中国山东服务区|[中(国)] |
|8 | |[中国] |
+---+---+-+

after:
+---+---++
|id |中国 |s2 |
+---+---++
|1 |ab |[a] |
|2 |null |[中国, abc] |
|3 |ab1|[hello world] |
|4 |か行 きゃ(kya) きゅ(kyu) きょ(kyo) |[“中国] |
|5 |中国（你好）a |[“中（国）, 312]|
|6 |中国山(东)服务区 |[“中(国）] |
|7 |中国山东服务区 |[中(国)]|
|8 | |[中国] |
+---+---++
```

## What changes were proposed in this pull request?

When there are wide characters such as Chinese characters or Japanese characters in the data, the show method has an alignment problem.
Try to fix this problem.

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

![image](https://user-images.githubusercontent.com/13044869/44250564-69f6b400-a227-11e8-88b2-6cf6960377ff.png)

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes #22048 from xuejianbest/master.
Authored-by: xuejianbest <384329...@qq.com>
Signed-off-by: Sean Owen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f5817d8b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f5817d8b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f5817d8b

Branch: refs/heads/master
Commit: f5817d8bb33b733eeca0154d1ed207c8d1e8513f
Parents: 64c314e
Author: xuejianbest <384329...@qq.com>
Authored: Thu Sep 6 07:17:37 2018 -0700
Committer: Sean Owen
Committed: Thu Sep 6 07:17:37 2018 -0700

--
 .../scala/org/apache/spark/util/Utils.scala  | 30
 .../org/apache/spark/util/UtilsSuite.scala   | 21 +
 .../scala/org/apache/spark/sql/Dataset.scala | 18 +++
 .../org/apache/spark/sql/DatasetSuite.scala  | 49
 4 files changed, 109 insertions(+), 9 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/f5817d8b/core/src/main/scala/org/apache/spark/util/Utils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala
index 15c958d..4593b05 100644
--- a/core/src/main/scala/org/apache/spark/util/Utils.scala
+++ b/core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -2795,6 +2795,36 @@ private[spark] object Utils extends Logging {
       }
     }
   }
+
+  /**
+   * Regular expression matching full width characters.
+   *
+   * Looked at all the 0x-0x characters (unicode) and showed them under Xshell.
+   * Found all the full width characters, then get the regular expression.
+   */
+  private val fullWidthRegex = ("""[""" +
+    // scalastyle:off nonascii
+    """\u1100-\u115F""" +
+    """\u2E80-\uA4CF""" +
+    """\uAC00-\uD7A3""" +
+    """\uF900-\uFAFF""" +
+    """\uFE10-\uFE19""" +
+    """\uFE30-\uFE6F""" +
+    """\uFF00-\uFF60""" +
+    """\uFFE0-\uFFE6""" +
+    // scalastyle:on nonascii
+    """]""").r
+
+  /**
+   * Return the number of half widths in a given string. Note that a full width character
+   * occupies two half widths.
+   *
+   * For a string consisting of 1 million characters, the execution of this method requires
+   * about 50ms.
+   */
+  def stringHalfWidth(str: String): Int = {
+    if (str == null) 0 else str.length + fullWidthRegex.findAllIn(str).size
+  }
 }
 
 private[util] object CallerContext extends Logging