
spark git commit: [SPARK-25108][SQL] Fix the show method to display the wide character alignment problem

2018-09-06 Thread srowen

Repository: spark
Updated Branches:
  refs/heads/branch-2.4 3682d29f4 -> a7cfe5158


[SPARK-25108][SQL] Fix the show method to display the wide character alignment 
problem

This is not a perfect solution. It is designed to minimize complexity on the 
basis of solving problems.

It is effective for English, Chinese characters, Japanese, Korean and so on.

```scala
before:
+---+---------------------------+-------------+
|id |中国                         |s2           |
+---+---------------------------+-------------+
|1  |ab                         |[a]          |
|2  |null                       |[中国, abc]    |
|3  |ab1                        |[hello world]|
|4  |か行 きゃ(kya) きゅ(kyu) きょ(kyo) |[“中国]        |
|5  |中国（你好）a                    |[“中（国）, 312] |
|6  |中国山(东)服务区                  |[“中(国）]      |
|7  |中国山东服务区                    |[中(国)]       |
|8  |                           |[中国]         |
+---+---------------------------+-------------+

after:
+---+-----------------------------------+----------------+
|id |中国                               |s2              |
+---+-----------------------------------+----------------+
|1  |ab                                 |[a]             |
|2  |null                               |[中国, abc]     |
|3  |ab1                                |[hello world]   |
|4  |か行 きゃ(kya) きゅ(kyu) きょ(kyo) |[“中国]         |
|5  |中国（你好）a                      |[“中（国）, 312]|
|6  |中国山(东)服务区                   |[“中(国）]      |
|7  |中国山东服务区                     |[中(国)]        |
|8  |                                   |[中国]          |
+---+-----------------------------------+----------------+
```

## What changes were proposed in this pull request?

When the data contains wide characters, such as Chinese or Japanese
characters, the show method misaligns the columns of its output.
This patch tries to fix that problem.
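
A minimal sketch of the idea, not the patch itself: measure each cell in half-width units and pad by that measure rather than by `String.length`. `stringHalfWidth` mirrors the function this commit adds to `Utils.scala`; `HalfWidthPadding` and `padCell` are hypothetical names used only for illustration.

```scala
object HalfWidthPadding {
  // Same eight ranges as the fullWidthRegex added to Utils.scala by this commit.
  private val fullWidthRegex =
    ("""[\u1100-\u115F\u2E80-\uA4CF\uAC00-\uD7A3\uF900-\uFAFF""" +
      """\uFE10-\uFE19\uFE30-\uFE6F\uFF00-\uFF60\uFFE0-\uFFE6]""").r

  // Each character counts once; each full width character counts once more.
  def stringHalfWidth(str: String): Int =
    if (str == null) 0 else str.length + fullWidthRegex.findAllIn(str).size

  // Hypothetical helper: right-pad a cell to `width` half-width columns.
  def padCell(cell: String, width: Int): String =
    cell + " " * math.max(0, width - stringHalfWidth(cell))

  def main(args: Array[String]): Unit = {
    val cells = Seq("ab", "null", "中国")
    val width = cells.map(stringHalfWidth).max // 4
    // Prints |ab  |, |null| and |中国|, one per line.
    cells.foreach(c => println("|" + padCell(c, width) + "|"))
  }
}
```

Padding by `String.length` would instead give "中国" two trailing spaces here, which is the kind of drift visible in the "before" table.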

## How was this patch tested?


![image](https://user-images.githubusercontent.com/13044869/44250564-69f6b400-a227-11e8-88b2-6cf6960377ff.png)


Closes #22048 from xuejianbest/master.

Authored-by: xuejianbest <384329...@qq.com>
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a7cfe515
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a7cfe515
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a7cfe515

Branch: refs/heads/branch-2.4
Commit: a7cfe5158f5c25ae5f774e1fb45d63a67a4bb89c
Parents: 3682d29
Author: xuejianbest <384329...@qq.com>
Authored: Thu Sep 6 07:17:37 2018 -0700
Committer: Sean Owen 
Committed: Thu Sep 6 10:48:22 2018 -0700

--
 .../scala/org/apache/spark/util/Utils.scala | 30 
 .../org/apache/spark/util/UtilsSuite.scala  | 21 +
 .../scala/org/apache/spark/sql/Dataset.scala| 18 +++
 .../org/apache/spark/sql/DatasetSuite.scala | 49 
 4 files changed, 109 insertions(+), 9 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a7cfe515/core/src/main/scala/org/apache/spark/util/Utils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala
index 15c958d..4593b05 100644
--- a/core/src/main/scala/org/apache/spark/util/Utils.scala
+++ b/core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -2795,6 +2795,36 @@ private[spark] object Utils extends Logging {
   }
 }
   }
+
+  /**
+   * Regular expression matching full width characters.
+   *
+   * Looked at all the 0x0000-0xFFFF characters (unicode) and showed them under Xshell.
+   * Found all the full width characters, then get the regular expression.
+   */
+  private val fullWidthRegex = ("""[""" +
+    // scalastyle:off nonascii
+    """\u1100-\u115F""" +
+    """\u2E80-\uA4CF""" +
+    """\uAC00-\uD7A3""" +
+    """\uF900-\uFAFF""" +
+    """\uFE10-\uFE19""" +
+    """\uFE30-\uFE6F""" +
+    """\uFF00-\uFF60""" +
+    """\uFFE0-\uFFE6""" +
+    // scalastyle:on nonascii
+    """]""").r
+
+  /**
+   * Return the number of half widths in a given string. Note that a full width character
+   * occupies two half widths.
+   *
+   * For a string consisting of 1 million characters, the execution of this method requires
+   * about 50ms.
+   */
+  def stringHalfWidth(str: String): Int = {
+    if (str == null) 0 else str.length + fullWidthRegex.findAllIn(str).size
+  }
 }
 
 private[util] object CallerContext extends

spark git commit: [SPARK-25108][SQL] Fix the show method to display the wide character alignment problem

2018-09-06 Thread srowen

Repository: spark
Updated Branches:
  refs/heads/master 64c314e22 -> f5817d8bb


[SPARK-25108][SQL] Fix the show method to display the wide character alignment 
problem

This is not a perfect solution. It is designed to minimize complexity on the 
basis of solving problems.

It is effective for English, Chinese characters, Japanese, Korean and so on.

```scala
before:
+---+---------------------------+-------------+
|id |中国                         |s2           |
+---+---------------------------+-------------+
|1  |ab                         |[a]          |
|2  |null                       |[中国, abc]    |
|3  |ab1                        |[hello world]|
|4  |か行 きゃ(kya) きゅ(kyu) きょ(kyo) |[“中国]        |
|5  |中国（你好）a                    |[“中（国）, 312] |
|6  |中国山(东)服务区                  |[“中(国）]      |
|7  |中国山东服务区                    |[中(国)]       |
|8  |                           |[中国]         |
+---+---------------------------+-------------+

after:
+---+-----------------------------------+----------------+
|id |中国                               |s2              |
+---+-----------------------------------+----------------+
|1  |ab                                 |[a]             |
|2  |null                               |[中国, abc]     |
|3  |ab1                                |[hello world]   |
|4  |か行 きゃ(kya) きゅ(kyu) きょ(kyo) |[“中国]         |
|5  |中国（你好）a                      |[“中（国）, 312]|
|6  |中国山(东)服务区                   |[“中(国）]      |
|7  |中国山东服务区                     |[中(国)]        |
|8  |                                   |[中国]          |
+---+-----------------------------------+----------------+
```

## What changes were proposed in this pull request?

When the data contains wide characters, such as Chinese or Japanese
characters, the show method misaligns the columns of its output.
This patch tries to fix that problem.

## How was this patch tested?


![image](https://user-images.githubusercontent.com/13044869/44250564-69f6b400-a227-11e8-88b2-6cf6960377ff.png)


Closes #22048 from xuejianbest/master.

Authored-by: xuejianbest <384329...@qq.com>
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f5817d8b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f5817d8b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f5817d8b

Branch: refs/heads/master
Commit: f5817d8bb33b733eeca0154d1ed207c8d1e8513f
Parents: 64c314e
Author: xuejianbest <384329...@qq.com>
Authored: Thu Sep 6 07:17:37 2018 -0700
Committer: Sean Owen 
Committed: Thu Sep 6 07:17:37 2018 -0700

--
 .../scala/org/apache/spark/util/Utils.scala | 30 
 .../org/apache/spark/util/UtilsSuite.scala  | 21 +
 .../scala/org/apache/spark/sql/Dataset.scala| 18 +++
 .../org/apache/spark/sql/DatasetSuite.scala | 49 
 4 files changed, 109 insertions(+), 9 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f5817d8b/core/src/main/scala/org/apache/spark/util/Utils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala
index 15c958d..4593b05 100644
--- a/core/src/main/scala/org/apache/spark/util/Utils.scala
+++ b/core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -2795,6 +2795,36 @@ private[spark] object Utils extends Logging {
   }
 }
   }
+
+  /**
+   * Regular expression matching full width characters.
+   *
+   * Looked at all the 0x0000-0xFFFF characters (unicode) and showed them under Xshell.
+   * Found all the full width characters, then get the regular expression.
+   */
+  private val fullWidthRegex = ("""[""" +
+    // scalastyle:off nonascii
+    """\u1100-\u115F""" +
+    """\u2E80-\uA4CF""" +
+    """\uAC00-\uD7A3""" +
+    """\uF900-\uFAFF""" +
+    """\uFE10-\uFE19""" +
+    """\uFE30-\uFE6F""" +
+    """\uFF00-\uFF60""" +
+    """\uFFE0-\uFFE6""" +
+    // scalastyle:on nonascii
+    """]""").r
+
+  /**
+   * Return the number of half widths in a given string. Note that a full width character
+   * occupies two half widths.
+   *
+   * For a string consisting of 1 million characters, the execution of this method requires
+   * about 50ms.
+   */
+  def stringHalfWidth(str: String): Int = {
+    if (str == null) 0 else str.length + fullWidthRegex.findAllIn(str).size
+  }
 }
 
 private[util] object CallerContext extends Logging
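
The character class in `fullWidthRegex` can also be read as a list of inclusive ranges. The regex-free sketch below assumes the same eight ranges and counts the same way: one per character, plus one for each character inside a range. `FullWidthRanges` and `isFullWidth` are hypothetical names, not part of Spark.

```scala
object FullWidthRanges {
  // The eight inclusive ranges listed in fullWidthRegex, as (low, high) pairs.
  private val ranges: Seq[(Char, Char)] = Seq(
    ('\u1100', '\u115F'),
    ('\u2E80', '\uA4CF'),
    ('\uAC00', '\uD7A3'),
    ('\uF900', '\uFAFF'),
    ('\uFE10', '\uFE19'),
    ('\uFE30', '\uFE6F'),
    ('\uFF00', '\uFF60'),
    ('\uFFE0', '\uFFE6')
  )

  // True when the character falls inside one of the ranges above.
  def isFullWidth(c: Char): Boolean =
    ranges.exists { case (lo, hi) => c >= lo && c <= hi }

  // Half-width count: every character once, full width characters twice.
  def stringHalfWidth(str: String): Int =
    if (str == null) 0 else str.length + str.count(isFullWidth)
}
```

For example, `FullWidthRanges.stringHalfWidth("ab c")` is 4 and `FullWidthRanges.stringHalfWidth("测")` is 2, since U+6D4B lies in the `\u2E80-\uA4CF` range.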
