[GitHub] spark pull request #16086: [SPARK-18653][SQL] Fix incorrect space padding fo...

2017-01-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16086


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16086: [SPARK-18653][SQL] Fix incorrect space padding fo...

2016-12-01 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/16086#discussion_r90578571
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -236,6 +238,29 @@ class Dataset[T] private[sql](
   }
   }
 
+  val EAST_ASIAN_LANGS = Seq("ja", "vi", "kr", "zh")
+
+  private def unicodeWidth(str: String): Int = {
+val locale = Locale.getDefault()
+if (locale == null) {
+  throw new NullPointerException("locale is null")
+}
+val ambiguousLen = if (EAST_ASIAN_LANGS.contains(locale.getLanguage())) 2 else 1
--- End diff --

I can create a separate helper for the default width. The challenge is how 
to decide, given only a string, whether the helper should be applied.
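
One possible shape for such a helper is sketched below. It is not the code from this pull request: the object name `AmbiguousWidth` and its methods are hypothetical, and it assumes the decision depends only on the locale, never on the string being measured. It also assumes ISO 639-1 language codes, where Korean is `ko`.

```scala
import java.util.Locale

// Hypothetical helper, not part of the PR: decides how many columns an
// East Asian "ambiguous width" character should occupy for a given locale.
object AmbiguousWidth {
  // Assumption: ISO 639-1 language codes (Korean is "ko").
  private val eastAsianLangs = Set("ja", "ko", "vi", "zh")

  /** Columns to assume for an ambiguous-width character under `locale`. */
  def forLocale(locale: Locale): Int =
    if (eastAsianLangs.contains(locale.getLanguage)) 2 else 1

  /** Same decision, made for the JVM default locale. */
  def forDefaultLocale: Int = forLocale(Locale.getDefault)
}
```

With this split, the width function would take the ambiguous width as a plain `Int` argument, so the locale lookup happens once per `show` call rather than once per cell.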





[GitHub] spark pull request #16086: [SPARK-18653][SQL] Fix incorrect space padding fo...

2016-12-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16086#discussion_r90567848
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -236,6 +238,29 @@ class Dataset[T] private[sql](
   }
   }
 
+  val EAST_ASIAN_LANGS = Seq("ja", "vi", "kr", "zh")
+
+  private def unicodeWidth(str: String): Int = {
+val locale = Locale.getDefault()
+if (locale == null) {
+  throw new NullPointerException("locale is null")
+}
+val ambiguousLen = if (EAST_ASIAN_LANGS.contains(locale.getLanguage())) 2 else 1
+var len = 0
+for (i <- 0 until str.length) {
+  val codePoint = str.codePointAt(i)
+  val value = UCharacter.getIntPropertyValue(codePoint, UProperty.EAST_ASIAN_WIDTH)
+  len = len + (value match {
+case UCharacter.EastAsianWidth.NARROW | UCharacter.EastAsianWidth.NEUTRAL |
+ UCharacter.EastAsianWidth.HALFWIDTH => 1
--- End diff --

An indent issue.





[GitHub] spark pull request #16086: [SPARK-18653][SQL] Fix incorrect space padding fo...

2016-12-01 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/16086#discussion_r90482959
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -236,6 +238,37 @@ class Dataset[T] private[sql](
   }
   }
 
+  var EAST_ASIAN_LANGS = Seq("ja", "vi", "kr", "zh")
+
+  private def unicodeWidth(str: String): Int = {
+val locale = Locale.getDefault()
+if (locale == null) {
+  throw new NullPointerException("locale is null")
+}
+val ambiguousLen = if (EAST_ASIAN_LANGS.contains(locale.getLanguage())) 2 else 1
+var len = 0
+for (i <- 0 until str.length) {
+  val codePoint = str.codePointAt(i)
+  val value = UCharacter.getIntPropertyValue(codePoint, UProperty.EAST_ASIAN_WIDTH);
+  len = len + (value match {
+case UCharacter.EastAsianWidth.NARROW | UCharacter.EastAsianWidth.NEUTRAL |
+  UCharacter.EastAsianWidth.HALFWIDTH => 1
--- End diff --

oh, updated





[GitHub] spark pull request #16086: [SPARK-18653][SQL] Fix incorrect space padding fo...

2016-12-01 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/16086#discussion_r90482897
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -236,6 +238,37 @@ class Dataset[T] private[sql](
   }
   }
 
+  var EAST_ASIAN_LANGS = Seq("ja", "vi", "kr", "zh")
+
+  private def unicodeWidth(str: String): Int = {
+val locale = Locale.getDefault()
+if (locale == null) {
+  throw new NullPointerException("locale is null")
+}
+val ambiguousLen = if (EAST_ASIAN_LANGS.contains(locale.getLanguage())) 2 else 1
+var len = 0
+for (i <- 0 until str.length) {
+  val codePoint = str.codePointAt(i)
+  val value = UCharacter.getIntPropertyValue(codePoint, UProperty.EAST_ASIAN_WIDTH);
--- End diff --

thanks, done





[GitHub] spark pull request #16086: [SPARK-18653][SQL] Fix incorrect space padding fo...

2016-12-01 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/16086#discussion_r90465622
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -275,36 +308,43 @@ class Dataset[T] private[sql](
 val numCols = schema.fieldNames.length
 
 // Initialise the width of each column to a minimum value of '3'
-val colWidths = Array.fill(numCols)(3)
+val colMaxWidths = Array.fill(numCols)(3)
+val colWidths = Array.ofDim[Int](rows.length, numCols)
 
 // Compute the width of each column
+var j = 0
 for (row <- rows) {
   for ((cell, i) <- row.zipWithIndex) {
-colWidths(i) = math.max(colWidths(i), cell.length)
+val width = unicodeWidth(cell)
+colWidths(j)(i) = width
+colMaxWidths(i) = math.max(colMaxWidths(i), width)
   }
+  j = j + 1
 }
 
 // Create SeparateLine
-val sep: String = colWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString()
+val sep: String = colMaxWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString()
 
 // column names
 rows.head.zipWithIndex.map { case (cell, i) =>
   if (truncate > 0) {
-StringUtils.leftPad(cell, colWidths(i))
+repeatPadding(colMaxWidths(i) - colWidths(0)(i)).append(cell)
--- End diff --

Good catch. However, I would like to get the spaces as a `StringBuilder`, 
and `" " * n` seems to return a `String`.
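
As an illustration of the `StringBuilder` approach, spaces can be appended straight into a builder without creating an intermediate `String`. The PR's actual `repeatPadding` implementation is not quoted in this thread, so the sketch below is not that code; `Padding.appendSpaces` is a hypothetical name.

```scala
// Hypothetical helper, not the PR's repeatPadding.
object Padding {
  /** Appends `n` spaces to `sb` (none when n <= 0) and returns `sb` for chaining. */
  def appendSpaces(sb: StringBuilder, n: Int): StringBuilder = {
    var i = 0
    while (i < n) {
      sb.append(' ')
      i += 1
    }
    sb
  }
}

// Usage: pad first, then append the cell to the same builder.
// Padding.appendSpaces(new StringBuilder, 3).append("ab").toString  // "   ab"
```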





[GitHub] spark pull request #16086: [SPARK-18653][SQL] Fix incorrect space padding fo...

2016-12-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16086#discussion_r90452307
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -275,36 +308,43 @@ class Dataset[T] private[sql](
 val numCols = schema.fieldNames.length
 
 // Initialise the width of each column to a minimum value of '3'
-val colWidths = Array.fill(numCols)(3)
+val colMaxWidths = Array.fill(numCols)(3)
+val colWidths = Array.ofDim[Int](rows.length, numCols)
 
 // Compute the width of each column
+var j = 0
 for (row <- rows) {
   for ((cell, i) <- row.zipWithIndex) {
-colWidths(i) = math.max(colWidths(i), cell.length)
+val width = unicodeWidth(cell)
+colWidths(j)(i) = width
+colMaxWidths(i) = math.max(colMaxWidths(i), width)
   }
+  j = j + 1
 }
 
 // Create SeparateLine
-val sep: String = colWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString()
+val sep: String = colMaxWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString()
 
 // column names
 rows.head.zipWithIndex.map { case (cell, i) =>
   if (truncate > 0) {
-StringUtils.leftPad(cell, colWidths(i))
+repeatPadding(colMaxWidths(i) - colWidths(0)(i)).append(cell)
--- End diff --

Oh, got it.

For this purpose, the current `repeatPadding` looks verbose. If you just want 
to create an exact number of spaces, you can use `" " * n`.
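
A minimal sketch of that suggestion, assuming the caller has already measured the cell's display width (as the diff does with `colWidths`). `SpacePadding.leftPad` is a hypothetical name, and the `math.max` guard is there only so a cell wider than its column gets no padding.

```scala
// Hypothetical helper illustrating the `" " * n` idiom.
object SpacePadding {
  /** Left-pads `cell`, whose display width is `cellWidth`, to `colWidth` columns. */
  def leftPad(cell: String, cellWidth: Int, colWidth: Int): String =
    (" " * math.max(0, colWidth - cellWidth)) + cell
}

// Usage:
// SpacePadding.leftPad("ab", 2, 5)  // "   ab"
```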





[GitHub] spark pull request #16086: [SPARK-18653][SQL] Fix incorrect space padding fo...

2016-12-01 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/16086#discussion_r90451485
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -275,36 +308,43 @@ class Dataset[T] private[sql](
 val numCols = schema.fieldNames.length
 
 // Initialise the width of each column to a minimum value of '3'
-val colWidths = Array.fill(numCols)(3)
+val colMaxWidths = Array.fill(numCols)(3)
+val colWidths = Array.ofDim[Int](rows.length, numCols)
 
 // Compute the width of each column
+var j = 0
 for (row <- rows) {
   for ((cell, i) <- row.zipWithIndex) {
-colWidths(i) = math.max(colWidths(i), cell.length)
+val width = unicodeWidth(cell)
+colWidths(j)(i) = width
+colMaxWidths(i) = math.max(colMaxWidths(i), width)
   }
+  j = j + 1
 }
 
 // Create SeparateLine
-val sep: String = colWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString()
+val sep: String = colMaxWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString()
 
 // column names
 rows.head.zipWithIndex.map { case (cell, i) =>
   if (truncate > 0) {
-StringUtils.leftPad(cell, colWidths(i))
+repeatPadding(colMaxWidths(i) - colWidths(0)(i)).append(cell)
--- End diff --

`StringUtils.leftPad/rightPad` use `String.length`. Since that causes 
the same problem, the new code does not use these methods.
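
To make the difference concrete, here is a rough sketch of padding by display width instead of by `String.length`. It is not the PR's code: to stay self-contained, it replaces the ICU `EAST_ASIAN_WIDTH` lookup with a crude, approximate range check (an assumption), and the name `DisplayWidth` is hypothetical.

```scala
// Hypothetical sketch; isWide is a crude stand-in for ICU's EAST_ASIAN_WIDTH.
object DisplayWidth {
  // Assumption: a few well-known CJK and fullwidth ranges count as two
  // columns, everything else as one. This is only an approximation.
  private def isWide(cp: Int): Boolean =
    (cp >= 0x1100 && cp <= 0x115F) ||  // Hangul Jamo initial consonants
    (cp >= 0x2E80 && cp <= 0xA4CF) ||  // CJK radicals through Yi
    (cp >= 0xAC00 && cp <= 0xD7A3) ||  // Hangul syllables
    (cp >= 0xF900 && cp <= 0xFAFF) ||  // CJK compatibility ideographs
    (cp >= 0xFF01 && cp <= 0xFF60) ||  // fullwidth forms
    (cp >= 0x20000 && cp <= 0x3FFFD)   // supplementary ideographic planes

  /** Display width in columns, stepping by code point rather than by char. */
  def width(s: String): Int = {
    var total = 0
    var i = 0
    while (i < s.length) {
      val cp = s.codePointAt(i)
      total += (if (isWide(cp)) 2 else 1)
      i += Character.charCount(cp) // skip both halves of a surrogate pair
    }
    total
  }

  /** Left-pads to `colWidth` display columns, not `colWidth` chars. */
  def leftPad(s: String, colWidth: Int): String =
    (" " * math.max(0, colWidth - width(s))) + s
}

// "日本語".length               // 3
// DisplayWidth.width("日本語")  // 6
```

A `length`-based pad to 8 would add five spaces to `"日本語"`, giving 11 columns on screen, while the width-based pad adds two.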





[GitHub] spark pull request #16086: [SPARK-18653][SQL] Fix incorrect space padding fo...

2016-12-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16086#discussion_r90451294
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -275,36 +308,43 @@ class Dataset[T] private[sql](
 val numCols = schema.fieldNames.length
 
 // Initialise the width of each column to a minimum value of '3'
-val colWidths = Array.fill(numCols)(3)
+val colMaxWidths = Array.fill(numCols)(3)
+val colWidths = Array.ofDim[Int](rows.length, numCols)
 
 // Compute the width of each column
+var j = 0
 for (row <- rows) {
   for ((cell, i) <- row.zipWithIndex) {
-colWidths(i) = math.max(colWidths(i), cell.length)
+val width = unicodeWidth(cell)
+colWidths(j)(i) = width
+colMaxWidths(i) = math.max(colMaxWidths(i), width)
   }
+  j = j + 1
 }
 
 // Create SeparateLine
-val sep: String = colWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString()
+val sep: String = colMaxWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString()
 
 // column names
 rows.head.zipWithIndex.map { case (cell, i) =>
   if (truncate > 0) {
-StringUtils.leftPad(cell, colWidths(i))
+repeatPadding(colMaxWidths(i) - colWidths(0)(i)).append(cell)
--- End diff --

Any reason we replace `StringUtils.leftPad/rightPad` with `repeatPadding`?





[GitHub] spark pull request #16086: [SPARK-18653][SQL] Fix incorrect space padding fo...

2016-12-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16086#discussion_r90449169
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -236,6 +238,37 @@ class Dataset[T] private[sql](
   }
   }
 
+  var EAST_ASIAN_LANGS = Seq("ja", "vi", "kr", "zh")
+
+  private def unicodeWidth(str: String): Int = {
+val locale = Locale.getDefault()
+if (locale == null) {
+  throw new NullPointerException("locale is null")
+}
+val ambiguousLen = if (EAST_ASIAN_LANGS.contains(locale.getLanguage())) 2 else 1
+var len = 0
+for (i <- 0 until str.length) {
+  val codePoint = str.codePointAt(i)
+  val value = UCharacter.getIntPropertyValue(codePoint, UProperty.EAST_ASIAN_WIDTH);
+  len = len + (value match {
+case UCharacter.EastAsianWidth.NARROW | UCharacter.EastAsianWidth.NEUTRAL |
+  UCharacter.EastAsianWidth.HALFWIDTH => 1
--- End diff --

Indent looks weird.





[GitHub] spark pull request #16086: [SPARK-18653][SQL] Fix incorrect space padding fo...

2016-12-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16086#discussion_r90448773
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -236,6 +238,37 @@ class Dataset[T] private[sql](
   }
   }
 
+  var EAST_ASIAN_LANGS = Seq("ja", "vi", "kr", "zh")
--- End diff --

Why `var`?





[GitHub] spark pull request #16086: [SPARK-18653][SQL] Fix incorrect space padding fo...

2016-12-01 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/16086#discussion_r90415866
  
--- Diff: sql/core/pom.xml ---
@@ -92,6 +92,11 @@
   ${fasterxml.jackson.version}
 
 
+  com.ibm.icu
--- End diff --

Sure.
a) I think there is no limitation in the [licence](http://www.unicode.org/copyright.html#License)
b) I cannot find this jar in the current [transitive dependency](https://gist.github.com/kiszk/4fb18b04e1cfc8e6673a2954369fe52d)





[GitHub] spark pull request #16086: [SPARK-18653][SQL] Fix incorrect space padding fo...

2016-12-01 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16086#discussion_r90402700
  
--- Diff: sql/core/pom.xml ---
@@ -92,6 +92,11 @@
   ${fasterxml.jackson.version}
 
 
+  com.ibm.icu
--- End diff --

I'm not against it, but I'm a little hesitant to bring in all this weight 
to fix a basically cosmetic problem. This may already be included transitively, 
though. Worth checking a) the license of this library and b) whether it's 
already in use in the transitive dependencies.

