[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240110126
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala 
---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
* This function in hive-0.13 become private, but we have to do this to 
work around hive bug
*/
   private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
-val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-val result: StringBuilder = new StringBuilder(old)
-var first: Boolean = old.isEmpty
-
-for (col <- cols) {
-  if (first) {
-first = false
-  } else {
-result.append(',')
-  }
-  result.append(col)
-}
-conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+val value = Option(conf.get(key, null))
+  .map(old => cols.+:(old))
+  .getOrElse(cols)
+  .mkString(",")
--- End diff --

Thanks. It doesn't matter which company it is. All the PRs are equally and 
reasonably reviewed, and merged per the same criteria.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240107064
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala 
---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
* This function in hive-0.13 become private, but we have to do this to 
work around hive bug
*/
   private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
-val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-val result: StringBuilder = new StringBuilder(old)
-var first: Boolean = old.isEmpty
-
-for (col <- cols) {
-  if (first) {
-first = false
-  } else {
-result.append(',')
-  }
-  result.append(col)
-}
-conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+val value = Option(conf.get(key, null))
+  .map(old => cols.+:(old))
--- End diff --

Ah, it's `:+` not `+:`. Yea, it's confusing.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread maomaoChibei
Github user maomaoChibei commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240107003
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala 
---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
* This function in hive-0.13 become private, but we have to do this to 
work around hive bug
*/
   private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
-val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-val result: StringBuilder = new StringBuilder(old)
-var first: Boolean = old.isEmpty
-
-for (col <- cols) {
-  if (first) {
-first = false
-  } else {
-result.append(',')
-  }
-  result.append(col)
-}
-conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+val value = Option(conf.get(key, null))
+  .map(old => cols.+:(old))
+  .getOrElse(cols)
+  .mkString(",")
--- End diff --

thanks HyukjinKwon for the kind explaination. we are from an alibaba 
subsidiary internet company working on resolving real time data calculation 
based on Peta level data. Currently we have an real time engine which is built 
on this spark-sql, well, at lease from the sql engine point of view. The 
throughput is over an qps of 1k and with response latency less than 100ms. Its 
an none-stop online 24*7 platform serving over ten million active users. If 
this market scale and platform scale meets the criteria, let me know, we are 
searching for further cooporations. thanks again. maomao



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240105070
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala 
---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
* This function in hive-0.13 become private, but we have to do this to 
work around hive bug
*/
   private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
-val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-val result: StringBuilder = new StringBuilder(old)
-var first: Boolean = old.isEmpty
-
-for (col <- cols) {
-  if (first) {
-first = false
-  } else {
-result.append(',')
-  }
-  result.append(col)
-}
-conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+val value = Option(conf.get(key, null))
+  .map(old => cols.+:(old))
--- End diff --

? output is different, no?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240103941
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala 
---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
* This function in hive-0.13 become private, but we have to do this to 
work around hive bug
*/
   private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
-val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-val result: StringBuilder = new StringBuilder(old)
-var first: Boolean = old.isEmpty
-
-for (col <- cols) {
-  if (first) {
-first = false
-  } else {
-result.append(',')
-  }
-  result.append(col)
-}
-conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+val value = Option(conf.get(key, null))
+  .map(old => cols.+:(old))
--- End diff --

Just a idiomatic way to write Scala


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240101927
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala 
---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
* This function in hive-0.13 become private, but we have to do this to 
work around hive bug
*/
   private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
-val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-val result: StringBuilder = new StringBuilder(old)
-var first: Boolean = old.isEmpty
-
-for (col <- cols) {
-  if (first) {
-first = false
-  } else {
-result.append(',')
-  }
-  result.append(col)
-}
-conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+val value = Option(conf.get(key, null))
+  .map(old => cols.+:(old))
+  .getOrElse(cols)
+  .mkString(",")
--- End diff --

Right, but I don't think it's more readable. It's non-critical path so 
performance concern is secondary anyway.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240098806
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala 
---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
* This function in hive-0.13 become private, but we have to do this to 
work around hive bug
*/
   private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
-val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-val result: StringBuilder = new StringBuilder(old)
-var first: Boolean = old.isEmpty
-
-for (col <- cols) {
-  if (first) {
-first = false
-  } else {
-result.append(',')
-  }
-  result.append(col)
-}
-conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+val value = Option(conf.get(key, null))
+  .map(old => cols.+:(old))
+  .getOrElse(cols)
+  .mkString(",")
--- End diff --

For comprehension is just a syntax sugar and is not performant at all. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread sadhen
Github user sadhen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240098620
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala 
---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
* This function in hive-0.13 become private, but we have to do this to 
work around hive bug
*/
   private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
-val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-val result: StringBuilder = new StringBuilder(old)
-var first: Boolean = old.isEmpty
-
-for (col <- cols) {
-  if (first) {
-first = false
-  } else {
-result.append(',')
-  }
-  result.append(col)
-}
-conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+val value = Option(conf.get(key, null))
+  .map(old => cols.+:(old))
+  .getOrElse(cols)
+  .mkString(",")
--- End diff --

As for performance:

Current Version:
```
[info] HiveShimSuite:
[info] - appendReadColumns (549 milliseconds)
```

Previous Version:
```
[info] HiveShimSuite:
[info] - appendReadColumns (949 milliseconds)
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240097931
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala 
---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
* This function in hive-0.13 become private, but we have to do this to 
work around hive bug
*/
   private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
-val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-val result: StringBuilder = new StringBuilder(old)
-var first: Boolean = old.isEmpty
-
-for (col <- cols) {
-  if (first) {
-first = false
-  } else {
-result.append(',')
-  }
-  result.append(col)
-}
-conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+val value = Option(conf.get(key, null))
+  .map(old => cols.+:(old))
--- End diff --

Also, looks the output would be different after this change.
Looks it was `old + col` but the current changes looks doing `col + old`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23268#discussion_r240097105
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala 
---
@@ -53,19 +53,12 @@ private[hive] object HiveShim {
* This function in hive-0.13 become private, but we have to do this to 
work around hive bug
*/
   private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
-val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-val result: StringBuilder = new StringBuilder(old)
-var first: Boolean = old.isEmpty
-
-for (col <- cols) {
-  if (first) {
-first = false
-  } else {
-result.append(',')
-  }
-  result.append(col)
-}
-conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+val value = Option(conf.get(key, null))
+  .map(old => cols.+:(old))
+  .getOrElse(cols)
+  .mkString(",")
--- End diff --

I don't think this is more readable. Also, the previous code is more 
performant. I wouldn't change this.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...

2018-12-09 Thread sadhen
GitHub user sadhen opened a pull request:

https://github.com/apache/spark/pull/23268

[Hive][Minor] Refactor on HiveShim and Add Unit Tests

## What changes were proposed in this pull request?

Refactor on HiveShim, and add Unit Tests.

## How was this patch tested?
```
$ build/sbt
> project hive
> testOnly *HiveShimSuite
```


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sadhen/spark refactor/hiveshim

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23268.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23268


commit 987fa18a3b50e117fc839201fcc29c166732486e
Author: Darcy Shen 
Date:   2018-12-10T06:32:35Z

Add unit tests for HiveShim appendReadColumns

commit b68e7b1421f977c7256573564b12b4cc07d31f4a
Author: Darcy Shen 
Date:   2018-12-10T06:38:19Z

Refactor




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org