[GitHub] spark pull request #23268: [Hive][Minor] Refactor on HiveShim and Add Unit T...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23268#discussion_r240110126

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala ---

@@ -53,19 +53,12 @@ private[hive] object HiveShim {
    * This function in hive-0.13 become private, but we have to do this to work around hive bug
    */
   private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
-    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
-    val result: StringBuilder = new StringBuilder(old)
-    var first: Boolean = old.isEmpty
-
-    for (col <- cols) {
-      if (first) {
-        first = false
-      } else {
-        result.append(',')
-      }
-      result.append(col)
-    }
-    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
+    val key = ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR
+    val value = Option(conf.get(key, null))
+      .map(old => cols.+:(old))
+      .getOrElse(cols)
+      .mkString(",")

--- End diff --

Thanks. It doesn't matter which company it is. All the PRs are equally and reasonably reviewed and merged per the same criteria.
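The two variants quoted in the diff above can be compared side by side. A minimal, self-contained sketch follows: a mutable `Map` stands in for Hadoop's `Configuration`, and the key string is a stand-in value, both assumptions made only so the sketch runs without Hive on the classpath.

```scala
import scala.collection.mutable

object AppendReadColumnsSketch {
  // Stand-in for ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR (assumption).
  val Key = "hive.io.file.readcolumn.names"

  // Old implementation: imperative StringBuilder loop with a `first` flag.
  def appendOld(conf: mutable.Map[String, String], cols: Seq[String]): Unit = {
    val old = conf.getOrElse(Key, "")
    val result = new StringBuilder(old)
    var first = old.isEmpty
    for (col <- cols) {
      if (first) first = false else result.append(',')
      result.append(col)
    }
    conf(Key) = result.toString
  }

  // New implementation from the diff: Option + mkString.
  // Note that the method call `cols.+:(old)` PREPENDS `old` to `cols`,
  // so the existing value still comes first, as in the old code.
  def appendNew(conf: mutable.Map[String, String], cols: Seq[String]): Unit = {
    val value = conf.get(Key)
      .map(old => cols.+:(old))
      .getOrElse(cols)
      .mkString(",")
    conf(Key) = value
  }

  def main(args: Array[String]): Unit = {
    val c1 = mutable.Map(Key -> "a,b")
    val c2 = mutable.Map(Key -> "a,b")
    appendOld(c1, Seq("c", "d"))
    appendNew(c2, Seq("c", "d"))
    println(c1(Key)) // a,b,c,d
    println(c2(Key)) // a,b,c,d
  }
}
```

Both variants produce the same comma-joined string for a non-empty existing value; they differ only in style, which is what the readability discussion below is about.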
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23268#discussion_r240107064

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala ---
+      .map(old => cols.+:(old))
--- End diff --

Ah, it's `:+` not `+:`. Yea, it's confusing.
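The `+:` / `:+` confusion above is easy to demonstrate: `+:` prepends and `:+` appends, and because Scala operators ending in `:` are right-associative, the operator form and the explicit method-call form read in opposite directions.

```scala
object PrependAppendDemo {
  def main(args: Array[String]): Unit = {
    val cols = Seq("b", "c")

    println(cols.+:("a")) // List(a, b, c) -- method call: prepends "a"
    println("a" +: cols)  // List(a, b, c) -- same call, operator form
    println(cols :+ "d")  // List(b, c, d) -- appends "d"
    println(cols.:+("d")) // List(b, c, d) -- same call, method form
  }
}
```

So `cols.+:(old)` in the diff is the method-call spelling of `old +: cols`, which is why it keeps the existing value in front even though it visually resembles an append.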
Github user maomaoChibei commented on a diff in the pull request: https://github.com/apache/spark/pull/23268#discussion_r240107003

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala ---
+      .mkString(",")
--- End diff --

Thanks HyukjinKwon for the kind explanation. We are from an Alibaba subsidiary internet company working on real-time computation over petabyte-scale data. Currently we have a real-time engine built on this Spark SQL, at least from the SQL engine point of view. The throughput is over 1k QPS with response latency under 100ms. It's a non-stop, online 24*7 platform serving over ten million active users. If this market scale and platform scale meet the criteria, let me know; we are looking for further cooperation. Thanks again. maomao
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23268#discussion_r240105070

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala ---
+      .map(old => cols.+:(old))
--- End diff --

? The output is different, no?
Github user sadhen commented on a diff in the pull request: https://github.com/apache/spark/pull/23268#discussion_r240103941

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala ---
+      .map(old => cols.+:(old))
--- End diff --

Just an idiomatic way to write Scala.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23268#discussion_r240101927

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala ---
+      .mkString(",")
--- End diff --

Right, but I don't think it's more readable. It's a non-critical path, so the performance concern is secondary anyway.
Github user sadhen commented on a diff in the pull request: https://github.com/apache/spark/pull/23268#discussion_r240098806

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala ---
+      .mkString(",")
--- End diff --

A for comprehension is just syntactic sugar and brings no performance benefit of its own.
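The "syntactic sugar" point above is concrete: the Scala compiler rewrites a for comprehension into calls to `map`/`flatMap`/`withFilter` (or `foreach` when there is no `yield`), so both forms below compile to the same thing.

```scala
object ForDesugarDemo {
  def main(args: Array[String]): Unit = {
    val cols = Seq("a", "b", "c")

    // This for comprehension...
    val upper1 = for (c <- cols) yield c.toUpperCase
    // ...is rewritten by the compiler into this map call:
    val upper2 = cols.map(c => c.toUpperCase)

    println(upper1) // List(A, B, C)
    println(upper2) // List(A, B, C)
  }
}
```

Hence choosing between a `for` loop and explicit `map` in `appendReadColumnNames` is purely a readability decision, not a performance one.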
Github user sadhen commented on a diff in the pull request: https://github.com/apache/spark/pull/23268#discussion_r240098620

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala ---
+      .mkString(",")
--- End diff --

As for performance:

Current version:
```
[info] HiveShimSuite:
[info] - appendReadColumns (549 milliseconds)
```

Previous version:
```
[info] HiveShimSuite:
[info] - appendReadColumns (949 milliseconds)
```
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23268#discussion_r240097931

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala ---
+      .map(old => cols.+:(old))
--- End diff --

Also, it looks like the output would be different after this change. It was `old + col` before, but the current change looks like it is doing `col + old`.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23268#discussion_r240097105

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala ---
+      .mkString(",")
--- End diff --

I don't think this is more readable. Also, the previous code is more performant. I wouldn't change this.
GitHub user sadhen opened a pull request: https://github.com/apache/spark/pull/23268

[Hive][Minor] Refactor on HiveShim and Add Unit Tests

## What changes were proposed in this pull request?

Refactor HiveShim, and add unit tests.

## How was this patch tested?

```
$ build/sbt
> project hive
> testOnly *HiveShimSuite
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sadhen/spark refactor/hiveshim

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23268.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23268

commit 987fa18a3b50e117fc839201fcc29c166732486e
Author: Darcy Shen
Date: 2018-12-10T06:32:35Z

    Add unit tests for HiveShim appendReadColumns

commit b68e7b1421f977c7256573564b12b4cc07d31f4a
Author: Darcy Shen
Date: 2018-12-10T06:38:19Z

    Refactor