[GitHub] spark issue #19607: [SPARK-22395][SQL][PYTHON] Fix the behavior of timestamp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19607 **[Test build #83205 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83205/testReport)** for PR 19607 at commit [`5c08ecf`](https://github.com/apache/spark/commit/5c08ecf247bfe7e14afcdef8eba1c25cb3b68634). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19607: [SPARK-22395][SQL][PYTHON] Fix the behavior of ti...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/19607

[SPARK-22395][SQL][PYTHON] Fix the behavior of timestamp values for Pandas to respect session timezone

## What changes were proposed in this pull request?

When converting a Pandas DataFrame/Series from/to a Spark DataFrame using `toPandas()` or pandas UDFs, timestamp values respect the Python system timezone instead of the session timezone.

For example, say we use `"America/Los_Angeles"` as the session timezone and have a timestamp value `"1970-01-01 00:00:01"` in that timezone. (Btw, I'm in Japan, so the Python timezone would be `"Asia/Tokyo"`.) The timestamp value from the current `toPandas()` will be the following:

```
>>> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
>>> df = spark.createDataFrame([28801], "long").selectExpr("timestamp(value) as ts")
>>> df.show()
+-------------------+
|                 ts|
+-------------------+
|1970-01-01 00:00:01|
+-------------------+

>>> df.toPandas()
                   ts
0 1970-01-01 17:00:01
```

As you can see, the value becomes `"1970-01-01 17:00:01"` because it respects the Python timezone. As we discussed in #18664, we consider this behavior a bug, and the value should be `"1970-01-01 00:00:01"`.

## How was this patch tested?

Added tests and existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-22395

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19607.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19607

commit 4735e5981ecf3a4bce50ce86f706e25830f4a801
Author: Takuya UESHIN
Date: 2017-10-23T06:27:22Z

    Add a conf to make Pandas DataFrame respect session local timezone.

commit 1f85150dc5b26df21dca6bad2ef4eaec342c4400
Author: Takuya UESHIN
Date: 2017-10-23T08:09:16Z

    Fix toPandas() behavior.

commit 5c08ecf247bfe7e14afcdef8eba1c25cb3b68634
Author: Takuya UESHIN
Date: 2017-10-23T09:15:47Z

    Modify pandas UDFs to respect session timezone.
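The timezone arithmetic in the example above can be checked directly in pandas: 28801 seconds after the epoch is 08:00:01 UTC, which is 00:00:01 in America/Los_Angeles and 17:00:01 in Asia/Tokyo. A minimal sketch (assuming pandas with its bundled timezone data is available):

```python
import pandas as pd

# 28801 seconds after the epoch, as a timezone-aware UTC timestamp
ts = pd.Timestamp(28801, unit="s", tz="UTC")

# The session-timezone view the PR wants toPandas() to produce
print(ts.tz_convert("America/Los_Angeles"))  # 1970-01-01 00:00:01-08:00

# The system-timezone view the buggy toPandas() produced in Japan
print(ts.tz_convert("Asia/Tokyo"))           # 1970-01-01 17:00:01+09:00
```

The same epoch instant renders differently depending only on which timezone is applied at conversion time, which is exactly the discrepancy the PR fixes.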
[GitHub] spark pull request #19605: [SPARK-22394] [SQL] Remove redundant synchronizat...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19605#discussion_r147619876

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -89,10 +89,12 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }

   /**
-   * Run some code involving `client` in a [[synchronized]] block and wrap certain
-   * exceptions thrown in the process in [[AnalysisException]].
+   * Run some code involving `client` and wrap certain exceptions thrown in the process in
+   * [[AnalysisException]]. Thread-safety is guaranteed here because methods in the `client`
+   * ([[org.apache.spark.sql.hive.client.HiveClientImpl]]) are already synchronized through
+   * `clientLoader` in the `retryLocked` method.
    */
-  private def withClient[T](body: => T): T = synchronized {
+  private def withClient[T](body: => T): T = {
--- End diff --

If you check the callers of `withClient`, you will find that many callers perform multiple client-related operations in the same `body`. Removing this lock might cause concurrency issues.
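The concurrency concern can be illustrated with a toy model (hypothetical names, not the actual Spark classes): even when every individual client call is synchronized, a caller that performs two client calls in one body still needs the outer lock to make the pair atomic.

```python
import threading


class ToyClient:
    """Stand-in for a client whose individual methods are synchronized."""

    def __init__(self):
        self._lock = threading.Lock()
        self.tables = {}

    def drop_table(self, name):
        with self._lock:
            self.tables.pop(name, None)

    def create_table(self, name, schema):
        with self._lock:
            self.tables[name] = schema


class ToyCatalog:
    """Stand-in for a catalog whose outer lock keeps multi-call bodies atomic."""

    def __init__(self, client):
        self._lock = threading.Lock()
        self._client = client

    def replace_table(self, name, schema):
        # Without this outer lock, another thread could observe the table as
        # dropped but not yet recreated, even though each client call above
        # is individually synchronized.
        with self._lock:
            self._client.drop_table(name)
            self._client.create_table(name, schema)
```

The per-call locks only serialize single operations; the atomicity of the drop-then-create pair comes entirely from the outer lock, which is the behavior the reviewer worries about losing.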
[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18805 **[Test build #83204 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83204/testReport)** for PR 18805 at commit [`eba3024`](https://github.com/apache/spark/commit/eba30249108f195a4442fb8cae35d5f02f5f).
[GitHub] spark pull request #18805: [SPARK-19112][CORE] Support for ZStandard codec
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r147618796

--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -216,3 +218,33 @@ private final class SnappyOutputStreamWrapper(os: SnappyOutputStream) extends Ou
     }
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ * ZStandard implementation of [[org.apache.spark.io.CompressionCodec]]. For more
+ * details see - http://facebook.github.io/zstd/
+ *
+ * @note The wire protocol for this codec is not guaranteed to be compatible across versions
+ * of Spark. This is intended for use as an internal compression utility within a single Spark
+ * application.
+ */
+@DeveloperApi
+class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
+
+  override def compressedOutputStream(s: OutputStream): OutputStream = {
+    // Default compression level for zstd compression to 1 because it is
+    // fastest of all with reasonably high compression ratio.
+    val level = conf.getSizeAsBytes("spark.io.compression.zstd.level", "1").toInt
--- End diff --

Good eye, fixed.
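The speed-vs-ratio tradeoff behind defaulting the level to 1 can be sketched with Python's stdlib `zlib` as an analog (zstd itself is not in the stdlib; zlib levels play the same role): a low level compresses faster, a high level compresses smaller.

```python
import time
import zlib

# A highly compressible sample payload
data = b"spark shuffle block " * 100_000

# Compression level trades speed for ratio: level 1 favors throughput,
# level 9 favors a smaller output (same idea as zstd's level knob).
for level in (1, 9):
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    print(f"level={level}: {len(compressed)} bytes in {elapsed:.4f}s")
```

For an internal shuffle/spill codec, throughput usually matters more than the last few percent of ratio, which is the rationale the code comment gives for level 1.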
[GitHub] spark issue #19605: [SPARK-22394] [SQL] Remove redundant synchronization for...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19605 cc @cloud-fan @rxin @gatorsmile
[GitHub] spark issue #19604: [SPARK-22291][SQL] Conversion error when transforming ar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19604 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83201/ Test PASSed.
[GitHub] spark issue #19604: [SPARK-22291][SQL] Conversion error when transforming ar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19604 Merged build finished. Test PASSed.
[GitHub] spark issue #19604: [SPARK-22291][SQL] Conversion error when transforming ar...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19604 LGTM
[GitHub] spark issue #19604: [SPARK-22291][SQL] Conversion error when transforming ar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19604

**[Test build #83201 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83201/testReport)** for PR 19604 at commit [`995e38e`](https://github.com/apache/spark/commit/995e38e118126d95b2fe5ee8416e5f36786a7b5b).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19604: [SPARK-22291][SQL] Conversion error when transforming ar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19604 Merged build finished. Test PASSed.
[GitHub] spark issue #19604: [SPARK-22291][SQL] Conversion error when transforming ar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19604 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83200/ Test PASSed.
[GitHub] spark issue #19604: [SPARK-22291][SQL] Conversion error when transforming ar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19604

**[Test build #83200 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83200/testReport)** for PR 19604 at commit [`549cb81`](https://github.com/apache/spark/commit/549cb814e01c2338a67c4a9efa4d880a3fb9cdac).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19605: [SPARK-22394] [SQL] Remove redundant synchronization for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19605 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83202/ Test PASSed.
[GitHub] spark issue #19605: [SPARK-22394] [SQL] Remove redundant synchronization for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19605 Merged build finished. Test PASSed.
[GitHub] spark issue #19605: [SPARK-22394] [SQL] Remove redundant synchronization for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19605

**[Test build #83202 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83202/testReport)** for PR 19605 at commit [`072b27d`](https://github.com/apache/spark/commit/072b27d083f2c2ed8d8bdd20caa5b0fe0ba267f6).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19599 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83199/ Test PASSed.
[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19599 Merged build finished. Test PASSed.
[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19599

**[Test build #83199 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83199/testReport)** for PR 19599 at commit [`01e7d3d`](https://github.com/apache/spark/commit/01e7d3d5f9b0ae278ebce60635e5c2568d3d0cf3).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19606: [SPARK-22333][SQL][Backport-2.2]timeFunctionCall(CURRENT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19606 Can one of the admins verify this patch?
[GitHub] spark pull request #19606: [SPARK-22333][SQL][Backport-2.2]timeFunctionCall(...
GitHub user DonnyZone opened a pull request: https://github.com/apache/spark/pull/19606

[SPARK-22333][SQL][Backport-2.2] timeFunctionCall(CURRENT_DATE, CURRENT_TIMESTAMP) has conflicts with columnReference

## What changes were proposed in this pull request?

This is a backport PR of https://github.com/apache/spark/pull/19559 for branch-2.2.

## How was this patch tested?

Unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/DonnyZone/spark branch-2.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19606.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19606

commit 2bcc2ea6fd0ca9f12959246bb9ee6796cb7a90a0
Author: donnyzone
Date: 2017-10-30T03:08:36Z

    2.2-backport
[GitHub] spark issue #18833: [SPARK-21625][DOC] Add incompatible Hive UDF describe to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18833 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83203/ Test PASSed.
[GitHub] spark issue #18833: [SPARK-21625][DOC] Add incompatible Hive UDF describe to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18833 Merged build finished. Test PASSed.
[GitHub] spark issue #18833: [SPARK-21625][DOC] Add incompatible Hive UDF describe to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18833

**[Test build #83203 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83203/testReport)** for PR 18833 at commit [`cbbfa5e`](https://github.com/apache/spark/commit/cbbfa5edf8d9edf1d25fb1c456725cac73418602).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Sqrt(child: Expression) extends UnaryMathExpression(math.sqrt, "SQRT")`
[GitHub] spark issue #19595: [SPARK-22379][PYTHON] Reduce duplication setUpClass and ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19595 Thank you @ueshin.
[GitHub] spark issue #18833: [SPARK-21625][SQL] sqrt(negative number) should be null.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18833 **[Test build #83203 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83203/testReport)** for PR 18833 at commit [`cbbfa5e`](https://github.com/apache/spark/commit/cbbfa5edf8d9edf1d25fb1c456725cac73418602).
[GitHub] spark pull request #19595: [SPARK-22379][PYTHON] Reduce duplication setUpCla...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19595
[GitHub] spark issue #19595: [SPARK-22379][PYTHON] Reduce duplication setUpClass and ...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19595 Thanks! merging to master.
[GitHub] spark issue #19595: [SPARK-22379][PYTHON] Reduce duplication setUpClass and ...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19595 LGTM.
[GitHub] spark issue #19604: [SPARK-22291][SQL][FOLLOWUP] Conversion error when trans...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19604 Ah, let's remove `[FOLLOWUP]` or replace it with something like `[BRANCH-2.2]` in the PR title.
[GitHub] spark issue #19559: [SPARK-22333][SQL]timeFunctionCall(CURRENT_DATE, CURRENT...
Github user DonnyZone commented on the issue: https://github.com/apache/spark/pull/19559 Sure, I will submit it later.
[GitHub] spark issue #19596: [SPARK-22369][PYTHON][DOCS] Exposes catalog API document...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19596 Thanks, @viirya. I wasn't sure if I should add it to the list. My intention was that this one is like `DataFrameReader` and `DataFrameWriter` (supposed to be used via `spark.read`), so I wanted to hide the package path `pyspark.sql.Catalog` in the doc, and I just decided on the smallest change I could think of for this issue. I am fine with adding it too; it's easy to add if anyone feels strongly about this. Please let me know.
[GitHub] spark issue #19604: [SPARK-22291][SQL][FOLLOWUP] Conversion error when trans...
Github user jmchung commented on the issue: https://github.com/apache/spark/pull/19604 I found that in this branch the Docker-based integration tests will fail because the image `wnameless/oracle-xe-11g:14.04.4` cannot be pulled. Should we move to `wnameless/oracle-xe-11g`?

```
Error response from daemon: manifest for wnameless/oracle-xe-11g:14.04.4 not found
```
[GitHub] spark pull request #19589: [SPARKR][SPARK-22344] Set java.io.tmpdir for Spar...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19589
[GitHub] spark issue #19589: [SPARKR][SPARK-22344] Set java.io.tmpdir for SparkR test...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/19589 Merging to master, branch-2.2 and branch-2.1
[GitHub] spark issue #19605: [SPARK-22394] [SQL] Remove redundant synchronization for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19605 **[Test build #83202 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83202/testReport)** for PR 19605 at commit [`072b27d`](https://github.com/apache/spark/commit/072b27d083f2c2ed8d8bdd20caa5b0fe0ba267f6).
[GitHub] spark pull request #19605: [SPARK-22394] [SQL] Remove redundant synchronizat...
GitHub user wzhfy opened a pull request: https://github.com/apache/spark/pull/19605

[SPARK-22394] [SQL] Remove redundant synchronization for metastore access

## What changes were proposed in this pull request?

Before Spark 2.x, synchronization for metastore access was protected at [line 229 in ClientWrapper](https://github.com/apache/spark/blob/branch-1.6/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala#L229) (now it's at [line 203 in HiveClientImpl](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L203)). After Spark 2.x, `HiveExternalCatalog` was introduced by [SPARK-13080](https://github.com/apache/spark/pull/11293), where an extra level of synchronization was added at [line 95](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L95).

That is, we now have two levels of synchronization: one in `HiveExternalCatalog` and the other in `IsolatedClientLoader` used by `HiveClientImpl`. But since both `HiveExternalCatalog` and `IsolatedClientLoader` are shared among all Spark sessions, the extra level of synchronization in `HiveExternalCatalog` is redundant and can be removed.

## How was this patch tested?

Manual test and existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wzhfy/spark redundant_sync

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19605.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19605

commit 072b27d083f2c2ed8d8bdd20caa5b0fe0ba267f6
Author: Zhenhua Wang
Date: 2017-10-30T01:47:12Z

    remove redundant sync
[GitHub] spark issue #19604: [SPARK-22291][SQL][FOLLOWUP] Conversion error when trans...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19604 **[Test build #83201 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83201/testReport)** for PR 19604 at commit [`995e38e`](https://github.com/apache/spark/commit/995e38e118126d95b2fe5ee8416e5f36786a7b5b).
[GitHub] spark pull request #19604: [SPARK-22291][SQL][FOLLOWUP] Conversion error whe...
Github user jmchung commented on a diff in the pull request: https://github.com/apache/spark/pull/19604#discussion_r147605119

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -440,8 +440,9 @@ object JdbcUtils extends Logging {

     case StringType =>
       (array: Object) =>
-        array.asInstanceOf[Array[java.lang.String]]
-          .map(UTF8String.fromString)
+        // some underling types are not String such as uuid, inet, cidr, etc.
--- End diff --

Oops, a typo occurred, thanks @viirya !!
[GitHub] spark pull request #19604: [SPARK-22291][SQL][FOLLOWUP] Conversion error whe...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19604#discussion_r147604947

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -440,8 +440,9 @@ object JdbcUtils extends Logging {

     case StringType =>
       (array: Object) =>
-        array.asInstanceOf[Array[java.lang.String]]
-          .map(UTF8String.fromString)
+        // some underling types are not String such as uuid, inet, cidr, etc.
--- End diff --

underlying?
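The issue the code comment describes can be mimicked in Python (a rough analog, not the actual Spark code): when array elements arrive from the driver as type-specific objects (uuid, inet, cidr) rather than strings, an unconditional cast to string fails, whereas converting each element to a string generically handles any element type while preserving NULLs.

```python
import ipaddress
import uuid

# Elements as a JDBC driver might materialize them: not all are str
values = [
    uuid.UUID("12345678-1234-5678-1234-567812345678"),
    ipaddress.ip_address("192.168.0.1"),
    "already-a-string",
    None,
]

# Generic to-string conversion: handle whatever the element is, keep NULLs
converted = [str(v) if v is not None else None for v in values]
print(converted)
```

A blind `[v for v in values if isinstance(v, str)]`-style assumption would drop or break the non-string elements, which mirrors the `asInstanceOf[Array[java.lang.String]]` failure the PR fixes.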
[GitHub] spark issue #19520: [SPARK-22298][WEB-UI] url encode APP id before generatin...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/19520 I would like to ask: under what circumstances will the application ID contain a forward slash?
[GitHub] spark issue #19604: [SPARK-22291][SQL][FOLLOWUP] Conversion error when trans...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19604 **[Test build #83200 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83200/testReport)** for PR 19604 at commit [`549cb81`](https://github.com/apache/spark/commit/549cb814e01c2338a67c4a9efa4d880a3fb9cdac).
[GitHub] spark issue #19604: [SPARK-22291][SQL][FOLLOWUP] Conversion error when trans...
Github user jmchung commented on the issue: https://github.com/apache/spark/pull/19604 cc @cloud-fan, the follow-up PR for 2.2, thanks!
[GitHub] spark pull request #19604: [SPARK-22291][SQL][FOLLOWUP] Conversion error whe...
GitHub user jmchung opened a pull request: https://github.com/apache/spark/pull/19604

[SPARK-22291][SQL][FOLLOWUP] Conversion error when transforming array types of uuid, inet and cidr to StringType in PostgreSQL

## What changes were proposed in this pull request?

This is a follow-up of #19567, to fix the conversion error when transforming array types of uuid, inet and cidr to StringType in PostgreSQL for Spark 2.2.

## How was this patch tested?

Added test in `PostgresIntegrationSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jmchung/spark SPARK-22291-FOLLOWUP

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19604.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19604

commit 549cb814e01c2338a67c4a9efa4d880a3fb9cdac
Author: Jen-Ming Chung
Date: 2017-10-30T01:25:28Z

    [SPARK-22291][SQL][FOLLOWUP] Conversion error when transforming array types of uuid, inet and cidr to StringType in PostgreSQL
[GitHub] spark issue #19596: [SPARK-22369][PYTHON][DOCS] Exposes catalog API document...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19596 I've generated the Python docs. Looks good.
[GitHub] spark issue #19507: [WEB-UI] Add count in fair scheduler pool page
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/19507 @srowen Help review the code.
[GitHub] spark issue #19532: [CORE]Modify the duration real-time calculation and upda...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/19532 @jiangxb1987 @srowen Help review the code.
[GitHub] spark issue #19596: [SPARK-22369][PYTHON][DOCS] Exposes catalog API document...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19596 Don't we want to add it to the list of `Important classes of Spark SQL and DataFrames`?
[GitHub] spark issue #19595: [SPARK-22379][PYTHON] Reduce duplication setUpClass and ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19595 cc @ueshin, could you take a look please when you have some time?
[GitHub] spark issue #19596: [SPARK-22369][PYTHON][DOCS] Exposes catalog API document...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19596 cc @holdenk and @viirya, mind taking a look please? I remember I had a few talks about Sphinx and `__all__` and I believe you guys are the right reviewers.
[GitHub] spark issue #18113: [SPARK-20890][SQL] Added min and max typed aggregation f...
Github user setjet commented on the issue: https://github.com/apache/spark/pull/18113 Hi, it has been a while but I can pick it back up when I have time next weekend or so if that's OK.
[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19599 **[Test build #83199 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83199/testReport)** for PR 19599 at commit [`01e7d3d`](https://github.com/apache/spark/commit/01e7d3d5f9b0ae278ebce60635e5c2568d3d0cf3).
[GitHub] spark issue #19553: [SPARK-22330][CORE] Linear containsKey operation for ser...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19553 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83198/ Test PASSed.
[GitHub] spark issue #19553: [SPARK-22330][CORE] Linear containsKey operation for ser...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19553 Merged build finished. Test PASSed.
[GitHub] spark issue #19553: [SPARK-22330][CORE] Linear containsKey operation for ser...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19553

**[Test build #83198 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83198/testReport)** for PR 19553 at commit [`235f6d6`](https://github.com/apache/spark/commit/235f6d67cf25f4016c8e8ffb77103770e855ec62).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19553: [SPARK-22330][CORE] Linear containsKey operation ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19553#discussion_r147596592

--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala ---
@@ -43,10 +43,17 @@ private[spark] object JavaUtils {
       override def size: Int = underlying.size

-      override def get(key: AnyRef): B = try {
-        underlying.getOrElse(key.asInstanceOf[A], null.asInstanceOf[B])
-      } catch {
-        case ex: ClassCastException => null.asInstanceOf[B]
+      // Delegate to implementation because AbstractMap implementation iterates over whole key set
+      override def containsKey(key: AnyRef): Boolean = {
+        underlying.contains(key.asInstanceOf[A])
--- End diff --

I thought it should throw an exception, however there is a test showing that it's fine...
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19603 Merged build finished. Test PASSed.
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19603 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83197/ Test PASSed.
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19603 **[Test build #83197 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83197/testReport)** for PR 19603 at commit [`d09d9bd`](https://github.com/apache/spark/commit/d09d9bd10331ebd8992e1d7930236162c53ee37e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #19553: [SPARK-22330][CORE] Linear containsKey operation ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19553#discussion_r147593655

--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala ---
@@ -43,10 +43,17 @@ private[spark] object JavaUtils {
       override def size: Int = underlying.size

-      override def get(key: AnyRef): B = try {
-        underlying.getOrElse(key.asInstanceOf[A], null.asInstanceOf[B])
-      } catch {
-        case ex: ClassCastException => null.asInstanceOf[B]
+      // Delegate to implementation because AbstractMap implementation iterates over whole key set
+      override def containsKey(key: AnyRef): Boolean = {
+        underlying.contains(key.asInstanceOf[A])
--- End diff --

Really, this should return `false` if the key isn't an `A`. This will throw an exception now. It should be prefixed with `key.isInstanceOf[A] && ...`
[GitHub] spark pull request #19553: [SPARK-22330][CORE] Linear containsKey operation ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19553#discussion_r147593724

--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala ---
@@ -43,10 +43,17 @@ private[spark] object JavaUtils {
       override def size: Int = underlying.size

-      override def get(key: AnyRef): B = try {
-        underlying.getOrElse(key.asInstanceOf[A], null.asInstanceOf[B])
-      } catch {
-        case ex: ClassCastException => null.asInstanceOf[B]
+      // Delegate to implementation because AbstractMap implementation iterates over whole key set
+      override def containsKey(key: AnyRef): Boolean = {
+        underlying.contains(key.asInstanceOf[A])
+      }
+
+      override def get(key: AnyRef): B = {
+        val value = underlying.get(key.asInstanceOf[A])
+        if (value.isDefined && value.get.isInstanceOf[B]) {
--- End diff --

`underlying` values are already known to be `B`, so this isn't necessary. But a condition on the key is:

```
if (key.isInstanceOf[A]) {
  underlying.getOrElse(key.asInstanceOf[A], null)
} else {
  null
}
```

Might need an extra cast in there.
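The point under discussion is that `java.util.AbstractMap` implements `containsKey` by scanning the whole entry set, so a serializable wrapper should delegate to the backing map's own constant-time lookup. A minimal Java sketch of that delegation (the `Wrapper` class here is a hypothetical stand-in, not the actual Spark patch); it also shows the behavior the test in this thread observed, namely that a key of the wrong type simply misses rather than throwing:

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hedged sketch: a wrapper over a backing map. Without the containsKey
// override, AbstractMap's default walks entrySet() linearly on every lookup.
public class DelegatingMapDemo {
    public static class Wrapper<K, V> extends AbstractMap<K, V> {
        private final Map<K, V> underlying;

        public Wrapper(Map<K, V> underlying) { this.underlying = underlying; }

        @Override
        public Set<Entry<K, V>> entrySet() { return underlying.entrySet(); }

        // Delegate so the lookup uses the backing map's hash probe (O(1))
        // instead of AbstractMap's O(n) entry-set scan.
        @Override
        public boolean containsKey(Object key) { return underlying.containsKey(key); }
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        Wrapper<String, Integer> w = new Wrapper<>(m);
        System.out.println(w.containsKey("a")); // true
        // A key of the wrong runtime type just misses: the hash probe finds
        // no equal key, so no ClassCastException is thrown.
        System.out.println(w.containsKey(42));  // false
    }
}
```

This mirrors why the test mentioned above passes: `HashMap.containsKey(Object)` accepts any object and returns `false` on a type mismatch, whereas the erasure-based cast in the Scala wrapper never actually executes a checked cast at that call site.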
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19603 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83196/ Test PASSed.
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19603 Merged build finished. Test PASSed.
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19603 **[Test build #83196 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83196/testReport)** for PR 19603 at commit [`83607a3`](https://github.com/apache/spark/commit/83607a3d727b9a600271ba61e0c2976fc3c125c1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19553: [SPARK-22330][CORE] Linear containsKey operation for ser...
Github user Whoosh commented on the issue: https://github.com/apache/spark/pull/19553 @cloud-fan I've checked all core tests and they were fine. Should I do something in addition?
[GitHub] spark issue #19553: [SPARK-22330][CORE] Linear containsKey operation for ser...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19553 **[Test build #83198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83198/testReport)** for PR 19553 at commit [`235f6d6`](https://github.com/apache/spark/commit/235f6d67cf25f4016c8e8ffb77103770e855ec62).
[GitHub] spark issue #19553: [SPARK-22330][CORE] Linear containsKey operation for ser...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19553 LGTM
[GitHub] spark issue #19553: [SPARK-22330][CORE] Linear containsKey operation for ser...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19553 retest this please
[GitHub] spark pull request #19553: [SPARK-22330][CORE] Linear containsKey operation ...
Github user Whoosh commented on a diff in the pull request: https://github.com/apache/spark/pull/19553#discussion_r147591080

--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala ---
@@ -43,10 +43,15 @@ private[spark] object JavaUtils {
       override def size: Int = underlying.size

-      override def get(key: AnyRef): B = try {
-        underlying.getOrElse(key.asInstanceOf[A], null.asInstanceOf[B])
-      } catch {
-        case ex: ClassCastException => null.asInstanceOf[B]
+      // Delegate to implementation because AbstractMap implementation iterates over whole key set
+      override def containsKey(key: AnyRef): Boolean = key match {
+        case key: A => underlying.contains(key)
--- End diff --

@srowen It can't be done that way: it will cause an "abstract type A is unchecked since it is eliminated by erasure" compile-time warning. As I understand it, no type check is needed before the get, because the key is cast to `Object` anyway due to erasure, so `get(key)` only has compile-time implications. Please correct me if I'm wrong; I've added a simple test for this.
[GitHub] spark issue #19571: [SPARK-15474][SQL] Write and read back non-emtpy schema ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19571 yes please
[GitHub] spark issue #19571: [SPARK-15474][SQL] Write and read back non-emtpy schema ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19571 I see. Then, can we continue on #17980?
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19603 **[Test build #83197 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83197/testReport)** for PR 19603 at commit [`d09d9bd`](https://github.com/apache/spark/commit/d09d9bd10331ebd8992e1d7930236162c53ee37e).
[GitHub] spark pull request #19593: [WIP][SPARK-22374][SQL][2.2] closeAllForUGI is re...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/19593
[GitHub] spark issue #19593: [WIP][SPARK-22374][SQL][2.2] closeAllForUGI is required ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19593 Thank you for the review, @vanzin. Sorry, I'll close this PR.
[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19601 After thinking about the choice for a while, I've concluded that it is better to add a new `WritableColumnVector` (i.e. `UnsafeColumnVector`) and to keep the current `ColumnVector.Array`. I think that adding a new class will give us some flexibility and a good abstraction between the public class `ColumnVector` and the other internal classes.
[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19603#discussion_r147589361

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -591,18 +591,40 @@ case class MapObjects private(
       case _ => inputData.dataType
     }

-    val (getLength, getLoopVar) = inputDataType match {
+    val (getLength, prepareLoop, getLoopVar) = inputDataType match {
       case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
-        s"${genInputData.value}.size()" -> s"${genInputData.value}.apply($loopIndex)"
+        val it = ctx.freshName("it")
+        (
+          s"${genInputData.value}.size()",
--- End diff --

I see, got it.
[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19603#discussion_r147589355

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -591,6 +591,9 @@ case class MapObjects private(
       case _ => inputData.dataType
     }

+    // `MapObjects` generates a while loop to traverse the elements of the input collection. We
+    // need to take care of Seq and List because they may have O(n) complexity for indexed accessing
+    // like `list.get(1)`. Here we use Iterator to travers Seq and List.
--- End diff --

nit: travers -> traverse
[GitHub] spark issue #19529: [SPARK-22308] Support alternative unit testing styles in...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19529 I just reverted this PR. @nkronenfeld Could you submit another PR and update the title to `[SPARK-22308][test-maven] Support alternative unit testing styles in external applications`? Thanks!
[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19603 **[Test build #83196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83196/testReport)** for PR 19603 at commit [`83607a3`](https://github.com/apache/spark/commit/83607a3d727b9a600271ba61e0c2976fc3c125c1).
[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19603#discussion_r147587911

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -591,18 +591,40 @@ case class MapObjects private(
       case _ => inputData.dataType
     }

-    val (getLength, getLoopVar) = inputDataType match {
+    val (getLength, prepareLoop, getLoopVar) = inputDataType match {
       case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
-        s"${genInputData.value}.size()" -> s"${genInputData.value}.apply($loopIndex)"
+        val it = ctx.freshName("it")
+        (
+          s"${genInputData.value}.size()",
--- End diff --

Otherwise we need a re-sizable array to keep the result, which is a lot of change and doesn't have a clear win (re-sizing is expensive).
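The motivation for the `prepareLoop` step discussed above is that indexed access on a linked list walks from the head on every call, so a `get(i)` loop is quadratic overall while a single iterator pass is linear. A hedged Java sketch of the two loop shapes (the `TraverseDemo` class is a hypothetical illustration, not Spark's actual generated code):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Sketch of the loop shapes: indexed access vs. a single iterator pass.
public class TraverseDemo {
    // The old shape: indexed access, O(i) per get(i) on a linked list,
    // hence O(n^2) for the whole loop.
    static int sumByIndex(List<Integer> list) {
        int sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i); // walks from the head each time on LinkedList
        }
        return sum;
    }

    // The new shape: obtain the iterator once (this corresponds to the
    // "prepareLoop" code), then step through it in O(n) total.
    static int sumByIterator(List<Integer> list) {
        int sum = 0;
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            sum += it.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>(Arrays.asList(1, 2, 3, 4));
        System.out.println(sumByIndex(list));    // 10
        System.out.println(sumByIterator(list)); // 10
    }
}
```

Because the collection's length is still known up front via `size()`, the generated code can preallocate the result array and avoid the re-sizable array mentioned above.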
[GitHub] spark issue #19567: [SPARK-22291][SQL] Conversion error when transforming ar...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19567 @jmchung can you send a new PR for 2.2? thanks!
[GitHub] spark pull request #19567: [SPARK-22291][SQL] Conversion error when transfor...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19567
[GitHub] spark issue #19567: [SPARK-22291][SQL] Conversion error when transforming ar...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19567 The last commit just changes the test name and shouldn't break the PySpark tests. I'm merging to master, thanks!
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r147587396

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -17,47 +17,168 @@
 package org.apache.spark.unsafe.memory;

-import javax.annotation.Nullable;
-
 import org.apache.spark.unsafe.Platform;
+import javax.annotation.Nullable;
--- End diff --

thanks, fixed
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r147587392

--- Diff: common/unsafe/src/main/java/org/apache/spark/sql/catalyst/expressions/HiveHasher.java ---
@@ -38,6 +39,10 @@
   public static int hashLong(long input) {
     return (int) ((input >>> 32) ^ input);
   }

+  public static int hashUnsafeBytesBlock(MemoryBlock base, long offset, int lengthInBytes) {
--- End diff --

This is based on [this discussion](https://github.com/apache/spark/pull/19222#discussion_r138744794). Currently, where I can see a large performance improvement, I do not call the non-MemoryBlock version.
[GitHub] spark issue #19567: [SPARK-22291][SQL] Conversion error when transforming ar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19567 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83195/ Test FAILed.
[GitHub] spark issue #19567: [SPARK-22291][SQL] Conversion error when transforming ar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19567 Merged build finished. Test FAILed.
[GitHub] spark issue #19567: [SPARK-22291][SQL] Conversion error when transforming ar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19567 **[Test build #83195 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83195/testReport)** for PR 19567 at commit [`fae5c45`](https://github.com/apache/spark/commit/fae5c455b4a754128bc9112bbead4aef3cc322a2).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Merged build finished. Test PASSed.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83192/ Test PASSed.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19222 **[Test build #83192 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83192/testReport)** for PR 19222 at commit [`62faf43`](https://github.com/apache/spark/commit/62faf43167f58f102b1d7d7a49cd0f39802898a4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #11403: [SPARK-13523] [SQL] Reuse exchanges in a query
Github user gczsjdy commented on the issue: https://github.com/apache/spark/pull/11403 @davies Hi, what do you mean by "Since all the planner only work with tree, so this rule should be the last one for the entire planning."? Thanks if you have time.
[GitHub] spark issue #17899: [SPARK-20636] Add new optimization rule to transpose adj...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17899 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83194/ Test PASSed.
[GitHub] spark issue #17899: [SPARK-20636] Add new optimization rule to transpose adj...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17899 Merged build finished. Test PASSed.
[GitHub] spark issue #17899: [SPARK-20636] Add new optimization rule to transpose adj...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17899 **[Test build #83194 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83194/testReport)** for PR 17899 at commit [`e9f6928`](https://github.com/apache/spark/commit/e9f6928bb60e7c4de25324e5572f105a30d16cd5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19600: Added more information to Imputer
Github user tengpeng commented on the issue: https://github.com/apache/spark/pull/19600 I will follow the guideline strictly next time. Thanks.
[GitHub] spark issue #19567: [SPARK-22291][SQL] Conversion error when transforming ar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19567 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83190/ Test PASSed.
[GitHub] spark issue #19567: [SPARK-22291][SQL] Conversion error when transforming ar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19567 Merged build finished. Test PASSed.
[GitHub] spark issue #19567: [SPARK-22291][SQL] Conversion error when transforming ar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19567 **[Test build #83190 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83190/testReport)** for PR 19567 at commit [`588902d`](https://github.com/apache/spark/commit/588902d21fb12bf80169edc74097d7bda950668c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19602 @gatorsmile Thanks again for reviewing this PR.