[GitHub] spark issue #20195: [SPARK-22972] Couldn't find corresponding Hive SerDe for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20195 **[Test build #85842 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85842/testReport)** for PR 20195 at commit [`f55ace6`](https://github.com/apache/spark/commit/f55ace645b46a429a512eb8e922a7074c4cd8cc0). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20013: [SPARK-20657][core] Speed up rendering of the stages pag...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/20013 The major concern is that with these code changes, the memory usage will be much larger with `InMemoryStore`. Also, building so many new indexes just for getting `computedQuantiles` seems like overkill.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18853 **[Test build #85840 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85840/testReport)** for PR 18853 at commit [`97a071d`](https://github.com/apache/spark/commit/97a071d91ec25159bba655b2bd9f6e2134d87088). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20096: [SPARK-22908] Add kafka source and sink for continuous p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20096 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85832/ Test FAILed.
[GitHub] spark issue #20176: [SPARK-22981][SQL] Fix incorrect results of Casting Stru...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20176 Merged build finished. Test FAILed.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18853 **[Test build #85841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85841/testReport)** for PR 18853 at commit [`408e889`](https://github.com/apache/spark/commit/408e889caa8d61b7267f0f391be4af5fde82a0c9). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #85839 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85839/testReport)** for PR 13599 at commit [`9896de6`](https://github.com/apache/spark/commit/9896de66a6a2eb376aed75be6189c3852cd83f92). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean)`
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test FAILed.
[GitHub] spark issue #20176: [SPARK-22981][SQL] Fix incorrect results of Casting Stru...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20176 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85838/ Test FAILed.
[GitHub] spark issue #20096: [SPARK-22908] Add kafka source and sink for continuous p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20096 **[Test build #85835 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85835/testReport)** for PR 20096 at commit [`341fb20`](https://github.com/apache/spark/commit/341fb20aa4d18f6964d27c87b48822588dfb1833). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class KafkaSourceOffset(partitionToOffsets: Map[TopicPartition, Long]) extends OffsetV2 `
[GitHub] spark issue #20176: [SPARK-22981][SQL] Fix incorrect results of Casting Stru...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20176 **[Test build #85838 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85838/testReport)** for PR 20176 at commit [`6f5b080`](https://github.com/apache/spark/commit/6f5b0803fb65b1cc88b0dc2e09d2e9efd76a1368). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20096: [SPARK-22908] Add kafka source and sink for continuous p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20096 Merged build finished. Test FAILed.
[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19943 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85837/ Test FAILed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test FAILed.
[GitHub] spark issue #20096: [SPARK-22908] Add kafka source and sink for continuous p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20096 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85835/ Test FAILed.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18853 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85841/ Test FAILed.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18853 Merged build finished. Test FAILed.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18853 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85840/ Test FAILed.
[GitHub] spark issue #20096: [SPARK-22908] Add kafka source and sink for continuous p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20096 **[Test build #85832 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85832/testReport)** for PR 20096 at commit [`2261566`](https://github.com/apache/spark/commit/22615669cc20cda77819786df4ff34aab925a958). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class KafkaContinuousSourceTopicDeletionSuite extends KafkaContinuousTest `
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #85830 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85830/testReport)** for PR 13599 at commit [`e231516`](https://github.com/apache/spark/commit/e231516ab7a9c1d380005f568f2a8decb2987186). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean)`
[GitHub] spark issue #20096: [SPARK-22908] Add kafka source and sink for continuous p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20096 Merged build finished. Test FAILed.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18853 Merged build finished. Test FAILed.
[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19943 Merged build finished. Test FAILed.
[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19943 **[Test build #85837 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85837/testReport)** for PR 19943 at commit [`2cf98b6`](https://github.com/apache/spark/commit/2cf98b6734c806f66e21df50520a465b03d9f060). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85839/ Test FAILed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85830/ Test FAILed.
[GitHub] spark issue #19290: [SPARK-22063][R] Fixes lint check failures in R by lates...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19290 BTW, I believe we are testing it with R 3.4.1 via AppVeyor too. I have been thinking it's good to test both old and new versions ... I think we have a weak promise for `R 3.1+` - http://spark.apache.org/docs/latest/index.html#downloading
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13599 retest this please
[GitHub] spark pull request #20199: [Spark-22967][Hive]Fix VersionSuite's unit tests ...
GitHub user Ngone51 opened a pull request: https://github.com/apache/spark/pull/20199 [Spark-22967][Hive]Fix VersionSuite's unit tests by change Windows path into URI path ## What changes were proposed in this pull request? Two unit tests fail due to Windows-format paths: 1. test(s"$version: read avro file containing decimal") ``` org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string); ``` 2. test(s"$version: SPARK-17920: Insert into/overwrite avro table") ``` Unable to infer the schema. The schema specification is required to create the table `default`.`tab2`.; org.apache.spark.sql.AnalysisException: Unable to infer the schema. The schema specification is required to create the table `default`.`tab2`.; ``` This PR fixes these two unit tests by changing the Windows paths into URI paths. ## How was this patch tested? Existing tests. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Ngone51/spark SPARK-22967 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20199.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20199 commit 3d1cafa1c9387b017a98f2983a4e98842a4a5921 Author: wuyi5 Date: 2018-01-09T08:01:03Z change Windows path into URI format path commit 22669d1ff0cb00261fa146d276af237c115a0488 Author: wuyi5 Date: 2018-01-09T08:08:30Z leave deletion work to ShutdownHookManager
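The fix described in this PR replaces a raw Windows path with its URI form, so path parsers that expect URI syntax no longer choke on drive letters and backslashes. As an illustrative sketch of the transformation (Python's `pathlib`, not the PR's actual Scala code):

```python
from pathlib import PureWindowsPath

def windows_path_to_uri(raw: str) -> str:
    """Convert an absolute Windows-style path (drive letter, backslashes)
    to a file:// URI, a form that URI-based path parsers handle portably."""
    return PureWindowsPath(raw).as_uri()

print(windows_path_to_uri("C:\\tmp\\spark-test"))
# file:///C:/tmp/spark-test
```

The path `C:\tmp\spark-test` here is an arbitrary example, not a path from the test suite.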
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #85843 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85843/testReport)** for PR 13599 at commit [`9896de6`](https://github.com/apache/spark/commit/9896de66a6a2eb376aed75be6189c3852cd83f92).
[GitHub] spark issue #20199: [Spark-22967][Hive]Fix VersionSuite's unit tests by chan...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20199 Can one of the admins verify this patch?
[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19943 retest this please
[GitHub] spark issue #20176: [SPARK-22981][SQL] Fix incorrect results of Casting Stru...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20176 retest this please
[GitHub] spark issue #20176: [SPARK-22981][SQL] Fix incorrect results of Casting Stru...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20176 **[Test build #85844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85844/testReport)** for PR 20176 at commit [`6f5b080`](https://github.com/apache/spark/commit/6f5b0803fb65b1cc88b0dc2e09d2e9efd76a1368).
[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19943 **[Test build #85845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85845/testReport)** for PR 19943 at commit [`2cf98b6`](https://github.com/apache/spark/commit/2cf98b6734c806f66e21df50520a465b03d9f060).
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18853 retest this please
[GitHub] spark issue #20196: [SPARK-23000] Fix Flaky test suite DataSourceWithHiveMet...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20196 LGTM, merging to master/2.3!
[GitHub] spark pull request #20196: [SPARK-23000] Fix Flaky test suite DataSourceWith...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20196
[GitHub] spark pull request #20199: [Spark-22967][Hive]Fix VersionSuite's unit tests ...
Github user Ngone51 commented on a diff in the pull request: https://github.com/apache/spark/pull/20199#discussion_r160344054 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala --- @@ -58,7 +58,7 @@ class VersionsSuite extends SparkFunSuite with Logging { */ protected def withTempDir(f: File => Unit): Unit = { val dir = Utils.createTempDir().getCanonicalFile -try f(dir) finally Utils.deleteRecursively(dir) +f(dir) --- End diff -- Leave the deletion work to ShutdownHookManager, to avoid an IOException on delete caused by a 'file occupied by another program' error on Windows (see SPARK-22967). Temp dirs will be cleaned up after the unit tests complete, but this is only guaranteed for test(s"$version: SPARK-17920: Insert into/overwrite avro table"). Many temp dirs produced by other unit tests still remain on Windows for unclear reasons, possibly 'file occupied by another program' as well.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18853 **[Test build #85846 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85846/testReport)** for PR 18853 at commit [`408e889`](https://github.com/apache/spark/commit/408e889caa8d61b7267f0f391be4af5fde82a0c9).
[GitHub] spark issue #20199: [Spark-22967][Hive]Fix VersionSuite's unit tests by chan...
Github user Ngone51 commented on the issue: https://github.com/apache/spark/pull/20199 cc @HyukjinKwon
[GitHub] spark pull request #20163: [SPARK-22966][PySpark] Spark SQL should handle Py...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20163#discussion_r160345387 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala --- @@ -120,10 +121,18 @@ object EvaluatePython { case (c: java.math.BigDecimal, dt: DecimalType) => Decimal(c, dt.precision, dt.scale) case (c: Int, DateType) => c +// Pyrolite will unpickle a Python datetime.date to a java.util.Calendar +case (c: Calendar, DateType) => DateTimeUtils.fromJavaCalendarForDate(c) --- End diff -- How about we return `null` in this case? Other cases also seem to return `null` if the value fails to be converted: ``` >>> from pyspark.sql.functions import udf >>> f = udf(lambda x: x, "double") >>> spark.range(1).select(f("id")).show() +------------+ |<lambda>(id)| +------------+ | null| +------------+ ``` Seems we can do it like: ```scala case StringType => (obj: Any) => nullSafeConvert(obj) { case c: Calendar => null case _ => UTF8String.fromString(obj.toString) } ```
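The `nullSafeConvert` suggestion above maps un-convertible UDF results to null instead of raising. A minimal Python sketch of that policy (the names mirror, but are not, Spark's internal Scala helpers; `to_double` is a hypothetical converter for the "double" return type in the example):

```python
from datetime import date
from typing import Any, Callable, Optional

def null_safe_convert(obj: Any, convert: Callable[[Any], Any]) -> Optional[Any]:
    """Return None for null input or when the converter rejects the value,
    mirroring how failed UDF result conversions surface as null."""
    if obj is None:
        return None
    try:
        return convert(obj)
    except (TypeError, ValueError):
        return None

def to_double(obj: Any) -> float:
    # Reject non-numeric values (bool is excluded deliberately).
    if isinstance(obj, bool) or not isinstance(obj, (int, float)):
        raise TypeError(f"cannot convert {type(obj).__name__} to double")
    return float(obj)

print(null_safe_convert(1, to_double))                 # 1.0
print(null_safe_convert(date(2018, 1, 9), to_double))  # None
```

This is the same behavior the `show()` output above demonstrates: a `datetime.date` flowing into a `double` column comes out as null rather than an error.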
[GitHub] spark issue #20199: [Spark-22967][Hive]Fix VersionSuite's unit tests by chan...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20199 ok to test
[GitHub] spark issue #20199: [Spark-22967][Hive]Fix VersionSuite's unit tests by chan...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20199 Will take a look soon.
[GitHub] spark issue #20199: [Spark-22967][Hive]Fix VersionSuite's unit tests by chan...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20199 **[Test build #85847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85847/testReport)** for PR 20199 at commit [`22669d1`](https://github.com/apache/spark/commit/22669d1ff0cb00261fa146d276af237c115a0488).
[GitHub] spark pull request #20179: [SPARK-22982] Remove unsafe asynchronous close() ...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/20179#discussion_r160347387 --- Diff: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala --- @@ -196,11 +196,24 @@ private[spark] class IndexShuffleBlockResolver( // find out the consolidated file, then the offset within that from our index val indexFile = getIndexFile(blockId.shuffleId, blockId.mapId) -val in = new DataInputStream(new FileInputStream(indexFile)) +// SPARK-22982: if this FileInputStream's position is seeked forward by another piece of code +// which is incorrectly using our file descriptor then this code will fetch the wrong offsets +// (which may cause a reducer to be sent a different reducer's data). The explicit position +// checks added here were a useful debugging aid during SPARK-22982 and may help prevent this +// class of issue from re-occurring in the future which is why they are left here even though +// SPARK-22982 is fixed. +val channel = Files.newByteChannel(indexFile.toPath) +channel.position(blockId.reduceId * 8) +val in = new DataInputStream(Channels.newInputStream(channel)) try { - ByteStreams.skipFully(in, blockId.reduceId * 8) val offset = in.readLong() val nextOffset = in.readLong() + val actualPosition = channel.position() + val expectedPosition = blockId.reduceId * 8 + 16 + if (actualPosition != expectedPosition) { +throw new Exception(s"SPARK-22982: Incorrect channel position after index file reads: " + --- End diff -- Maybe we'd better change to some specific `Exception` type here.
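The defensive check in the diff above seeks to `reduceId * 8`, reads two big-endian longs, and then verifies the cursor sits exactly two longs past the seek point. An illustrative Python sketch of the same index-file layout and position check (not Spark's Scala code; big-endian 8-byte longs, as `DataInputStream` reads them):

```python
import io
import struct

def read_block_offsets(index_bytes: bytes, reduce_id: int) -> tuple[int, int]:
    """Read (offset, nextOffset) for one reducer from a shuffle index file:
    a flat array of big-endian 8-byte offsets, entry i starting at i * 8."""
    buf = io.BytesIO(index_bytes)
    buf.seek(reduce_id * 8)
    offset, next_offset = struct.unpack(">qq", buf.read(16))
    # SPARK-22982-style sanity check: the cursor must be exactly two longs
    # past where we seeked, or something else moved our position.
    expected = reduce_id * 8 + 16
    if buf.tell() != expected:
        raise IOError(f"position {buf.tell()}, expected {expected}")
    return offset, next_offset

# An index for 3 blocks: each entry is the start of the next block's data.
index = struct.pack(">4q", 0, 100, 250, 400)
print(read_block_offsets(index, 1))  # (100, 250)
```

Without the check, a shared cursor moved by another reader would silently return a different reducer's offsets, which is exactly the corruption SPARK-22982 describes.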
[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20085#discussion_r160347559 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -182,6 +182,111 @@ case class StaticInvoke( } } +/** + * Invokes a call to reference to a static field. + * + * @param staticObject The target of the static call. This can either be the object itself + * (methods defined on scala objects), or the class object + * (static methods defined in java). + * @param dataType The expected return type of the function call. + * @param fieldName The field to reference. + */ +case class StaticField( + staticObject: Class[_], + dataType: DataType, + fieldName: String) extends Expression with NonSQLExpression { + + val objectName = staticObject.getName.stripSuffix("$") + + override def nullable: Boolean = false + override def children: Seq[Expression] = Nil + + override def eval(input: InternalRow): Any = +throw new UnsupportedOperationException("Only code-generated evaluation is supported.") + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val javaType = ctx.javaType(dataType) + +val code = s""" + final $javaType ${ev.value} = $objectName.$fieldName; --- End diff -- do we need this expression for such a simple function?
[GitHub] spark pull request #20179: [SPARK-22982] Remove unsafe asynchronous close() ...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/20179#discussion_r160347716 --- Diff: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala --- @@ -196,11 +196,24 @@ private[spark] class IndexShuffleBlockResolver( // find out the consolidated file, then the offset within that from our index val indexFile = getIndexFile(blockId.shuffleId, blockId.mapId) -val in = new DataInputStream(new FileInputStream(indexFile)) +// SPARK-22982: if this FileInputStream's position is seeked forward by another piece of code +// which is incorrectly using our file descriptor then this code will fetch the wrong offsets +// (which may cause a reducer to be sent a different reducer's data). The explicit position +// checks added here were a useful debugging aid during SPARK-22982 and may help prevent this +// class of issue from re-occurring in the future which is why they are left here even though +// SPARK-22982 is fixed. +val channel = Files.newByteChannel(indexFile.toPath) +channel.position(blockId.reduceId * 8) --- End diff -- Sorry, I'm not clear whether the change here is related to the "asynchronous close()" issue?
[GitHub] spark pull request #20179: [SPARK-22982] Remove unsafe asynchronous close() ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20179#discussion_r160347954 --- Diff: core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala --- @@ -376,18 +374,13 @@ private[netty] class NettyRpcEnv( def setError(e: Throwable): Unit = { error = e - source.close() } override def read(dst: ByteBuffer): Int = { Try(source.read(dst)) match { +case _ if error != null => throw error --- End diff -- I think it is better to also add a short comment here. This bug is subtle and there is no test against it now. Just from this code, it is hard to know why we check for an error even on success.
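The subtlety flagged above is that `read` must surface an error recorded asynchronously by `setError`, even when the underlying `source.read` itself succeeds. A Python sketch of that check-the-flag-first pattern (hypothetical class; not the NettyRpcEnv code):

```python
import threading

class DownloadChannel:
    """Minimal sketch: a reader that re-throws an error recorded by
    another thread, even if the local read succeeded."""
    def __init__(self, source):
        self._source = source
        self._error = None
        self._lock = threading.Lock()

    def set_error(self, exc: Exception) -> None:
        # Called from another thread; the error is raised on the next
        # read() instead of closing the source out from under the reader.
        with self._lock:
            self._error = exc

    def read(self, n: int) -> bytes:
        data = self._source.read(n)
        with self._lock:
            # Check the error flag even on success: a good local read
            # must not mask a failure reported by the writer thread.
            if self._error is not None:
                raise self._error
        return data
```

Raising the stored error, rather than calling `close()` from the error-reporting thread, avoids the unsafe asynchronous close this PR removes.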
[GitHub] spark issue #20179: [SPARK-22982] Remove unsafe asynchronous close() call fr...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20179 LGTM
[GitHub] spark pull request #20085: [SPARK-22739][Catalyst][WIP] Additional Expressio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20085#discussion_r160349007 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -1237,47 +1342,91 @@ case class DecodeUsingSerializer[T](child: Expression, tag: ClassTag[T], kryo: B } /** - * Initialize a Java Bean instance by setting its field values via setters. + * Initialize an object by invoking the given sequence of method names and method arguments. + * + * @param objectInstance An expression evaluating to a new instance of the object to initialize + * @param setters A sequence of method names and their sequence of argument expressions to apply in + *series to the object instance */ -case class InitializeJavaBean(beanInstance: Expression, setters: Map[String, Expression]) +case class InitializeObject( + objectInstance: Expression, + setters: Seq[(String, Seq[Expression])]) --- End diff -- To generalize, I think we can just have a `NewObject` expression, which just does `new SomeClass`; the setters are just a bunch of `Invoke`s.
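The `InitializeObject` signature under discussion constructs an instance and then applies an ordered sequence of (method name, arguments) setter calls. A small Python sketch of that pattern using plain reflection (illustrative only; Catalyst generates Java code rather than reflecting at runtime):

```python
from typing import Any, Sequence, Tuple

def initialize_object(cls: type,
                      setters: Sequence[Tuple[str, Sequence[Any]]]) -> Any:
    """Create cls(), then invoke each named setter with its arguments in
    order -- the Java-bean initialization pattern generalized to any
    (method name, args) sequence, as in the diff above."""
    instance = cls()
    for name, args in setters:
        getattr(instance, name)(*args)
    return instance

class Bean:
    """Hypothetical bean with one- and two-argument setters."""
    def __init__(self) -> None:
        self.a = None
        self.b = None
    def set_a(self, v):
        self.a = v
    def set_b(self, x, y):
        self.b = (x, y)

bean = initialize_object(Bean, [("set_a", [1]), ("set_b", [2, 3])])
print(bean.a, bean.b)  # 1 (2, 3)
```

Note how the generalization from `Map[String, Expression]` to `Seq[(String, Seq[Expression])]` allows repeated setter names, multi-argument setters, and a guaranteed call order.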
[GitHub] spark pull request #20179: [SPARK-22982] Remove unsafe asynchronous close() ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20179#discussion_r160349274 --- Diff: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala --- @@ -196,11 +196,24 @@ private[spark] class IndexShuffleBlockResolver( // find out the consolidated file, then the offset within that from our index val indexFile = getIndexFile(blockId.shuffleId, blockId.mapId) -val in = new DataInputStream(new FileInputStream(indexFile)) +// SPARK-22982: if this FileInputStream's position is seeked forward by another piece of code +// which is incorrectly using our file descriptor then this code will fetch the wrong offsets +// (which may cause a reducer to be sent a different reducer's data). The explicit position +// checks added here were a useful debugging aid during SPARK-22982 and may help prevent this +// class of issue from re-occurring in the future which is why they are left here even though +// SPARK-22982 is fixed. +val channel = Files.newByteChannel(indexFile.toPath) +channel.position(blockId.reduceId * 8) --- End diff -- It's used to detect bugs like "asynchronous close()" earlier in the future.
[GitHub] spark pull request #20163: [SPARK-22966][PySpark] Spark SQL should handle Py...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20163#discussion_r160349750 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala --- @@ -120,10 +121,18 @@ object EvaluatePython { case (c: java.math.BigDecimal, dt: DecimalType) => Decimal(c, dt.precision, dt.scale) case (c: Int, DateType) => c +// Pyrolite will unpickle a Python datetime.date to a java.util.Calendar +case (c: Calendar, DateType) => DateTimeUtils.fromJavaCalendarForDate(c) --- End diff -- Yea, it's consistent with the other un-convertible cases, but `StringType` is the default return type; I'm afraid many users may hit this and get confused.
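For context, a `java.util.Calendar` carries an epoch-millis instant, while `DateType` stores an integer number of days since 1970-01-01, so the core of a Calendar-to-date conversion is a floor division. A rough Python sketch of that arithmetic (my illustration, not the `DateTimeUtils.fromJavaCalendarForDate` code, which must also account for the calendar's time zone):

```python
MILLIS_PER_DAY = 24 * 60 * 60 * 1000

def millis_to_epoch_days(millis):
    # DateType stores dates as days since 1970-01-01; floor division
    # keeps pre-epoch instants (negative millis) on the correct day.
    return millis // MILLIS_PER_DAY

# 1970-01-02T00:00:00 UTC is day 1
assert millis_to_epoch_days(86_400_000) == 1
```

Note Python's `//` already floors toward negative infinity; in Java/Scala the equivalent needs `Math.floorDiv`, since plain integer division truncates toward zero.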
[GitHub] spark pull request #20179: [SPARK-22982] Remove unsafe asynchronous close() ...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/20179#discussion_r160351383 --- Diff: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala --- @@ -196,11 +196,24 @@ private[spark] class IndexShuffleBlockResolver( // find out the consolidated file, then the offset within that from our index val indexFile = getIndexFile(blockId.shuffleId, blockId.mapId) -val in = new DataInputStream(new FileInputStream(indexFile)) +// SPARK-22982: if this FileInputStream's position is seeked forward by another piece of code +// which is incorrectly using our file descriptor then this code will fetch the wrong offsets +// (which may cause a reducer to be sent a different reducer's data). The explicit position +// checks added here were a useful debugging aid during SPARK-22982 and may help prevent this +// class of issue from re-occurring in the future which is why they are left here even though +// SPARK-22982 is fixed. +val channel = Files.newByteChannel(indexFile.toPath) +channel.position(blockId.reduceId * 8) --- End diff -- I see. Thanks!
[GitHub] spark pull request #20163: [SPARK-22966][PySpark] Spark SQL should handle Py...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20163#discussion_r160355531 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala --- @@ -120,10 +121,18 @@ object EvaluatePython { case (c: java.math.BigDecimal, dt: DecimalType) => Decimal(c, dt.precision, dt.scale) case (c: Int, DateType) => c +// Pyrolite will unpickle a Python datetime.date to a java.util.Calendar +case (c: Calendar, DateType) => DateTimeUtils.fromJavaCalendarForDate(c) --- End diff -- Right. Let's go ahead with 2. then. I am fine if it's done as an exception for practical purposes. Maybe we could add an `isinstance(.., basestring)` check and return directly as a shortcut. I haven't checked the perf diff, but I think we can measure it easily via profiling, as I mentioned above.
[GitHub] spark pull request #20193: [SPARK-22998][K8S] Set missing value for SPARK_MO...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20193
[GitHub] spark issue #20193: [SPARK-22998][K8S] Set missing value for SPARK_MOUNTED_C...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20193 merged to master/2.3
[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/20163 I investigated the behavior differences between `udf` and `pandas_udf` for wrong return types and found there are actually many differences. Basically `udf`s return `null`, as @HyukjinKwon mentioned, whereas `pandas_udf`s throw some `ArrowException`. There seem to be some exceptions, though.
[GitHub] spark pull request #20163: [SPARK-22966][PySpark] Spark SQL should handle Py...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20163#discussion_r160358011 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala --- @@ -120,10 +121,18 @@ object EvaluatePython { case (c: java.math.BigDecimal, dt: DecimalType) => Decimal(c, dt.precision, dt.scale) case (c: Int, DateType) => c +// Pyrolite will unpickle a Python datetime.date to a java.util.Calendar +case (c: Calendar, DateType) => DateTimeUtils.fromJavaCalendarForDate(c) --- End diff -- WDYT about ^ @ueshin?
[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160358917 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java --- @@ -0,0 +1,523 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.datasources.orc; + +import java.io.IOException; +import java.util.stream.IntStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.RecordReader; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.hadoop.mapreduce.lib.input.FileSplit; +import org.apache.orc.OrcConf; +import org.apache.orc.OrcFile; +import org.apache.orc.Reader; +import org.apache.orc.TypeDescription; +import org.apache.orc.mapred.OrcInputFormat; +import org.apache.orc.storage.common.type.HiveDecimal; +import org.apache.orc.storage.ql.exec.vector.*; +import org.apache.orc.storage.serde2.io.HiveDecimalWritable; + +import org.apache.spark.memory.MemoryMode; +import org.apache.spark.sql.catalyst.InternalRow; +import org.apache.spark.sql.execution.vectorized.ColumnVectorUtils; +import org.apache.spark.sql.execution.vectorized.OffHeapColumnVector; +import org.apache.spark.sql.execution.vectorized.OnHeapColumnVector; +import org.apache.spark.sql.execution.vectorized.WritableColumnVector; +import org.apache.spark.sql.types.*; +import org.apache.spark.sql.vectorized.ColumnarBatch; + + +/** + * To support vectorization in WholeStageCodeGen, this reader returns ColumnarBatch. + * After creating, `initialize` and `initBatch` should be called sequentially. + */ +public class OrcColumnarBatchReader extends RecordReader { + + /** + * The default size of batch. We use this value for both ORC and Spark consistently --- End diff -- nit: We use this value for ORC reader to make it consistent with Spark's columnar batch, because their default batch sizes are different like the following.
[GitHub] spark issue #20163: [SPARK-22966][PySpark] Spark SQL should handle Python UD...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20163 Probably we should consider catching exceptions and setting nulls in pandas_udf, if possible, to match the behaviour with udf ...
[GitHub] spark pull request #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19943#discussion_r160360721 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala --- @@ -118,6 +118,13 @@ class OrcFileFormat } } + override def supportBatch(sparkSession: SparkSession, schema: StructType): Boolean = { +val conf = sparkSession.sessionState.conf +conf.orcVectorizedReaderEnabled && conf.wholeStageEnabled && + schema.length <= conf.wholeStageMaxNumFields && + schema.forall(_.dataType.isInstanceOf[AtomicType]) + } + --- End diff -- Do we need to implement `vectorTypes` as `ParquetFileFormat` does?
[GitHub] spark issue #20096: [SPARK-22908] Add kafka source and sink for continuous p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20096 **[Test build #85848 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85848/testReport)** for PR 20096 at commit [`2628bd4`](https://github.com/apache/spark/commit/2628bd4fd170b2d11dd77947312a57361b186bf7).
[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20023#discussion_r160361785 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala --- @@ -117,6 +117,7 @@ object DecimalType extends AbstractDataType { val MAX_SCALE = 38 val SYSTEM_DEFAULT: DecimalType = DecimalType(MAX_PRECISION, 18) val USER_DEFAULT: DecimalType = DecimalType(10, 0) + val MINIMUM_ADJUSTED_SCALE = 6 --- End diff -- @gatorsmile what about `spark.sql.decimalOperations.mode` which defaults to `native` and accepts also `hive` (and in future also `sql2011` for throwing exception instead of returning NULL)?
[GitHub] spark pull request #20163: [SPARK-22966][PySpark] Spark SQL should handle Py...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/20163#discussion_r160364055 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala --- @@ -120,10 +121,18 @@ object EvaluatePython { case (c: java.math.BigDecimal, dt: DecimalType) => Decimal(c, dt.precision, dt.scale) case (c: Int, DateType) => c +// Pyrolite will unpickle a Python datetime.date to a java.util.Calendar +case (c: Calendar, DateType) => DateTimeUtils.fromJavaCalendarForDate(c) --- End diff -- Yeah, 2. should work for `StringType`. I'd also like to add some documentation like 1. so that users are careful about the return type. I've found that `udf`s return `null` and `pandas_udf`s throw some exception in most cases when the return type is mismatched. Of course we can try to bring the behavior of `udf` and `pandas_udf` as close as possible in the future, but I think handling a mismatched return type is best-effort.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18853 **[Test build #85846 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85846/testReport)** for PR 18853 at commit [`408e889`](https://github.com/apache/spark/commit/408e889caa8d61b7267f0f391be4af5fde82a0c9). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18853 Merged build finished. Test FAILed.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18853 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85846/ Test FAILed.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18853 **[Test build #85849 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85849/testReport)** for PR 18853 at commit [`e763330`](https://github.com/apache/spark/commit/e763330edae88d4dad410214608fb5448d90a989).
[GitHub] spark issue #20195: [SPARK-22972] Couldn't find corresponding Hive SerDe for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20195 **[Test build #85842 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85842/testReport)** for PR 20195 at commit [`f55ace6`](https://github.com/apache/spark/commit/f55ace645b46a429a512eb8e922a7074c4cd8cc0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20195: [SPARK-22972] Couldn't find corresponding Hive SerDe for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20195 Merged build finished. Test PASSed.
[GitHub] spark issue #20195: [SPARK-22972] Couldn't find corresponding Hive SerDe for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20195 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85842/ Test PASSed.
[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20023#discussion_r160376096 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala --- @@ -117,6 +117,7 @@ object DecimalType extends AbstractDataType { val MAX_SCALE = 38 val SYSTEM_DEFAULT: DecimalType = DecimalType(MAX_PRECISION, 18) val USER_DEFAULT: DecimalType = DecimalType(10, 0) + val MINIMUM_ADJUSTED_SCALE = 6 --- End diff -- how about `spark.sql.decimalOperations.allowTruncat`? Let's leave the mode stuff to the type coercion mode.
[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20023#discussion_r160376186 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala --- @@ -117,6 +117,7 @@ object DecimalType extends AbstractDataType { val MAX_SCALE = 38 val SYSTEM_DEFAULT: DecimalType = DecimalType(MAX_PRECISION, 18) val USER_DEFAULT: DecimalType = DecimalType(10, 0) + val MINIMUM_ADJUSTED_SCALE = 6 --- End diff -- We should make it an internal conf and remove it after some releases.
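For context, the `MINIMUM_ADJUSTED_SCALE = 6` under discussion follows the Hive/SQL Server rule the PR title references: when a result's exact precision exceeds the maximum of 38, digits are dropped from the scale, but at least `min(scale, 6)` fractional digits are preserved. A sketch of that adjustment in Python (function and parameter names are mine, for illustration only):

```python
MAX_PRECISION = 38
MINIMUM_ADJUSTED_SCALE = 6

def adjust_precision_scale(precision, scale):
    """Cap (precision, scale) at MAX_PRECISION while keeping at least
    min(scale, MINIMUM_ADJUSTED_SCALE) fractional digits."""
    if precision <= MAX_PRECISION:
        return precision, scale  # already representable, no truncation
    int_digits = precision - scale          # digits left of the decimal point
    min_scale = min(scale, MINIMUM_ADJUSTED_SCALE)
    adjusted_scale = max(MAX_PRECISION - int_digits, min_scale)
    return MAX_PRECISION, adjusted_scale
```

For example, a multiplication of decimal(38, 10) and decimal(9, 0) would exactly need (48, 10); the rule adjusts it to (38, 6), trading fractional digits for a representable type instead of returning NULL.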
[GitHub] spark issue #20199: [Spark-22967][Hive]Fix VersionSuite's unit tests by chan...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20199 **[Test build #85847 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85847/testReport)** for PR 20199 at commit [`22669d1`](https://github.com/apache/spark/commit/22669d1ff0cb00261fa146d276af237c115a0488). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20199: [Spark-22967][Hive]Fix VersionSuite's unit tests by chan...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20199 Merged build finished. Test PASSed.
[GitHub] spark issue #20199: [Spark-22967][Hive]Fix VersionSuite's unit tests by chan...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20199 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85847/ Test PASSed.
[GitHub] spark issue #20189: [SPARK-22975] MetricsReporter should not throw exception...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20189 **[Test build #85850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85850/testReport)** for PR 20189 at commit [`7242eab`](https://github.com/apache/spark/commit/7242eabe00ce84cb132a4a4f16cb53bed1e6afa7).
[GitHub] spark issue #20167: Allow providing Mesos principal & secret via files (SPAR...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/20167 CC @ArtRand @vanzin I would appreciate your reviews as and when you have time
[GitHub] spark issue #20176: [SPARK-22981][SQL] Fix incorrect results of Casting Stru...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20176 **[Test build #85844 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85844/testReport)** for PR 20176 at commit [`6f5b080`](https://github.com/apache/spark/commit/6f5b0803fb65b1cc88b0dc2e09d2e9efd76a1368). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20176: [SPARK-22981][SQL] Fix incorrect results of Casting Stru...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20176 Merged build finished. Test PASSed.
[GitHub] spark issue #20176: [SPARK-22981][SQL] Fix incorrect results of Casting Stru...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20176 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85844/ Test PASSed.
[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19943 **[Test build #85845 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85845/testReport)** for PR 19943 at commit [`2cf98b6`](https://github.com/apache/spark/commit/2cf98b6734c806f66e21df50520a465b03d9f060). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19943 Merged build finished. Test PASSed.
[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19943 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85845/ Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #85843 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85843/testReport)** for PR 13599 at commit [`9896de6`](https://github.com/apache/spark/commit/9896de66a6a2eb376aed75be6189c3852cd83f92). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean)`
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85843/ Test PASSed.
[GitHub] spark pull request #20200: [SPARK-23005][Core] Improve RDD.take on small num...
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/20200 [SPARK-23005][Core] Improve RDD.take on small number of partitions ## What changes were proposed in this pull request? In the current implementation of RDD.take, we overestimate the number of partitions we need to try by 50%: `(1.5 * num * partsScanned / buf.size).toInt` However, when the number is small, the result of `.toInt` is not what we want. E.g., 2.9 will become 2, when it should be 3. Using `math.ceil` fixes the problem. Also cleans up the code in RDD.scala. ## How was this patch tested? Unit test You can merge this pull request into a Git repository by running: $ git pull https://github.com/gengliangwang/spark Take Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20200.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20200 commit 93a3d8447f5d0d3c576a312084144f16c787cf16 Author: Wang Gengliang Date: 2018-01-09T11:46:36Z Improve take and clean up RDD.scala
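The rounding issue is easy to see in isolation: `.toInt` truncates, so a fractional estimate always under-shoots the partition target. A quick Python illustration of the two behaviours (a sketch of the Scala expression `(1.5 * num * partsScanned / buf.size).toInt`, not the actual RDD code):

```python
import math

def parts_to_try_truncated(num, parts_scanned, buf_size):
    # old behaviour: toInt truncates, so an estimate of 5.9 becomes 5
    return max(int(1.5 * num * parts_scanned / buf_size) - parts_scanned, 1)

def parts_to_try_ceil(num, parts_scanned, buf_size):
    # proposed behaviour: round up, so 5.9 becomes 6
    return max(math.ceil(1.5 * num * parts_scanned / buf_size) - parts_scanned, 1)
```

With `num=59`, `parts_scanned=2`, `buf_size=30` the estimate is 5.9, so truncation schedules one fewer extra partition than rounding up, which can cost an extra scheduling round on small jobs.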
[GitHub] spark issue #20200: [SPARK-23005][Core] Improve RDD.take on small number of ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20200 **[Test build #85851 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85851/testReport)** for PR 20200 at commit [`93a3d84`](https://github.com/apache/spark/commit/93a3d8447f5d0d3c576a312084144f16c787cf16).
[GitHub] spark pull request #20200: [SPARK-23005][Core] Improve RDD.take on small num...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20200#discussion_r160390893 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -985,7 +985,7 @@ abstract class RDD[T: ClassTag]( def subtract( other: RDD[T], p: Partitioner)(implicit ord: Ordering[T] = null): RDD[T] = withScope { -if (partitioner == Some(p)) { +if (partitioner.contains(p)) { --- End diff -- Do we still support Scala 2.10? If we do, this will fail to compile.
[GitHub] spark pull request #20200: [SPARK-23005][Core] Improve RDD.take on small num...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20200#discussion_r160391233 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -985,7 +985,7 @@ abstract class RDD[T: ClassTag]( def subtract( other: RDD[T], p: Partitioner)(implicit ord: Ordering[T] = null): RDD[T] = withScope { -if (partitioner == Some(p)) { +if (partitioner.contains(p)) { --- End diff -- Actually I think the previous code is more readable.
[GitHub] spark pull request #20200: [SPARK-23005][Core] Improve RDD.take on small num...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20200#discussion_r160391487 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1345,13 +1346,12 @@ abstract class RDD[T: ClassTag]( if (buf.isEmpty) { numPartsToTry = partsScanned * scaleUpFactor } else { -// the left side of max is >=1 whenever partsScanned >= 2 -numPartsToTry = Math.max((1.5 * num * partsScanned / buf.size).toInt - partsScanned, 1) +// As left > 0, numPartsToTry is always >= 1 --- End diff -- This is the same as `SparkPlan.executeTake()`. Should we also fix that?
[GitHub] spark issue #20096: [SPARK-22908] Add kafka source and sink for continuous p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20096 **[Test build #85848 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85848/testReport)** for PR 20096 at commit [`2628bd4`](https://github.com/apache/spark/commit/2628bd4fd170b2d11dd77947312a57361b186bf7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20096: [SPARK-22908] Add kafka source and sink for continuous p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20096 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85848/ Test FAILed.
[GitHub] spark issue #20096: [SPARK-22908] Add kafka source and sink for continuous p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20096 Merged build finished. Test FAILed.
[GitHub] spark pull request #20023: [SPARK-22036][SQL] Decimal multiplication with hi...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20023#discussion_r160394589 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala --- @@ -117,6 +117,7 @@ object DecimalType extends AbstractDataType { val MAX_SCALE = 38 val SYSTEM_DEFAULT: DecimalType = DecimalType(MAX_PRECISION, 18) val USER_DEFAULT: DecimalType = DecimalType(10, 0) + val MINIMUM_ADJUSTED_SCALE = 6 --- End diff -- ok, I'll go with that, thanks @cloud-fan.
[GitHub] spark pull request #20201: [SPARK-22389][SQL] data source v2 partitioning re...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/20201 [SPARK-22389][SQL] data source v2 partitioning reporting interface ## What changes were proposed in this pull request? a new interface which allows a data source to report partitioning and avoid a shuffle on the Spark side. The design is much like the internal distribution/partitioning framework: Spark defines a `Distribution` interface and several concrete implementations, and asks the data source to report a `Partitioning`; the `Partitioning` should tell Spark whether it can satisfy a `Distribution` or not. ## How was this patch tested? new test You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark partition-reporting Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20201.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20201 commit be14e3bd7598eb3ed583e18c1d9927d5c7f563b4 Author: Wenchen Fan Date: 2018-01-09T02:08:53Z data source v2 partitioning reporting interface
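The design described above can be sketched as a small satisfaction check: the source reports a `Partitioning`, and Spark asks whether it satisfies a required `Distribution` before deciding to shuffle. A schematic Python model (class shapes and method names are my illustration, not the actual Java interfaces in the PR):

```python
class ClusteredDistribution:
    """Requirement: rows sharing the same values of `clustered_cols`
    must end up in the same partition."""
    def __init__(self, clustered_cols):
        self.clustered_cols = set(clustered_cols)

class HashPartitioning:
    """Reported by the source: data is hash-partitioned on `cols`."""
    def __init__(self, cols, num_partitions):
        self.cols = set(cols)
        self.num_partitions = num_partitions

    def satisfies(self, distribution):
        # Hash partitioning on a subset of the required clustering columns
        # already co-locates equal keys, so no Spark-side shuffle is needed.
        return self.cols <= distribution.clustered_cols
```

For example, data hash-partitioned on `a` satisfies a requirement to cluster on `(a, b)` (equal `(a, b)` pairs share an `a`, hence a partition), but data partitioned on `(a, c)` does not satisfy clustering on `a` alone.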
[GitHub] spark issue #20201: [SPARK-22389][SQL] data source v2 partitioning reporting...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20201 cc @rxin @RussellSpitzer @kiszk @gatorsmile
[GitHub] spark issue #20201: [SPARK-22389][SQL] data source v2 partitioning reporting...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20201 **[Test build #85852 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85852/testReport)** for PR 20201 at commit [`be14e3b`](https://github.com/apache/spark/commit/be14e3bd7598eb3ed583e18c1d9927d5c7f563b4).