[GitHub] spark issue #22494: [SPARK-25454][SQL] add a new config for picking minimum ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22494 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22494: [SPARK-25454][SQL] add a new config for picking minimum ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22494 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96429/ Test PASSed.
[GitHub] spark issue #22494: [SPARK-25454][SQL] add a new config for picking minimum ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22494 **[Test build #96429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96429/testReport)** for PR 22494 at commit [`b4fdd13`](https://github.com/apache/spark/commit/b4fdd1307059c7df7c386a96aad6bc17b593d9c5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22295 **[Test build #96451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96451/testReport)** for PR 22295 at commit [`d7be3bf`](https://github.com/apache/spark/commit/d7be3bfbdbbcd2d95885f26bef690b7a949ff5ed).
[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22295 Merged build finished. Test PASSed.
[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22295 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3360/ Test PASSed.
[GitHub] spark pull request #22522: [SPARK-25510][TEST] Create new trait replace Benc...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/22522 [SPARK-25510][TEST] Create new trait replace BenchmarkWithCodegen

## What changes were proposed in this pull request?

We need to create a new trait to replace `BenchmarkWithCodegen`, because `BenchmarkWithCodegen` extends `SparkFunSuite`. For example, when refactoring `AggregateBenchmark`: before this change, it would be:

```scala
object AggregateBenchmark extends BenchmarkBase {

  lazy val sparkSession = SparkSession.builder
    .master("local[1]")
    .appName(this.getClass.getSimpleName)
    .config("spark.sql.shuffle.partitions", 1)
    .config("spark.sql.autoBroadcastJoinThreshold", 1)
    .getOrCreate()

  /** Runs function `f` with whole stage codegen on and off. */
  def runBenchmark(name: String, cardinality: Long)(f: => Unit): Unit = {
    val benchmark = new Benchmark(name, cardinality, output = output)

    benchmark.addCase(s"$name wholestage off", numIters = 2) { iter =>
      sparkSession.conf.set("spark.sql.codegen.wholeStage", value = false)
      f
    }

    benchmark.addCase(s"$name wholestage on", numIters = 5) { iter =>
      sparkSession.conf.set("spark.sql.codegen.wholeStage", value = true)
      f
    }

    benchmark.run()
  }

  override def benchmark(): Unit = {
    runBenchmark("aggregate without grouping") {
      val N = 500L << 22
      runBenchmark("agg w/o group", N) {
        sparkSession.range(N).selectExpr("sum(id)").collect()
      }
    }
    ...
```

After this change:

```scala
object AggregateBenchmark extends BenchmarkBase with RunBenchmarkWithCodegen {
  override def benchmark(): Unit = {
    runBenchmark("aggregate without grouping") {
      val N = 500L << 22
      runBenchmark("agg w/o group", N) {
        sparkSession.range(N).selectExpr("sum(id)").collect()
      }
    }
    ...
```

All affected benchmarks:

```
AggregateBenchmark
BenchmarkWideTable
JoinBenchmark
MiscBenchmark
ObjectHashAggregateExecBenchmark
SortBenchmark
UnsafeArrayDataBenchmark
```

## How was this patch tested?

manual tests

You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark SPARK-25510 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22522.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22522

commit 275cc6c5f8f106eb339c7ed01734e279a223705e
Author: Yuming Wang
Date: 2018-09-21T17:36:57Z
Create new BenchmarkWithCodegen trait doesn't extends SparkFunSuite
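The refactor above amounts to pulling the codegen-toggling helper out of the test base class into a standalone mixin, so a benchmark object no longer has to extend the test suite. A loose, illustrative Python analogy of that shape (all class and method names below are made up for the sketch, not Spark's actual code):

```python
class BenchmarkBase:
    """Stand-alone benchmark base, independent of any test framework."""
    def benchmark(self):
        raise NotImplementedError


class RunBenchmarkWithCodegen:
    """Mixin replacing the suite-bound helper: runs a case with
    whole-stage codegen off, then on."""
    def run_benchmark(self, name, case, set_codegen):
        results = {}
        for enabled in (False, True):
            set_codegen(enabled)          # toggle the codegen flag
            label = f"{name} wholestage {'on' if enabled else 'off'}"
            results[label] = case()       # run the benchmarked body
        return results


class AggregateBenchmark(BenchmarkBase, RunBenchmarkWithCodegen):
    def benchmark(self):
        flags = []  # records the codegen settings used, stand-in for a conf
        return self.run_benchmark("agg w/o group",
                                  case=lambda: sum(range(10)),
                                  set_codegen=flags.append)
```

The point of the design is that `AggregateBenchmark` composes the two concerns instead of inheriting the helper from a test suite it doesn't need.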
[GitHub] spark issue #22507: [SPARK-25495][SS]FetchedData.reset should reset all fiel...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22507 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96447/ Test PASSed.
[GitHub] spark issue #22507: [SPARK-25495][SS]FetchedData.reset should reset all fiel...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22507 Merged build finished. Test PASSed.
[GitHub] spark issue #22507: [SPARK-25495][SS]FetchedData.reset should reset all fiel...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22507 **[Test build #96447 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96447/testReport)** for PR 22507 at commit [`6eebe34`](https://github.com/apache/spark/commit/6eebe34b305cb786375518cb875aa89af9d500c8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18544 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96424/ Test PASSed.
[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22455 **[Test build #96450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96450/testReport)** for PR 22455 at commit [`4492b27`](https://github.com/apache/spark/commit/4492b278ba5a4721d6a5dc836436191ad155dfc6).
[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18544 Merged build finished. Test PASSed.
[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22455 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3359/ Test PASSed.
[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22455 Merged build finished. Test PASSed.
[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18544 **[Test build #96424 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96424/testReport)** for PR 18544 at commit [`9f07557`](https://github.com/apache/spark/commit/9f07557a6d5356f056bfc0d5e2e6993f7602b487). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219576011

--- Diff: R/pkg/R/DataFrame.R ---
```
@@ -226,7 +226,8 @@ setMethod("showDF",
 #' show
 #'
-#' Print class and type information of a Spark object.
+#' If eager evaluation is enabled and the Spark object is a SparkDataFrame, return the data of
```
--- End diff --

Done
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user adrian555 commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219575988

--- Diff: R/pkg/R/DataFrame.R ---
```
@@ -244,11 +245,15 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
           function(object) {
-            cols <- lapply(dtypes(object), function(l) {
-              paste(l, collapse = ":")
-            })
-            s <- paste(cols, collapse = ", ")
-            cat(paste(class(object), "[", s, "]\n", sep = ""))
+            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
```
--- End diff --

Done
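The diff under review replaces the old class/schema summary in `show` with an eager branch: when the repl eager-eval flag is `"true"`, print the data itself. A minimal Python sketch of that branching (the `Frame` stub and its fields are hypothetical; only the config key comes from the diff):

```python
def show(df, conf):
    # Eager path (new behavior): materialize and print the rows themselves.
    if conf.get("spark.sql.repl.eagerEval.enabled", "false") == "true":
        return "\n".join(str(row) for row in df.rows)
    # Lazy path (pre-change behavior): class name plus col:type pairs.
    schema = ", ".join(f"{col}:{typ}" for col, typ in df.dtypes)
    return f"{type(df).__name__}[{schema}]"


class Frame:
    """Tiny stand-in for a SparkDataFrame, just for the sketch."""
    dtypes = [("id", "bigint")]
    rows = [(0,), (1,)]
```

With the flag unset, `show(Frame(), {})` gives the old `Frame[id:bigint]` summary; with it set to `"true"`, the rows are printed instead.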
[GitHub] spark issue #22494: [SPARK-25454][SQL] add a new config for picking minimum ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22494 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96425/ Test PASSed.
[GitHub] spark issue #22494: [SPARK-25454][SQL] add a new config for picking minimum ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22494 Merged build finished. Test PASSed.
[GitHub] spark issue #22494: [SPARK-25454][SQL] add a new config for picking minimum ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22494 **[Test build #96425 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96425/testReport)** for PR 22494 at commit [`ad79c56`](https://github.com/apache/spark/commit/ad79c56ca038fac6797814410d665110ef43e826). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22521: [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_CO...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22521 **[Test build #96449 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96449/testReport)** for PR 22521 at commit [`f6f9658`](https://github.com/apache/spark/commit/f6f9658e19ae5e74697ee8846b6ab11ab8eba24c).
[GitHub] spark issue #22521: [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_CO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22521 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3358/ Test PASSed.
[GitHub] spark issue #22521: [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_CO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22521 Merged build finished. Test PASSed.
[GitHub] spark pull request #22521: [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HI...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22521 [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS only once - WIP

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-24519 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22521.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22521

commit 77442cf7e4b64b745079a1ee62684503c7b8c123
Author: Reynold Xin
Date: 2018-09-19T00:58:24Z
[SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS only once

commit f23c2202fbec04983d1181d92f7c124280ebcbe3
Author: Reynold Xin
Date: 2018-09-21T16:48:59Z
Merge branch 'master' of github.com:apache/spark into SPARK-24519

commit ac3dee3227e4ceee4ec100bbe72988f791ae3c87
Author: Reynold Xin
Date: 2018-09-21T16:49:52Z
x

commit f6f9658e19ae5e74697ee8846b6ab11ab8eba24c
Author: Reynold Xin
Date: 2018-09-21T17:31:51Z
fix conflict
[GitHub] spark issue #22444: [SPARK-25409][Core]Speed up Spark History loading via in...
Github user jianjianjiao commented on the issue: https://github.com/apache/spark/pull/22444 @vanzin Really, thanks for your suggestions. Loading event logs has become much faster: from more than 2.5 hours down to 19 minutes for 17K event logs, some of them larger than 10 GB. 1. To enable SHS V2 caching on disk: we are using Windows, and there is a small "posix.permissions not supported in windows" issue; I created a new PR here https://github.com/apache/spark/pull/22520, could you please take a look? This change doesn't speed up loading very much by itself, but it improves other parts. 2. Tried 2.4, and also tried applying SPARK-6951 to 2.3; this is the critical part for improving the speed. I will close this PR, as it is no longer needed. Thanks again.
[GitHub] spark pull request #22444: [SPARK-25409][Core]Speed up Spark History loading...
Github user jianjianjiao closed the pull request at: https://github.com/apache/spark/pull/22444
[GitHub] spark issue #22520: [SPARK-25509][Core]Windows doesn't support POSIX permiss...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22520 Can one of the admins verify this patch?
[GitHub] spark pull request #22520: [SPARK-25509][Core]Windows doesn't support POSIX ...
GitHub user jianjianjiao opened a pull request: https://github.com/apache/spark/pull/22520 [SPARK-25509][Core]Windows doesn't support POSIX permissions

## What changes were proposed in this pull request?

SHS V2 cannot be enabled on Windows, because Windows doesn't support POSIX permissions.

## How was this patch tested?

The test case fails on Windows without this fix: org.apache.spark.deploy.history.HistoryServerDiskManagerSuite, test("leasing space")

You can merge this pull request into a Git repository by running: $ git pull https://github.com/jianjianjiao/spark FixWindowsPermssionsIssue Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22520.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22520

commit fe74feeef42fc6fb6fb5f5e869e23b349f3a1697
Author: Rong Tang
Date: 2018-09-21T17:07:44Z
Windows doesn't support Posix permissions
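The general fix pattern here is to guard any POSIX-permission call so it is skipped on filesystems that don't support such bits. The actual Spark patch is Scala/Java NIO; this is only a Python sketch of the guard idea, with a made-up helper name:

```python
import os
import stat


def restrict_to_owner(path):
    """Set owner-only rwx where POSIX permission bits exist; no-op on
    Windows, which does not support them. Illustrative guard, not
    Spark's actual code."""
    if os.name == "posix":
        os.chmod(path, stat.S_IRWXU)  # rwx for owner, nothing for group/other
        return True
    return False  # permissions unsupported: skip rather than raise
```

The unguarded version is what made `HistoryServerDiskManagerSuite` fail on Windows: the permission call raises instead of degrading gracefully.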
[GitHub] spark issue #22419: [SPARK-23906][SQL] Add built-in UDF TRUNCATE(number)
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22419 **[Test build #96448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96448/testReport)** for PR 22419 at commit [`479b31f`](https://github.com/apache/spark/commit/479b31fa046e8402f4f93cdbad5fe93ef1ea570f).
[GitHub] spark issue #22419: [SPARK-23906][SQL] Add built-in UDF TRUNCATE(number)
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22419 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3357/ Test PASSed.
[GitHub] spark issue #22419: [SPARK-23906][SQL] Add built-in UDF TRUNCATE(number)
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22419 Merged build finished. Test PASSed.
[GitHub] spark issue #22419: [SPARK-23906][SQL] Add built-in UDF TRUNCATE(number)
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22419 @ueshin Thanks a lot!
[GitHub] spark issue #22507: [SPARK-25495][SS]FetchedData.reset should reset all fiel...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22507 **[Test build #96447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96447/testReport)** for PR 22507 at commit [`6eebe34`](https://github.com/apache/spark/commit/6eebe34b305cb786375518cb875aa89af9d500c8).
[GitHub] spark issue #22507: [SPARK-25495][SS]FetchedData.reset should reset all fiel...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22507 Merged build finished. Test PASSed.
[GitHub] spark issue #22507: [SPARK-25495][SS]FetchedData.reset should reset all fiel...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22507 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3356/ Test PASSed.
[GitHub] spark issue #22492: [SPARK-25321][ML] Revert SPARK-14681 to avoid API breaki...
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22492 We can keep it in master if the next release is 3.0.
[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18544 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96422/ Test PASSed.
[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18544 Merged build finished. Test PASSed.
[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18544 **[Test build #96422 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96422/testReport)** for PR 18544 at commit [`6f12ad6`](https://github.com/apache/spark/commit/6f12ad68cdb7ab75a25c581286be35e847a2e0bb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22480: [SPARK-25473][PYTHON][SS][TEST] ForeachWriter tests fail...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/22480 So my one concern is the comment "I am pretty sure there are some guys already debugging this." - do we actually know who, and do we have a place to track this? Do we have a blocker filed to verify this before release, or how are we going to ensure it's fixed? I don't have macOS personally, so I just want to make sure we don't have this issue fall through the cracks.
[GitHub] spark pull request #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/22275#discussion_r219556033

--- Diff: python/pyspark/serializers.py ---
```
@@ -208,8 +214,26 @@ def load_stream(self, stream):
         for batch in reader:
             yield batch

+        if self.load_batch_order:
+            num = read_int(stream)
+            self.batch_order = []
+            for i in xrange(num):
+                index = read_int(stream)
+                self.batch_order.append(index)
+
+    def get_batch_order_and_reset(self):
```
--- End diff --

Looking at `_load_from_socket` I think I understand why this was done as a separate function here, but what if the serializer itself returned a tuple or re-ordered the batches itself? I'm just trying to get a better understanding, not saying those are better designs.
[GitHub] spark pull request #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/22275#discussion_r219558311

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
```
@@ -3279,34 +3280,33 @@ class Dataset[T] private[sql](
     val timeZoneId = sparkSession.sessionState.conf.sessionLocalTimeZone
     withAction("collectAsArrowToPython", queryExecution) { plan =>
-      PythonRDD.serveToStream("serve-Arrow") { out =>
+      PythonRDD.serveToStream("serve-Arrow") { outputStream =>
+        val out = new DataOutputStream(outputStream)
         val batchWriter = new ArrowBatchStreamWriter(schema, out, timeZoneId)
         val arrowBatchRdd = toArrowBatchRdd(plan)
         val numPartitions = arrowBatchRdd.partitions.length

-        // Store collection results for worst case of 1 to N-1 partitions
-        val results = new Array[Array[Array[Byte]]](numPartitions - 1)
-        var lastIndex = -1  // index of last partition written
+        // Batches ordered by (index of partition, batch # in partition) tuple
+        val batchOrder = new ArrayBuffer[(Int, Int)]()
+        var partitionCount = 0

-        // Handler to eagerly write partitions to Python in order
+        // Handler to eagerly write batches to Python out of order
         def handlePartitionBatches(index: Int, arrowBatches: Array[Array[Byte]]): Unit = {
-          // If result is from next partition in order
-          if (index - 1 == lastIndex) {
+          if (arrowBatches.nonEmpty) {
             batchWriter.writeBatches(arrowBatches.iterator)
-            lastIndex += 1
-            // Write stored partitions that come next in order
-            while (lastIndex < results.length && results(lastIndex) != null) {
-              batchWriter.writeBatches(results(lastIndex).iterator)
-              results(lastIndex) = null
-              lastIndex += 1
-            }
-            // After last batch, end the stream
-            if (lastIndex == results.length) {
-              batchWriter.end()
+            arrowBatches.indices.foreach { i => batchOrder.append((index, i)) }
+          }
+          partitionCount += 1
+
+          // After last batch, end the stream and write batch order
+          if (partitionCount == numPartitions) {
+            batchWriter.end()
+            out.writeInt(batchOrder.length)
+            // Batch order indices are from 0 to N-1 batches, sorted by order they arrived
```
--- End diff --

How about something like `// Sort by the output global batch indexes partition index, partition batch index tuple`? When I first read this code path I got confused myself, so I think we should spend a bit of time on the comment here.
[GitHub] spark pull request #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/22275#discussion_r219556534

--- Diff: python/pyspark/serializers.py ---
```
@@ -208,8 +214,26 @@ def load_stream(self, stream):
         for batch in reader:
             yield batch

+        if self.load_batch_order:
+            num = read_int(stream)
+            self.batch_order = []
```
--- End diff --

If we're going to have get_batch_order_and_reset as a separate function, could we verify batch_order is None before we reset and throw here if it's not? Just thinking of future folks who might have to debug something here.
[GitHub] spark pull request #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/22275#discussion_r219561178

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
```
@@ -3279,34 +3280,33 @@ class Dataset[T] private[sql](
     val timeZoneId = sparkSession.sessionState.conf.sessionLocalTimeZone
     withAction("collectAsArrowToPython", queryExecution) { plan =>
-      PythonRDD.serveToStream("serve-Arrow") { out =>
+      PythonRDD.serveToStream("serve-Arrow") { outputStream =>
+        val out = new DataOutputStream(outputStream)
         val batchWriter = new ArrowBatchStreamWriter(schema, out, timeZoneId)
         val arrowBatchRdd = toArrowBatchRdd(plan)
         val numPartitions = arrowBatchRdd.partitions.length

-        // Store collection results for worst case of 1 to N-1 partitions
-        val results = new Array[Array[Array[Byte]]](numPartitions - 1)
-        var lastIndex = -1  // index of last partition written
+        // Batches ordered by (index of partition, batch # in partition) tuple
+        val batchOrder = new ArrayBuffer[(Int, Int)]()
+        var partitionCount = 0

-        // Handler to eagerly write partitions to Python in order
+        // Handler to eagerly write batches to Python out of order
         def handlePartitionBatches(index: Int, arrowBatches: Array[Array[Byte]]): Unit = {
-          // If result is from next partition in order
-          if (index - 1 == lastIndex) {
+          if (arrowBatches.nonEmpty) {
             batchWriter.writeBatches(arrowBatches.iterator)
-            lastIndex += 1
-            // Write stored partitions that come next in order
-            while (lastIndex < results.length && results(lastIndex) != null) {
-              batchWriter.writeBatches(results(lastIndex).iterator)
-              results(lastIndex) = null
-              lastIndex += 1
-            }
-            // After last batch, end the stream
-            if (lastIndex == results.length) {
-              batchWriter.end()
+            arrowBatches.indices.foreach { i => batchOrder.append((index, i)) }
```
--- End diff --

Could we call `i` something more descriptive like `partition_batch_num` or similar?
[GitHub] spark pull request #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/22275#discussion_r219557215

--- Diff: python/pyspark/sql/tests.py ---
```
@@ -4434,6 +4434,12 @@ def test_timestamp_dst(self):
         self.assertPandasEqual(pdf, df_from_python.toPandas())
         self.assertPandasEqual(pdf, df_from_pandas.toPandas())

+    def test_toPandas_batch_order(self):
+        df = self.spark.range(64, numPartitions=8).toDF("a")
+        with self.sql_conf({"spark.sql.execution.arrow.maxRecordsPerBatch": 4}):
+            pdf, pdf_arrow = self._toPandas_arrow_toggle(df)
+            self.assertPandasEqual(pdf, pdf_arrow)
```
--- End diff --

This looks pretty similar to the kind of test case we could verify with something like hypothesis. Integrating hypothesis is probably too much work, but we could at least explore the num-partitions space in a loop quickly here. Would that help, do you think, @felixcheung?
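The re-ordering scheme being reviewed streams Arrow batches as they arrive and then sends a trailing list of (partition index, batch-in-partition index) pairs so the Python side can restore partition order. The core permutation can be sketched with plain lists (function and variable names here are illustrative, not the real serializer API):

```python
def reorder_batches(batches, arrival_order):
    """batches[i] arrived i-th from the JVM; arrival_order[i] is its
    (partition_index, batch_index_within_partition) pair. Sorting the
    pairs yields the permutation that restores the logical order."""
    paired = sorted(zip(arrival_order, batches), key=lambda kv: kv[0])
    return [batch for _, batch in paired]


# Partition 1 finished first, so its batches arrived before partition 0's.
arrived = ["p1b0", "p1b1", "p0b0", "p0b1"]
order = [(1, 0), (1, 1), (0, 0), (0, 1)]
# reorder_batches(arrived, order) -> ["p0b0", "p0b1", "p1b0", "p1b1"]
```

This is why only the small index list needs to be buffered on the JVM side: the batches themselves can be written eagerly in whatever order partitions complete.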
[GitHub] spark pull request #22515: [SPARK-19724][SQL] allowCreatingManagedTableUsing...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22515
[GitHub] spark issue #22407: [SPARK-25416][SQL] ArrayPosition function may return inc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22407 Merged build finished. Test PASSed.
[GitHub] spark issue #22407: [SPARK-25416][SQL] ArrayPosition function may return inc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22407 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96421/ Test PASSed.
[GitHub] spark issue #22407: [SPARK-25416][SQL] ArrayPosition function may return inc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22407 **[Test build #96421 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96421/testReport)** for PR 22407 at commit [`bb18108`](https://github.com/apache/spark/commit/bb181084b8d0130bf53fcc1417b10d518eae). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21527: [SPARK-24519] Make the threshold for highly compr...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21527#discussion_r219559889

--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -50,7 +50,9 @@ private[spark] sealed trait MapStatus {
 private[spark] object MapStatus {

   def apply(loc: BlockManagerId, uncompressedSizes: Array[Long]): MapStatus = {
-    if (uncompressedSizes.length > 2000) {
+    if (uncompressedSizes.length > Option(SparkEnv.get)
--- End diff --

the only tricky thing is how to write the test cases for this.
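The change under review makes the hard-coded 2000-partition threshold configurable, falling back to the default when no `SparkEnv` is available. A minimal pure-Python sketch of that selection logic (the function and return strings are illustrative stand-ins, not Spark's API; the real code constructs `CompressedMapStatus` or `HighlyCompressedMapStatus` objects):

```python
# Default mirrors the previous hard-coded threshold of 2000 partitions.
DEFAULT_THRESHOLD = 2000

def make_map_status(uncompressed_sizes, threshold=None):
    """Pick the MapStatus flavor by partition count; a None threshold models
    Option(SparkEnv.get) being empty, in which case the default applies."""
    threshold = DEFAULT_THRESHOLD if threshold is None else threshold
    if len(uncompressed_sizes) > threshold:
        # Tracks only an average size plus a bitmap of empty blocks.
        return "HighlyCompressedMapStatus"
    # Tracks one compressed size per block.
    return "CompressedMapStatus"

assert make_map_status([1] * 10) == "CompressedMapStatus"
assert make_map_status([1] * 10, threshold=5) == "HighlyCompressedMapStatus"
```

This also suggests one answer to the testing question: with the threshold injectable, a unit test can pass a tiny threshold instead of materializing 2000+ partitions.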
[GitHub] spark issue #22494: [SPARK-25454][SQL] add a new config for picking minimum ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22494 **[Test build #96446 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96446/testReport)** for PR 22494 at commit [`b4fdd13`](https://github.com/apache/spark/commit/b4fdd1307059c7df7c386a96aad6bc17b593d9c5).
[GitHub] spark issue #22494: [SPARK-25454][SQL] add a new config for picking minimum ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22494 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3355/ Test PASSed.
[GitHub] spark issue #22494: [SPARK-25454][SQL] add a new config for picking minimum ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22494 Merged build finished. Test PASSed.
[GitHub] spark issue #22494: [SPARK-25454][SQL] add a new config for picking minimum ...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/22494 retest this please
[GitHub] spark issue #22494: [SPARK-25454][SQL] add a new config for picking minimum ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22494 Merged build finished. Test FAILed.
[GitHub] spark issue #22494: [SPARK-25454][SQL] add a new config for picking minimum ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22494 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96423/ Test FAILed.
[GitHub] spark issue #22494: [SPARK-25454][SQL] add a new config for picking minimum ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22494 **[Test build #96423 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96423/testReport)** for PR 22494 at commit [`cc149be`](https://github.com/apache/spark/commit/cc149bef814855c27cd38599d56ddd6b99d2a587). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22512: [SPARK-25498][SQL][WIP] Fix SQLQueryTestSuite failures w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96430/ Test FAILed.
[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/22295#discussion_r219551669

--- Diff: python/pyspark/sql/tests.py ---
@@ -3654,6 +3654,107 @@ def test_jvm_default_session_already_set(self):
             spark.stop()

+class SparkSessionTests2(ReusedSQLTestCase):
--- End diff --

@HyukjinKwon there's no strong need for it; it does mean that the first `getOrCreate` will already have a session it can use, but given that we set up and tear down the session, this may be less than ideal.
[GitHub] spark issue #22512: [SPARK-25498][SQL][WIP] Fix SQLQueryTestSuite failures w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Merged build finished. Test FAILed.
[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/22295#discussion_r219552522

--- Diff: python/pyspark/sql/tests.py ---
@@ -3654,6 +3654,107 @@ def test_jvm_default_session_already_set(self):
             spark.stop()

+class SparkSessionTests2(ReusedSQLTestCase):
+
+    def test_active_session(self):
+        spark = SparkSession.builder \
+            .master("local") \
+            .getOrCreate()
+        try:
+            activeSession = spark.getActiveSession()
+            df = activeSession.createDataFrame([(1, 'Alice')], ['age', 'name'])
+            self.assertEqual(df.collect(), [Row(age=1, name=u'Alice')])
+        finally:
+            spark.stop()
+
+    def test_SparkSession(self):
+        spark = SparkSession.builder \
+            .master("local") \
+            .config("some-config", "v2") \
+            .getOrCreate()
+        try:
+            self.assertEqual(spark.conf.get("some-config"), "v2")
+            self.assertEqual(spark.sparkContext._conf.get("some-config"), "v2")
+            self.assertEqual(spark.version, spark.sparkContext.version)
+            spark.sql("CREATE DATABASE test_db")
+            spark.catalog.setCurrentDatabase("test_db")
+            self.assertEqual(spark.catalog.currentDatabase(), "test_db")
+            spark.sql("CREATE TABLE table1 (name STRING, age INT) USING parquet")
+            self.assertEqual(spark.table("table1").columns, ['name', 'age'])
+            self.assertEqual(spark.range(3).count(), 3)
+        finally:
+            spark.stop()
+
+    def test_global_default_session(self):
+        spark = SparkSession.builder \
+            .master("local") \
+            .getOrCreate()
+        try:
+            self.assertEqual(SparkSession.builder.getOrCreate(), spark)
+        finally:
+            spark.stop()
+
+    def test_default_and_active_session(self):
+        spark = SparkSession.builder \
+            .master("local") \
+            .getOrCreate()
+        activeSession = spark._jvm.SparkSession.getActiveSession()
+        defaultSession = spark._jvm.SparkSession.getDefaultSession()
+        try:
+            self.assertEqual(activeSession, defaultSession)
+        finally:
+            spark.stop()
+
+    def test_config_option_propagated_to_existing_SparkSession(self):
+        session1 = SparkSession.builder \
+            .master("local") \
+            .config("spark-config1", "a") \
+            .getOrCreate()
+        self.assertEqual(session1.conf.get("spark-config1"), "a")
+        session2 = SparkSession.builder \
+            .config("spark-config1", "b") \
+            .getOrCreate()
+        try:
+            self.assertEqual(session1, session2)
+            self.assertEqual(session1.conf.get("spark-config1"), "b")
+        finally:
+            session1.stop()
+
+    def test_newSession(self):
+        session = SparkSession.builder \
+            .master("local") \
+            .getOrCreate()
+        newSession = session.newSession()
+        try:
+            self.assertNotEqual(session, newSession)
+        finally:
+            session.stop()
+            newSession.stop()
+
+    def test_create_new_session_if_old_session_stopped(self):
+        session = SparkSession.builder \
+            .master("local") \
+            .getOrCreate()
+        session.stop()
+        newSession = SparkSession.builder \
+            .master("local") \
+            .getOrCreate()
+        try:
+            self.assertNotEqual(session, newSession)
+        finally:
+            newSession.stop()
+
+    def test_create_SparkContext_then_SparkSession(self):
--- End diff --

I don't strongly agree here. I think that, given that the method names are camel case in `SparkSession` & `SparkContext` in Python, this naming is perfectly reasonable.
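The builder behaviors these tests exercise (a config option set on a new builder is propagated to an already-running session, and `getOrCreate` after `stop` produces a fresh session) can be modeled with a toy pure-Python singleton builder. This is a sketch of the semantics only; `Builder` and `Session` here are hypothetical classes, not PySpark's.

```python
class Session:
    """Toy stand-in for SparkSession: mutable config plus a stopped flag."""
    def __init__(self):
        self.conf = {}
        self.stopped = False

    def stop(self):
        self.stopped = True

class Builder:
    """Toy getOrCreate: reuse the live session (propagating new options to
    it) or create a replacement when none exists or the old one is stopped."""
    _active = None

    def __init__(self):
        self._options = {}

    def config(self, key, value):
        self._options[key] = value
        return self  # chainable, like SparkSession.builder.config(...)

    def getOrCreate(self):
        session = Builder._active
        if session is None or session.stopped:
            session = Session()
            Builder._active = session
        session.conf.update(self._options)  # options reach the existing session
        return session

# Config option propagated to the existing session.
s1 = Builder().config("spark-config1", "a").getOrCreate()
s2 = Builder().config("spark-config1", "b").getOrCreate()
assert s1 is s2 and s1.conf["spark-config1"] == "b"

# A stopped session is replaced by a new one.
s1.stop()
s3 = Builder().getOrCreate()
assert s3 is not s1
```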
[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22489 @dongjoon-hyun @gatorsmile Can the fix be included in the upcoming minor release of 2.3?
[GitHub] spark issue #22512: [SPARK-25498][SQL][WIP] Fix SQLQueryTestSuite failures w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22512 **[Test build #96430 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96430/testReport)** for PR 22512 at commit [`bff88ee`](https://github.com/apache/spark/commit/bff88ee81f57900cca38df8455c4a2eb78b4b758). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` s\"but $` * `case class Literal(value: Any, dataType: DataType) extends LeafExpression `
[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/22295#discussion_r219552270

--- Diff: python/pyspark/sql/session.py ---
@@ -231,6 +231,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                 or SparkSession._instantiatedSession._sc._jsc is None:
             SparkSession._instantiatedSession = self
             self._jvm.SparkSession.setDefaultSession(self._jsparkSession)
+            self._jvm.SparkSession.setActiveSession(self._jsparkSession)
--- End diff --

Yes this seems like the right path forward, thanks for figuring out that was missing as well.
[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22263 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96426/ Test FAILed.
[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22263 Merged build finished. Test FAILed.
[GitHub] spark issue #22495: [SPARK-25486][TEST] Refactor SortBenchmark to use main m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22495 **[Test build #96444 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96444/testReport)** for PR 22495 at commit [`be2d1c0`](https://github.com/apache/spark/commit/be2d1c0e1b224386b2d3a5c43b6f2b1638604607).
[GitHub] spark issue #22493: [SPARK-25485][TEST] Refactor UnsafeProjectionBenchmark t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22493 **[Test build #96445 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96445/testReport)** for PR 22493 at commit [`52d3f73`](https://github.com/apache/spark/commit/52d3f73f5f0f1a76d8d8a20e07543f99a70bb854).
[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22263 **[Test build #96426 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96426/testReport)** for PR 22263 at commit [`865e7d1`](https://github.com/apache/spark/commit/865e7d10be99c5f0ccfc89b2fc208de91e810ded). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new execut...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22288 **[Test build #96443 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96443/testReport)** for PR 22288 at commit [`4c88168`](https://github.com/apache/spark/commit/4c881680fdde32244030b54b44125ac217dacb0d).
[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/22295#discussion_r219551059

--- Diff: python/pyspark/sql/session.py ---
@@ -231,6 +231,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                 or SparkSession._instantiatedSession._sc._jsc is None:
             SparkSession._instantiatedSession = self
             self._jvm.SparkSession.setDefaultSession(self._jsparkSession)
+            self._jvm.SparkSession.setActiveSession(self._jsparkSession)
--- End diff --

Yes, that sounds like the right approach and I think we need that.
[GitHub] spark issue #22490: [SPARK-25481][TEST] Refactor ColumnarBatchBenchmark to u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22490 **[Test build #96442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96442/testReport)** for PR 22490 at commit [`fb1ab6a`](https://github.com/apache/spark/commit/fb1ab6a35769cfdf743f7c880524b2a102ad2c3c).
[GitHub] spark issue #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new execut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22288 Merged build finished. Test PASSed.
[GitHub] spark issue #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new execut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22288 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3354/ Test PASSed.
[GitHub] spark issue #22493: [SPARK-25485][TEST] Refactor UnsafeProjectionBenchmark t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22493 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96439/ Test FAILed.
[GitHub] spark issue #22493: [SPARK-25485][TEST] Refactor UnsafeProjectionBenchmark t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22493 Merged build finished. Test FAILed.
[GitHub] spark issue #22493: [SPARK-25485][TEST] Refactor UnsafeProjectionBenchmark t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22493 **[Test build #96439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96439/testReport)** for PR 22493 at commit [`db99409`](https://github.com/apache/spark/commit/db9940999e92c3bb0a8e1e5d8e234a837ee783b0). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22493: [SPARK-25485][TEST] Refactor UnsafeProjectionBenchmark t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22493 **[Test build #96438 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96438/testReport)** for PR 22493 at commit [`96d382e`](https://github.com/apache/spark/commit/96d382ec1039418b7a7c82fb389dcd2bd8e32130). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22493: [SPARK-25485][TEST] Refactor UnsafeProjectionBenchmark t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22493 Merged build finished. Test FAILed.
[GitHub] spark issue #22493: [SPARK-25485][TEST] Refactor UnsafeProjectionBenchmark t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22493 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96438/ Test FAILed.
[GitHub] spark pull request #22490: [SPARK-25481][TEST] Refactor ColumnarBatchBenchma...
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22490#discussion_r219548887

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchBenchmark.scala ---
@@ -30,8 +30,13 @@ import org.apache.spark.util.collection.BitSet

 /**
  * Benchmark to low level memory access using different ways to manage buffers.
+ * To run this benchmark:
--- End diff --

Oh, I see, thanks!
[GitHub] spark issue #22495: [SPARK-25486][TEST] Refactor SortBenchmark to use main m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22495 **[Test build #96441 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96441/testReport)** for PR 22495 at commit [`3943a7f`](https://github.com/apache/spark/commit/3943a7f7b9cfa8f389c765ef4870323c4b40ab05).
[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22461#discussion_r219546151

--- Diff: docs/sql-programming-guide.md ---
@@ -1489,7 +1489,7 @@ See the [Apache Avro Data Source Guide](avro-data-source-guide.html).
   * The JDBC driver class must be visible to the primordial class loader on the client session and on all executors. This is because Java's DriverManager class does a security check that results in it ignoring all drivers not visible to the primordial class loader when one goes to open a connection. One convenient way to do this is to modify compute_classpath.sh on all worker nodes to include your driver JARs.
   * Some databases, such as H2, convert all names to upper case. You'll need to use upper case to refer to those names in Spark SQL.
-
+  * Users can specify vendor-specific JDBC connection properties in the data source options to do special treatment. For example, `spark.read.format("jdbc").option("url", oracleJdbcUrl).option("oracle.jdbc.mapDateToTimestamp", "false")`. `oracle.jdbc.mapDateToTimestamp` defaults to true, users often need to disable this flag to avoid Oracle date being resolved as timestamp.
--- End diff --

This looks fine to me. @maropu Your idea is great! We need more examples in the troubleshooting section. Currently, our documentation for Spark SQL and Core needs a major update. Maybe we can do it after we finish the reorg of the documentation (https://issues.apache.org/jira/browse/SPARK-24499)?
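The doc text under review relies on the JDBC source passing through options it does not handle itself (like `oracle.jdbc.mapDateToTimestamp`) as driver connection properties. A rough pure-Python sketch of that pass-through idea; the `RESERVED` set below is illustrative only and is not Spark's actual list of source-handled option names.

```python
# Illustrative subset of option names the data source consumes itself;
# everything else is forwarded to the JDBC driver as a connection property.
RESERVED = {"url", "dbtable", "query", "driver", "numPartitions",
            "partitionColumn", "lowerBound", "upperBound"}

def connection_properties(options):
    """Return the options that would be forwarded to the driver."""
    return {k: v for k, v in options.items() if k not in RESERVED}

opts = {
    "url": "jdbc:oracle:thin:@//host:1521/db",
    "dbtable": "t",
    "oracle.jdbc.mapDateToTimestamp": "false",  # vendor-specific, passed through
}
assert connection_properties(opts) == {"oracle.jdbc.mapDateToTimestamp": "false"}
```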
[GitHub] spark issue #22493: [SPARK-25485][TEST] Refactor UnsafeProjectionBenchmark t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22493 **[Test build #96439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96439/testReport)** for PR 22493 at commit [`db99409`](https://github.com/apache/spark/commit/db9940999e92c3bb0a8e1e5d8e234a837ee783b0).
[GitHub] spark issue #22488: [SPARK-25479][TEST] Refactor DatasetBenchmark to use mai...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22488 **[Test build #96440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96440/testReport)** for PR 22488 at commit [`71dfe03`](https://github.com/apache/spark/commit/71dfe03374466a780988a2d0ca3c6bc8cbdd11fd).
[GitHub] spark issue #22488: [SPARK-25479][TEST] Refactor DatasetBenchmark to use mai...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22488 Merged build finished. Test PASSed.
[GitHub] spark issue #22488: [SPARK-25479][TEST] Refactor DatasetBenchmark to use mai...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22488 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3353/ Test PASSed.
[GitHub] spark issue #22493: [SPARK-25485][TEST] Refactor UnsafeProjectionBenchmark t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22493 **[Test build #96438 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96438/testReport)** for PR 22493 at commit [`96d382e`](https://github.com/apache/spark/commit/96d382ec1039418b7a7c82fb389dcd2bd8e32130).
[GitHub] spark pull request #22490: [SPARK-25481][TEST] Refactor ColumnarBatchBenchma...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22490#discussion_r219543405

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchBenchmark.scala ---
@@ -30,8 +30,13 @@ import org.apache.spark.util.collection.BitSet

 /**
  * Benchmark to low level memory access using different ways to manage buffers.
+ * To run this benchmark:
--- End diff --

Could you update the Scala doc to match: https://github.com/apache/spark/blob/d25f425c9652a3611dd5fea8a37df4abb13e126e/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala#L36-L41
[GitHub] spark issue #22467: [SPARK-25465][TEST] Refactor Parquet test suites in proj...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22467 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3352/ Test PASSed.
[GitHub] spark issue #22467: [SPARK-25465][TEST] Refactor Parquet test suites in proj...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22467 Merged build finished. Test PASSed.
[GitHub] spark issue #22467: [SPARK-25465][TEST] Refactor Parquet test suites in proj...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22467 **[Test build #96437 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96437/testReport)** for PR 22467 at commit [`813d19c`](https://github.com/apache/spark/commit/813d19c63477b82a76bdd0d1da73cf3cb1d38564).
[GitHub] spark issue #22519: [SPARK-25505][SQL] The output order of grouping columns ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22519 **[Test build #96436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96436/testReport)** for PR 22519 at commit [`bd416bd`](https://github.com/apache/spark/commit/bd416bd74ee77329b2527fffecd21f7f90090334).
[GitHub] spark issue #22519: [SPARK-25505][SQL] The output order of grouping columns ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22519 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3351/ Test PASSed.
[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22461#discussion_r219542061

--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -462,6 +464,9 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
       .option("lowerBound", "2018-07-04 03:30:00.0")
       .option("upperBound", "2018-07-27 14:11:05.0")
       .option("numPartitions", 2)
+      .option("oracle.jdbc.mapDateToTimestamp", "false")
--- End diff --

I see. We still have a date column in the input dataframe.
[GitHub] spark issue #22519: [SPARK-25505][SQL] The output order of grouping columns ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22519 Merged build finished. Test PASSed.
[GitHub] spark issue #22467: [SPARK-25465][TEST] Refactor Parquet test suites in proj...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/22467 retest this please.