[GitHub] spark pull request #22133: [SPARK-25129][SQL]Make the mapping of com.databri...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22133#discussion_r211485248
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -626,6 +626,7 @@ object DataSource extends Logging {
     serviceLoader.asScala.filter(_.shortName().equalsIgnoreCase(provider1)).toList match {
       // the provider format did not match any given registered aliases
       case Nil =>
+        val latestDocsURL = "https://spark.apache.org/docs/latest"
--- End diff --
The doc will be like https://github.com/apache/spark/pull/22121/files#diff-acdddc6cbd45ccd226bf151564b9cc40R11 It is about loading the module with `--packages`.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22158: [SPARK-25161][Core] Fix several bugs in failure handling...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22158 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2357/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22158: [SPARK-25161][Core] Fix several bugs in failure handling...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22158 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...
Github user seancxmao commented on the issue: https://github.com/apache/spark/pull/22148 Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22158: [SPARK-25161][Core] Fix several bugs in failure handling...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22158 **[Test build #94998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94998/testReport)** for PR 22158 at commit [`32ea946`](https://github.com/apache/spark/commit/32ea946c68c5f3108fb18f7e936ba440f7537144). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22153 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22153 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2356/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22158: [SPARK-25161][Core] Fix several bugs in failure handling...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/22158 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22153 **[Test build #94997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94997/testReport)** for PR 22153 at commit [`e237e39`](https://github.com/apache/spark/commit/e237e3944fa5839e5fa17b07af7901ac56655a4b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22112
> always return the same result with same order when rerun.
Maybe the word "idempotent" is not that accurate. Spark doesn't really care about the order, so the requirement is that, for the same input data set, it should return the same output set. As an example, `iter1.zip(iter2)` will be treated as invalid, unless we sort before zip.
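The zip example can be illustrated with a minimal plain-Python sketch. It does not use Spark; the lists stand in for iterators whose elements arrive in a different order on a rerun, and the helper names `zip_pairs` and `zip_pairs_sorted` are hypothetical.

```python
def zip_pairs(left, right):
    """Zip two sequences positionally and return the resulting set of pairs."""
    return set(zip(left, right))


def zip_pairs_sorted(left, right):
    """Sort both sides first so the pairing no longer depends on arrival order."""
    return set(zip(sorted(left), sorted(right)))


# Same elements on the left side, but a rerun delivers them in a different order.
run1_left, run1_right = [1, 2, 3], ["a", "b", "c"]
run2_left, run2_right = [3, 1, 2], ["a", "b", "c"]

# Plain zip yields a different output set for the same input set.
unstable = zip_pairs(run1_left, run1_right) != zip_pairs(run2_left, run2_right)

# Sorting before zip restores "same input set, same output set".
stable = zip_pairs_sorted(run1_left, run1_right) == zip_pairs_sorted(run2_left, run2_right)
```

Here the output of a plain zip depends on element order even though the input sets are identical, which is the property the comment treats as invalid.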
[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user cclauss commented on the issue: https://github.com/apache/spark/pull/20838 Thanks massively for this. I doubt that I _ever_ would have gotten to that on my own. This is a test so my proposal would be that _you create a separate PR_ so that we are all assured that it passes in the current codebase. Once that PR has been merged, I can come back and finish this PR. Thanks again. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22133: [SPARK-25129][SQL]Make the mapping of com.databri...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22133#discussion_r211482746
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -626,6 +626,7 @@ object DataSource extends Logging {
     serviceLoader.asScala.filter(_.shortName().equalsIgnoreCase(provider1)).toList match {
       // the provider format did not match any given registered aliases
       case Nil =>
+        val latestDocsURL = "https://spark.apache.org/docs/latest"
--- End diff --
I mean, if we happen to have Spark 3.0.0 then this link will be stale in 2.4.0.. no?
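One way to picture the staleness concern is a link pinned to the running version instead of `latest`. The following is only a sketch of that idea in Python, not what the PR does; `docs_url` is a hypothetical helper, and it assumes versioned docs live under `https://spark.apache.org/docs/<version>`.

```python
def docs_url(spark_version=None):
    """Build a docs link; fall back to 'latest' when no version is given."""
    base = "https://spark.apache.org/docs"
    # A pinned version keeps the link meaningful after newer releases ship.
    return f"{base}/{spark_version}" if spark_version else f"{base}/latest"


pinned = docs_url("2.4.0")   # stays on the 2.4.0 docs even after 3.0.0 is released
floating = docs_url()        # follows whatever release is currently the latest
```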
[GitHub] spark issue #22133: [SPARK-25129][SQL]Make the mapping of com.databricks.spa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22133 **[Test build #94996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94996/testReport)** for PR 22133 at commit [`e57b232`](https://github.com/apache/spark/commit/e57b232ec8e36ea107ce103e5cdb6efaa0756c40). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22133: [SPARK-25129][SQL]Make the mapping of com.databricks.spa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22133 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22133: [SPARK-25129][SQL]Make the mapping of com.databricks.spa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22133 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2355/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22165: [SPARK-25017][Core] Add test suite for BarrierCoordinato...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22165 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2354/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22133: [SPARK-25129][SQL]Make the mapping of com.databri...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22133#discussion_r211482547 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -593,7 +592,6 @@ object DataSource extends Logging { "org.apache.spark.ml.source.libsvm.DefaultSource" -> libsvm, "org.apache.spark.ml.source.libsvm" -> libsvm, "com.databricks.spark.csv" -> csv, - "com.databricks.spark.avro" -> avro, --- End diff -- Ah okie makes sense if there's a reason. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22165: [SPARK-25017][Core] Add test suite for BarrierCoordinato...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22165 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22133: [SPARK-25129][SQL]Make the mapping of com.databri...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22133#discussion_r211482461
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -626,6 +626,7 @@ object DataSource extends Logging {
     serviceLoader.asScala.filter(_.shortName().equalsIgnoreCase(provider1)).toList match {
       // the provider format did not match any given registered aliases
       case Nil =>
+        val latestDocsURL = "https://spark.apache.org/docs/latest"
--- End diff --
This is the link for the latest doc. I think it should be ok.
[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22153 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94995/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22153 **[Test build #94995 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94995/testReport)** for PR 22153 at commit [`732bc5f`](https://github.com/apache/spark/commit/732bc5f5d049c93f40d01926ac1efe8495e27b58). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22153 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22133: [SPARK-25129][SQL]Make the mapping of com.databri...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22133#discussion_r211482149
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -593,7 +592,6 @@ object DataSource extends Logging {
       "org.apache.spark.ml.source.libsvm.DefaultSource" -> libsvm,
       "org.apache.spark.ml.source.libsvm" -> libsvm,
       "com.databricks.spark.csv" -> csv,
-      "com.databricks.spark.avro" -> avro,
--- End diff --
@HyukjinKwon I did add it to the `backwardCompatibilityMap` at first. But later I found that the configuration wouldn't take effect at run time, since `backwardCompatibilityMap` is a `val`. (We could change `backwardCompatibilityMap` to a method to resolve that.) Also, the code looks ugly:
```
val ret = Map(...)
if (...) {
  ret + (k -> v)
} else {
  ret
}
// it would be worse if we have more configurations.
```
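The `val` versus method point can be sketched with a Python analogy: a module-level value is computed once, while a function re-reads the configuration on every call. This is an illustration only, not Spark code; the config key `legacy.avro.mapping.enabled` and the names `EAGER_MAP` and `compatibility_map` are hypothetical.

```python
# Hypothetical runtime configuration; the key name is made up for illustration.
KEY = "legacy.avro.mapping.enabled"
conf = {KEY: False}

BASE = {"com.databricks.spark.csv": "csv"}

# Analogue of a Scala `val`: evaluated once, so it captures the config as it is now.
EAGER_MAP = dict(BASE)
if conf[KEY]:
    EAGER_MAP["com.databricks.spark.avro"] = "avro"


def compatibility_map():
    """Analogue of a Scala method: re-evaluated on every call."""
    mapping = dict(BASE)
    if conf[KEY]:
        mapping["com.databricks.spark.avro"] = "avro"
    return mapping


# The configuration changes at run time, after EAGER_MAP was built.
conf[KEY] = True

seen_by_value = "com.databricks.spark.avro" in EAGER_MAP             # still False
seen_by_method = "com.databricks.spark.avro" in compatibility_map()  # now True
```

The eagerly built map never sees the later configuration change, which mirrors the reason given for not putting the entry in `backwardCompatibilityMap`.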
[GitHub] spark issue #22165: [SPARK-25017][Core] Add test suite for BarrierCoordinato...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22165 **[Test build #94994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94994/testReport)** for PR 22165 at commit [`21bd1c3`](https://github.com/apache/spark/commit/21bd1c37f4af6480adfc07130a15f70acdeda378). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22153 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2353/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22153 **[Test build #94995 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94995/testReport)** for PR 22153 at commit [`732bc5f`](https://github.com/apache/spark/commit/732bc5f5d049c93f40d01926ac1efe8495e27b58). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22153 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22165: [SPARK-25017][Core] Add test suite for BarrierCoo...
GitHub user xuanyuanking opened a pull request: https://github.com/apache/spark/pull/22165
[SPARK-25017][Core] Add test suite for BarrierCoordinator and ContextBarrierState
## What changes were proposed in this pull request?
Currently `ContextBarrierState` and `BarrierCoordinator` are only covered by the end-to-end tests in `BarrierTaskContextSuite`. This PR adds `BarrierCoordinatorSuite` to test both classes.
## How was this patch tested?
Unit tests in `BarrierCoordinatorSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xuanyuanking/spark SPARK-25017
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22165.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22165
commit 21bd1c37f4af6480adfc07130a15f70acdeda378
Author: liyuanjian
Date: 2018-08-21T05:24:07Z
[SPARK-25017][Core] Add test suite for BarrierCoordinator and ContextBarrierState
[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20838
Hi @cclauss , sorry for the frustration. I looked into the test, and it was kind of a pain to get it working right - which is probably why it wasn't done in the first place ;) Here are my modifications for `test_slice` and it seems to pass py3 fine:
```python
def test_slice(self):
    """Basic operation test for DStream.slice."""
    import datetime as dt
    self.ssc = StreamingContext(self.sc, 1.0)
    self.ssc.remember(4.0)
    input = [[1], [2], [3], [4]]
    stream = self.ssc.queueStream([self.sc.parallelize(d, 1) for d in input])

    time_vals = []

    def get_times(t, rdd):
        if rdd and len(time_vals) < len(input):
            time_vals.append(t)

    stream.foreachRDD(get_times)

    self.ssc.start()
    self.wait_for(time_vals, 4)
    begin_time = time_vals[0]

    def get_sliced(begin_delta, end_delta):
        begin = begin_time + dt.timedelta(seconds=begin_delta)
        end = begin_time + dt.timedelta(seconds=end_delta)
        rdds = stream.slice(begin, end)
        result_list = [rdd.collect() for rdd in rdds]
        return [r for result in result_list for r in result]

    self.assertEqual(set([1]), set(get_sliced(0, 0)))
    self.assertEqual(set([2, 3]), set(get_sliced(1, 2)))
    self.assertEqual(set([2, 3, 4]), set(get_sliced(1, 4)))
    self.assertEqual(set([1, 2, 3, 4]), set(get_sliced(0, 4)))
```
If you want to put that in, I have some time now and can help you get this merged or if you prefer I can finish it up and still assign to you.
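The assertions in `test_slice` rely on the time range being inclusive at both ends. A minimal Spark-free sketch of that behaviour, assuming one batch per second as in the test, might look like the following; `slice_batches` and the fixed `start` timestamp are hypothetical stand-ins for `DStream.slice` and the first observed batch time.

```python
import datetime as dt


def slice_batches(batches, begin, end):
    """Return payloads of batches whose time lies in [begin, end], both ends included."""
    return [data for t, data in sorted(batches.items()) if begin <= t <= end]


# One batch per second, holding the values 1..4, mirroring `input` in the test.
start = dt.datetime(2018, 8, 21, 0, 0, 0)
batches = {start + dt.timedelta(seconds=i): [i + 1] for i in range(4)}


def get_sliced(begin_delta, end_delta):
    """Flatten the batches that fall between the two offsets from `start`."""
    begin = start + dt.timedelta(seconds=begin_delta)
    end = start + dt.timedelta(seconds=end_delta)
    return [r for data in slice_batches(batches, begin, end) for r in data]
```

With this model, `get_sliced(0, 0)` contains only the first batch and `get_sliced(1, 4)` contains the last three, matching the expectations asserted in the test.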
[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22153 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94986/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22153 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22153 **[Test build #94986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94986/testReport)** for PR 22153 at commit [`e0c048e`](https://github.com/apache/spark/commit/e0c048e34635d60e5d7eeb391ea2046727e2fd35). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22161: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/22161 @HyukjinKwon Done. [SPARK-25167](https://issues.apache.org/jira/browse/SPARK-25167) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22164: [SPARK-23679][YARN] Fix AmIpFilter cannot work in RM HA ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22164 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94991/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22164: [SPARK-23679][YARN] Fix AmIpFilter cannot work in RM HA ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22164 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22164: [SPARK-23679][YARN] Fix AmIpFilter cannot work in RM HA ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22164 **[Test build #94991 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94991/testReport)** for PR 22164 at commit [`da33554`](https://github.com/apache/spark/commit/da33554cc38d4b41e86dcb6e2c833f5b29c35ad8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22163 **[Test build #94993 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94993/testReport)** for PR 22163 at commit [`bcef61e`](https://github.com/apache/spark/commit/bcef61e3c1e65e797c8044b674c5ae99c89ce222). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22163 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2352/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22163 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22154: [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22154 **[Test build #94992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94992/testReport)** for PR 22154 at commit [`5f0ff13`](https://github.com/apache/spark/commit/5f0ff13cc5c30d99fa77551fd617783c29e4864b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22154: [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22154 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22112 If "always return the same result with same order when rerun." is the definition of "idempotent", then yes, MLlib RDD closures always return the same result if the input doesn't change. We use pseudo-randomness to achieve deterministic behavior.
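Deterministic behaviour from pseudo-randomness usually comes from seeding the generator with a value that is stable across reruns. The sketch below shows the general idea in plain Python; it is not MLlib's implementation, and `sample_partition` together with its seeding scheme (`seed + partition_index`) is an assumption made for illustration.

```python
import random


def sample_partition(records, partition_index, seed=42, fraction=0.5):
    """Sample records using a generator seeded from stable inputs only."""
    # The seed depends only on a fixed base seed and the partition index,
    # so rerunning the closure on the same input yields the same sample.
    rng = random.Random(seed + partition_index)
    return [r for r in records if rng.random() < fraction]


records = list(range(20))
first_run = sample_partition(records, partition_index=3)
second_run = sample_partition(records, partition_index=3)  # simulated task rerun
```

Note that this only holds when the records arrive in the same order on the rerun, which is the condition the thread is debating.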
[GitHub] spark issue #22154: [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22154 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2351/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22154: [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22154 retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22154: [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22154 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22154: [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22154 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94985/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22154: [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22154 **[Test build #94985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94985/testReport)** for PR 22154 at commit [`5f0ff13`](https://github.com/apache/spark/commit/5f0ff13cc5c30d99fa77551fd617783c29e4864b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22164: [SPARK-23679][YARN] Fix AmIpFilter cannot work in RM HA ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22164 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2350/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22154: [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22154 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22154: [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22154 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94983/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22164: [SPARK-23679][YARN] Fix AmIpFilter cannot work in RM HA ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22164 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22154: [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22154 **[Test build #94983 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94983/testReport)** for PR 22154 at commit [`e3e86c6`](https://github.com/apache/spark/commit/e3e86c645d5c75c1c490881564ec7ea4f909d2ee). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22164: [SPARK-23679][YARN] Fix AmIpFilter cannot work in RM HA ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22164 **[Test build #94991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94991/testReport)** for PR 22164 at commit [`da33554`](https://github.com/apache/spark/commit/da33554cc38d4b41e86dcb6e2c833f5b29c35ad8). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22164: [SPARK-23679][YARN] Fix AmIpFilter cannot work in...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/22164
[SPARK-23679][YARN] Fix AmIpFilter cannot work in RM HA scenario
## What changes were proposed in this pull request?
YARN `AmIpFilter` adds a new parameter "RM_HA_URLS" to support RM HA, but Spark on YARN doesn't provide such a parameter, so redirection fails when running with RM HA. The detailed exception can be found in the JIRA. This PR fixes the issue by adding the "RM_HA_URLS" parameter.
## How was this patch tested?
Local verification.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jerryshao/apache-spark SPARK-23679
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22164.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22164
commit da33554cc38d4b41e86dcb6e2c833f5b29c35ad8
Author: jerryshao
Date: 2018-08-20T08:28:13Z
Fix AmIpFilter cannot work in RM HA scenario
[GitHub] spark issue #22156: [SPARK-25144][SQL][TEST][BRANCH-2.2] Free aggregate map ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22156 Thank you, @HyukjinKwon . --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22156: [SPARK-25144][SQL][TEST][BRANCH-2.2] Free aggrega...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/22156 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22155: [SPARK-25144][SQL][TEST] Free aggregate map when task en...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22155 Thank you! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/22163 The current buffer is `writeBuffer`. I mean copying `writeBuffer` to `diskWriteBuffer` or another buffer.
[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21859 If this optimization is done more generally, will the implicitly cached data cause memory pressure on the driver, since it seems we have no way to release it? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20637: [SPARK-23466][SQL] Remove redundant null checks i...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/20637#discussion_r211468226

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala ---
@@ -43,25 +45,30 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro
     case _ => false
   }
 
-  // TODO: if the nullability of field is correct, we can use it to save null check.
   private def writeStructToBuffer(
       ctx: CodegenContext,
       input: String,
       index: String,
-      fieldTypes: Seq[DataType],
+      fieldTypeAndNullables: Seq[Schema],
       rowWriter: String): String = {
     // Puts `input` in a local variable to avoid to re-evaluate it if it's a statement.
     val tmpInput = ctx.freshName("tmpInput")
-    val fieldEvals = fieldTypes.zipWithIndex.map { case (dt, i) =>
-      ExprCode(
-        JavaCode.isNullExpression(s"$tmpInput.isNullAt($i)"),
-        JavaCode.expression(CodeGenerator.getValue(tmpInput, dt, i.toString), dt))
+    val fieldEvals = fieldTypeAndNullables.zipWithIndex.map { case (dtNullable, i) =>
+      val isNull = if (dtNullable.nullable) {
+        JavaCode.isNullExpression(s"$tmpInput.isNullAt($i)")
+      } else {
+        FalseLiteral
+      }
+      ExprCode(isNull, JavaCode.expression(
+        CodeGenerator.getValue(tmpInput, dtNullable.dataType, i.toString), dtNullable.dataType))
     }
 
     val rowWriterClass = classOf[UnsafeRowWriter].getName
     val structRowWriter = ctx.addMutableState(rowWriterClass, "rowWriter",
       v => s"$v = new $rowWriterClass($rowWriter, ${fieldEvals.length});")
     val previousCursor = ctx.freshName("previousCursor")
+    val structExpressions = writeExpressionsToBuffer(
+      ctx, tmpInput, fieldEvals, fieldTypeAndNullables.map(_.dataType), structRowWriter)
--- End diff --

I see the change here, but shouldn't the other call to `writeExpressionsToBuffer`, from `createCode`, also pass nullability to `writeExpressionsToBuffer`, because `exprEvals.isNull` there is not always `FalseLiteral` even when an expression is non-nullable?
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
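The nullability idea in the diff above can be illustrated with a minimal, self-contained sketch. It is not Spark's code: `FieldSchema`, `NullCheckSketch`, `isNullExpr`, and `nullChecks` are hypothetical stand-ins, and the generated Java is modeled as plain strings. The point is only that a non-nullable field gets a constant `false` null check, while a nullable field still gets a runtime `isNullAt` call.

```scala
// Hypothetical, simplified stand-in for a (dataType, nullable) pair; not a Spark class.
final case class FieldSchema(dataType: String, nullable: Boolean)

object NullCheckSketch {
  // Returns the Java expression used as the null check for field `i` of `input`.
  // Non-nullable fields skip the runtime check and use the literal `false`.
  def isNullExpr(input: String, field: FieldSchema, i: Int): String =
    if (field.nullable) s"$input.isNullAt($i)" else "false"

  // Builds one null-check expression per field, in field order.
  def nullChecks(input: String, fields: Seq[FieldSchema]): Seq[String] =
    fields.zipWithIndex.map { case (f, i) => isNullExpr(input, f, i) }

  def main(args: Array[String]): Unit = {
    val fields = Seq(
      FieldSchema("int", nullable = false),
      FieldSchema("string", nullable = true))
    nullChecks("tmpInput", fields).foreach(println)
  }
}
```

The review question still applies to such a scheme: the shortcut is only safe where the declared nullability is trustworthy for every caller.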
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22163 What do you mean by `only one record is written to a buffer each time`? Isn't the amount of data written each time controlled by `diskWriteBufferSize`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94984/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20345 **[Test build #94984 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94984/testReport)** for PR 20345 at commit [`39462fb`](https://github.com/apache/spark/commit/39462fbee952ec574b4c04d7718fd73bb5f56d9d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22140 cc @BryanCutler as well since we discussed an issue about this code path before. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...
Github user sddyljsx commented on the issue: https://github.com/apache/spark/pull/21859 'The ShuffleWriter should treat RangePartitioner specially and consume the sampled data in RangePartitioner instead of the input iterator.' This idea is good, maybe we can cache both the K and V when doing sample. I will have a try on this idea. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22163 **[Test build #94989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94989/testReport)** for PR 22163 at commit [`671268b`](https://github.com/apache/spark/commit/671268b679f9221fd96e9ab2ea929df4a9908de8). * This patch **fails Java style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22163 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22163 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94989/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21859 **[Test build #94990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94990/testReport)** for PR 21859 at commit [`6f52f1f`](https://github.com/apache/spark/commit/6f52f1fde3d4df9384e1c99d08b930953843bcde). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...
Github user sddyljsx commented on the issue: https://github.com/apache/spark/pull/21859

I read the source code again. The RangePartitioner[K, V] in ShuffleExchangeExec is an instance of RangePartitioner[InternalRow, Null]. RangePartitioner only samples K to compute the rangeBounds, so we can get the InternalRow when sampling. After getting the RangePartitioner, ShuffleExchangeExec maps each InternalRow to [partitionId, InternalRow] for the shuffle (the RangePartitioner generates the partitionId). The shuffle itself won't use the RangePartitioner; it uses PartitionIdPassthrough instead. In other words, the ShuffleWriter won't know that the RangePartitioner exists.

```
val rddWithPartitionIds: RDD[Product2[Int, InternalRow]] = newRdd.mapPartitionsInternal { iter =>
  val getPartitionKey = getPartitionKeyExtractor()
  val mutablePair = new MutablePair[Int, InternalRow]()
  iter.map { row => mutablePair.update(part.getPartition(getPartitionKey(row)), row) }
}
val dependency = new ShuffleDependency[Int, InternalRow, InternalRow](
  rddWithPartitionIds,
  new PartitionIdPassthrough(part.numPartitions),
  serializer)

private class PartitionIdPassthrough(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int = key.asInstanceOf[Int]
}
```

The optimization will parallelize the cached InternalRow into newRdd instead of computing it again.

But in other places, like RDD's sortByKey:

```
def sortByKey(ascending: Boolean = true, numPartitions: Int = self.partitions.length)
    : RDD[(K, V)] = self.withScope {
  val part = new RangePartitioner(numPartitions, self, ascending)
  new ShuffledRDD[K, V, V](self, part)
    .setKeyOrdering(if (ascending) ordering else ordering.reverse)
}

// getDependencies function in ShuffledRDD
override def getDependencies: Seq[Dependency[_]] = {
  val serializer = userSpecifiedSerializer.getOrElse {
    val serializerManager = SparkEnv.get.serializerManager
    if (mapSideCombine) {
      serializerManager.getSerializer(implicitly[ClassTag[K]], implicitly[ClassTag[C]])
    } else {
      serializerManager.getSerializer(implicitly[ClassTag[K]], implicitly[ClassTag[V]])
    }
  }
  List(new ShuffleDependency(prev, part, serializer, keyOrdering, aggregator, mapSideCombine))
}
```

The RDD is [K, V], and the shuffle uses RangePartitioner directly. But sampling only gives us K, so we can't restore the RDD from the cache. The two paths work differently, so for now the optimization only works in Spark SQL's ShuffleExchangeExec.

'The ShuffleWriter should treat RangePartitioner specially and consume the sampled data in RangePartitioner instead of the input iterator.' This idea is good; maybe we can cache both the K and V when sampling. I will give it a try.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
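The two-step pattern described in the comment above (tag each row with its partition id first, then shuffle with a partitioner that just passes the id through) can be sketched without Spark. This is a toy model under stated assumptions: `RangeBounds` and `Passthrough` are hypothetical classes standing in for the roles of `RangePartitioner` and `PartitionIdPassthrough`, keys are plain `Int`s, and the "shuffle" is a local `groupBy`.

```scala
object PassthroughSketch {
  // Hypothetical stand-in for a range partitioner: `bounds` are sorted upper bounds,
  // so a key lands in the first partition whose bound is >= the key.
  final class RangeBounds(bounds: Vector[Int]) {
    val numPartitions: Int = bounds.length + 1
    def getPartition(key: Int): Int = {
      val idx = bounds.indexWhere(key <= _)
      if (idx < 0) bounds.length else idx
    }
  }

  // Plays the role of PartitionIdPassthrough: the shuffle key already is the partition id.
  final class Passthrough(val numPartitions: Int) {
    def getPartition(key: Any): Int = key.asInstanceOf[Int]
  }

  // Step 1: compute the partition id up front and pair it with the row.
  def tagRows(part: RangeBounds, rows: Seq[Int]): Seq[(Int, Int)] =
    rows.map(r => (part.getPartition(r), r))

  def main(args: Array[String]): Unit = {
    val part = new RangeBounds(Vector(10, 20))
    val tagged = tagRows(part, Seq(3, 15, 42, 20))
    // Step 2: the shuffle side only sees partition ids, never the range bounds.
    val shufflePart = new Passthrough(part.numPartitions)
    val grouped = tagged.groupBy { case (pid, _) => shufflePart.getPartition(pid) }
    grouped.toSeq.sortBy(_._1).foreach { case (pid, rs) =>
      println(s"partition $pid: ${rs.map(_._2).mkString(", ")}")
    }
  }
}
```

In this model the shuffle side has no access to the sampled data, which is the reason given above for why a writer-side optimization cannot see the `RangePartitioner` on the SQL path.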
[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/22065 This is an end-to-end performance improvement, although our data is very small. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22157: [SPARK-25126] Avoid creating Reader for all orc files
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22157 > Do we have a similar issue for Parquet? Looks not since we explicitly pick up one file before reading in schema inference: https://github.com/apache/spark/blob/f984ec75ed6162ee6f5881716a8311c883aca22a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L229-L239 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22163 **[Test build #94989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94989/testReport)** for PR 22163 at commit [`671268b`](https://github.com/apache/spark/commit/671268b679f9221fd96e9ab2ea929df4a9908de8). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22163 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22163 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2349/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22112

So there are 2 options:
1. Ask the RDD closure to be idempotent. I'm not sure if it's OK for MLlib, cc @mengxr @WeichenXu123 @yanboliang
2. Ask the output committer to be able to overwrite a committed task. Note that the output committer here is the `FileCommitProtocol` interface in Spark, not the Hadoop output committer. We don't have to make this work for all the Hadoop output committers.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22163: [SPARK-25166][CORE]Reduce the number of write ope...
GitHub user 10110346 opened a pull request: https://github.com/apache/spark/pull/22163 [SPARK-25166][CORE]Reduce the number of write operations for shuffle write. ## What changes were proposed in this pull request? Currently, only one record is written to a buffer each time, which increases the number of copies. I think we should write as many records as possible each time. ## How was this patch tested? Existing unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/10110346/spark reducewrite Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22163.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22163 commit 671268b679f9221fd96e9ab2ea929df4a9908de8 Author: liuxian Date: 2018-08-21T02:42:30Z fix --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
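The general idea in the description above, packing several records into one buffer so that fewer write calls reach the underlying stream, can be sketched as follows. This is not the PR's actual change (the patch itself is not shown here); `BatchedWriteSketch`, `CountingStream`, and `writeBatched` are hypothetical names, and records are modeled as plain byte arrays.

```scala
import java.io.{ByteArrayOutputStream, OutputStream}

object BatchedWriteSketch {
  // Test sink that counts how many write calls reach it.
  final class CountingStream extends OutputStream {
    private val out = new ByteArrayOutputStream()
    var writeCalls = 0
    override def write(b: Int): Unit = { writeCalls += 1; out.write(b) }
    override def write(b: Array[Byte], off: Int, len: Int): Unit = {
      writeCalls += 1
      out.write(b, off, len)
    }
    def size: Int = out.size()
  }

  // Copies records into one buffer and flushes only when the next record does not fit.
  def writeBatched(records: Seq[Array[Byte]], sink: OutputStream, bufferSize: Int): Unit = {
    val buffer = new Array[Byte](bufferSize)
    var used = 0
    def flush(): Unit = if (used > 0) { sink.write(buffer, 0, used); used = 0 }
    records.foreach { rec =>
      if (rec.length > bufferSize) {
        // Oversized record: flush what is pending, then write it directly.
        flush()
        sink.write(rec, 0, rec.length)
      } else {
        if (used + rec.length > bufferSize) flush()
        System.arraycopy(rec, 0, buffer, used, rec.length)
        used += rec.length
      }
    }
    flush()
  }

  def main(args: Array[String]): Unit = {
    val sink = new CountingStream
    writeBatched(Seq.fill(10)(new Array[Byte](100)), sink, 1024)
    println(s"bytes=${sink.size} writeCalls=${sink.writeCalls}")
  }
}
```

Writing one record per call would issue ten writes for the same input; here the ten 100-byte records fit in a single 1024-byte buffer, so the sink sees one call.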
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22112 **[Test build #94988 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94988/testReport)** for PR 22112 at commit [`4f8e24d`](https://github.com/apache/spark/commit/4f8e24d33e6df2c60740a6c4d0ebec4db4123f5b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22112 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2348/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22112 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22138 @koeninger Yeah, I see what you're saying. Then, IMHO, isolating consumers by query sounds better than the alternatives. Adding the next offset to the cache key would make a consumer move to a different bucket in the cache every time it is processed, which is not expected behavior for a general pool solution, so we would have to reinvent the wheel (and it is not an ideal situation for caching, either). There's an evictor thread in Apache Commons Pool running in the background, and we could close consumers that have been idle for a long time (say 5 minutes or more). That's another benefit of adopting Apache Commons Pool (and probably available in most general pool solutions): we could also eventually evict cached consumers whose topic or partition is removed while the query is running. Consumers are evicted not only when the cache capacity is exceeded, but also after a period of inactivity. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
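The idle-eviction behavior discussed above can be sketched with a small hand-rolled pool. This deliberately does not use the Apache Commons Pool API; `IdleEvictionSketch`, `IdlePool`, and its methods are hypothetical, time is passed in explicitly instead of read from a clock, and consumers are modeled as strings. It only illustrates the policy: entries unused for at least `maxIdleMillis` are closed and removed, regardless of how full the cache is.

```scala
import scala.collection.mutable

object IdleEvictionSketch {
  // Hypothetical keyed pool: each entry remembers when it was last used.
  final class IdlePool[K, V](maxIdleMillis: Long, close: V => Unit) {
    private val entries = mutable.Map.empty[K, (V, Long)]

    def put(key: K, value: V, nowMillis: Long): Unit =
      entries.update(key, (value, nowMillis))

    // Borrowing refreshes the last-used timestamp of the entry.
    def borrow(key: K, nowMillis: Long): Option[V] =
      entries.get(key).map { case (v, _) =>
        entries.update(key, (v, nowMillis))
        v
      }

    // What a background evictor would run periodically: close and drop idle entries.
    def evictIdle(nowMillis: Long): Int = {
      val idle = entries.collect {
        case (k, (_, last)) if nowMillis - last >= maxIdleMillis => k
      }.toList
      idle.foreach { k => entries.remove(k).foreach { case (v, _) => close(v) } }
      idle.size
    }

    def size: Int = entries.size
  }

  def main(args: Array[String]): Unit = {
    val pool = new IdlePool[String, String](5 * 60 * 1000L, c => println(s"closing $c"))
    pool.put("topic-0", "consumerA", 0L)
    pool.put("topic-1", "consumerB", 0L)
    pool.borrow("topic-1", 200000L)
    println(s"evicted=${pool.evictIdle(300000L)} remaining=${pool.size}")
  }
}
```

A consumer for a removed topic or partition is never borrowed again, so under this policy it is eventually closed without any explicit bookkeeping about which partitions still exist.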
[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21306 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22161: [SPARKR][TEST][MINOR] Minor fixes for R sql tests
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22161 Eh, @dilipbiswal, actually can we file a JIRA? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21306 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94982/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22148: [SPARK-25132][SQL] Case-insensitive field resolut...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22148 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21306 **[Test build #94982 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94982/testReport)** for PR 21306 at commit [`6b45a11`](https://github.com/apache/spark/commit/6b45a119df8e6382fa2503f854b4a85aed3e3785). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` abstract static class SingleColumnTransform implements PartitionTransform ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22148 Merged to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21320 **[Test build #94987 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94987/testReport)** for PR 21320 at commit [`97b3a51`](https://github.com/apache/spark/commit/97b3a51d478f19890ded73aa78d94c055a9f144c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21320 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2347/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21320 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21320 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22123: [SPARK-25134][SQL] Csv column pruning with checki...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22123 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22123: [SPARK-25134][SQL] Csv column pruning with checking of h...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22123 Merged to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21669 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21669 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94979/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22133: [SPARK-25129][SQL]Make the mapping of com.databricks.spa...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22133 Seems fine otherwise. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21669 **[Test build #94979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94979/testReport)** for PR 21669 at commit [`4a000d2`](https://github.com/apache/spark/commit/4a000d2abda968a28f419d21418f61e2f53355fc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22133: [SPARK-25129][SQL]Make the mapping of com.databri...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22133#discussion_r211460921 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -626,6 +626,7 @@ object DataSource extends Logging { serviceLoader.asScala.filter(_.shortName().equalsIgnoreCase(provider1)).toList match { // the provider format did not match any given registered aliases case Nil => + val latestDocsURL = "https://spark.apache.org/docs/latest; --- End diff -- I would actually avoid leaving an explicit doc link, because we would have to fix it for every release. Just prose should be good enough. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org