[GitHub] spark issue #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Python GBT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21413 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3689/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21427 I guess sending configurations is not that difficult. We can write configs (as `Map[String, String]`, to allow further configurations in the future?) before `PythonUDFRunner.writeUDFs(dataOut, funcs, argOffsets)` in `ArrowPythonRunner.writeCommand()` (and `PythonUDFRunner.writeCommand()`?), and read them before reading the UDFs in `worker.py`. The `timezone` can be included in the configs.
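The protocol change sketched above can be illustrated with a small, self-contained framing example: write a count, then length-prefixed UTF-8 key/value pairs, and have the worker read them back before anything else. This is a hypothetical sketch, not Spark's actual `ArrowPythonRunner`/`worker.py` code; all helper names are invented for illustration.

```python
import struct
from io import BytesIO

def write_utf8(stream, s):
    # Length-prefixed UTF-8 string, mirroring the framing style of
    # Spark's Python worker protocol (helper names are hypothetical).
    data = s.encode("utf-8")
    stream.write(struct.pack(">i", len(data)))
    stream.write(data)

def read_utf8(stream):
    (length,) = struct.unpack(">i", stream.read(4))
    return stream.read(length).decode("utf-8")

def write_configs(stream, configs):
    # Runner side: entry count first, then each key/value pair.
    stream.write(struct.pack(">i", len(configs)))
    for key, value in configs.items():
        write_utf8(stream, key)
        write_utf8(stream, value)

def read_configs(stream):
    # Worker side: read the config map before reading the UDFs.
    (count,) = struct.unpack(">i", stream.read(4))
    configs = {}
    for _ in range(count):
        key = read_utf8(stream)
        configs[key] = read_utf8(stream)
    return configs
```

A round trip with the timezone entry mentioned in the comment would then be `write_configs(buf, {"spark.sql.session.timeZone": "UTC"})` followed by `read_configs(buf)` on the worker side.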
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3555/
[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 Yup, my impression was that there could be a corner case too, but I wasn't sure how much the corner case makes sense, and I haven't checked it closely yet. I believe elaborating the case might be helpful to judge whether we should block this or not. The current approach looks fine in general to me though. I think it's fine if it's a bit of a behaviour change as long as we mention it in the migration guide. cc @cloud-fan too.
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Merged build finished. Test PASSed.
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3555/
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3688/ Test PASSed.
[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21437 Merged build finished. Test FAILed.
[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21437 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91274/ Test FAILed.
[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21437 **[Test build #91274 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91274/testReport)** for PR 21437 at commit [`9d95c12`](https://github.com/apache/spark/commit/9d95c12a0ada0520f426723406a7d99aada2760d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Python GBT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21413 **[Test build #91282 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91282/testReport)** for PR 21413 at commit [`d8f3906`](https://github.com/apache/spark/commit/d8f3906be4d4178d3c41bff41eaeb39f430ade6b).
[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21413#discussion_r191611779

--- Diff: python/pyspark/ml/regression.py ---
```
@@ -619,6 +627,22 @@ def getSubsamplingRate(self):
         """
         return self.getOrDefault(self.subsamplingRate)

+    @since("1.4.0")
+    def setFeatureSubsetStrategy(self, value):
+        """
+        Sets the value of :py:attr:`featureSubsetStrategy`.
+
+        .. note:: Deprecated in 2.1.0 and will be removed in 2.4.0.
```
--- End diff --

Sorry. Fixed.
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 **[Test build #91281 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91281/testReport)** for PR 21366 at commit [`5b9c00f`](https://github.com/apache/spark/commit/5b9c00fa39d1c83435ca65de5394345e5d6f1f00).
[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r191611421

--- Diff: python/pyspark/taskcontext.py ---
```
@@ -88,3 +89,9 @@ def taskAttemptId(self):
         TaskAttemptID.
         """
         return self._taskAttemptId
+
+    def getLocalProperty(self, key):
+        """
+        Get a local property set upstream in the driver, or None if it is missing.
```
--- End diff --

Thanks @BryanCutler for catching this stupid stuff. Not that comfortable with Python.
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Merged build finished. Test FAILed.
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91273/ Test FAILed.
[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21288#discussion_r191610297

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
```
@@ -131,211 +132,214 @@ object FilterPushdownBenchmark {
   }

   /*
+  OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.26-46.32.amzn1.x86_64
   Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
   Select 0 string row (value IS NULL):  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
-  Parquet Vectorized                          8452 / 8504        1.9        537.3      1.0X
-  Parquet Vectorized (Pushdown)                274 / 281        57.3         17.4     30.8X
-  Native ORC Vectorized                       8167 / 8185        1.9        519.3      1.0X
-  Native ORC Vectorized (Pushdown)             365 / 379        43.1         23.2     23.1X
+  Parquet Vectorized                          2961 / 3123        5.3        188.3      1.0X
+  Parquet Vectorized (Pushdown)               3057 / 3121        5.1        194.4      1.0X
```
--- End diff --

I have not tried it yet, but is it related to the recent change we made in the parquet reader?
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 **[Test build #91273 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91273/testReport)** for PR 21366 at commit [`b30ed39`](https://github.com/apache/spark/commit/b30ed39ebecc72cadfc9ec20b135d60f618762a4).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/21413#discussion_r191609540

--- Diff: python/pyspark/ml/regression.py ---
```
@@ -619,6 +627,22 @@ def getSubsamplingRate(self):
         """
         return self.getOrDefault(self.subsamplingRate)

+    @since("1.4.0")
+    def setFeatureSubsetStrategy(self, value):
+        """
+        Sets the value of :py:attr:`featureSubsetStrategy`.
+
+        .. note:: Deprecated in 2.1.0 and will be removed in 2.4.0.
```
--- End diff --

sorry, this should be `.. note:: Deprecated in 2.4.0 and will be removed in 3.0.0.`
[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21451 Merged build finished. Test PASSed.
[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21451 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91271/ Test PASSed.
[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21453 Jenkins, add to whitelist.
[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21453 Jenkins, test this please.
[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21451 **[Test build #91271 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91271/testReport)** for PR 21451 at commit [`68c5d5f`](https://github.com/apache/spark/commit/68c5d5f5f60da7cbc0ce356acd8e5ab31db70ea5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/21454 LGTM
[GitHub] spark pull request #21442: [SPARK-24402] [SQL] Optimize `In` expression when...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21442#discussion_r191607288

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
```
@@ -219,10 +219,15 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] {
 object OptimizeIn extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case q: LogicalPlan => q transformExpressionsDown {
-      case In(v, list) if list.isEmpty && !v.nullable => FalseLiteral
+      case In(v, list) if list.isEmpty =>
+        // When v is not nullable, the following expression will be optimized
+        // to FalseLiteral which is tested in OptimizeInSuite.scala
+        If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType))
       case expr @ In(v, list) if expr.inSetConvertible =>
         val newList = ExpressionSet(list).toSeq
-        if (newList.size > SQLConf.get.optimizerInSetConversionThreshold) {
+        if (newList.length == 1) {
+          EqualTo(v, newList.head)
```
--- End diff --

This will fail, since the schema mismatches when the data type is struct. The test cases were added a few days ago. https://github.com/apache/spark/pull/21425
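The empty-list rewrite quoted above follows SQL's three-valued logic: `v IN ()` is FALSE when `v` is non-null and NULL when `v` itself is NULL, which is exactly what `If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType))` encodes. A minimal Python sketch of the intended semantics (a hypothetical model, not the Catalyst rule itself; `None` stands in for SQL NULL):

```python
def optimize_in(value, in_list):
    # Hypothetical model of the OptimizeIn rewrite discussed above.
    if not in_list:
        # Empty IN list: If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType))
        return False if value is not None else None
    if len(in_list) == 1:
        # Single element: rewrite to an equality check; NULL propagates.
        if value is None or in_list[0] is None:
            return None
        return value == in_list[0]
    # Larger lists: plain membership test (NULL handling simplified here).
    if value is None:
        return None
    return value in in_list
```

The struct-type failure gatorsmile points out lives in the `EqualTo` branch, where the rewrite assumes the two sides' schemas line up; the sketch above ignores types entirely.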
[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21427 I'm sorry for the late review, but I think the current fix is still a behavior change.
[GitHub] spark issue #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Python GBT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21413 Merged build finished. Test PASSed.
[GitHub] spark issue #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Python GBT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21413 **[Test build #91280 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91280/testReport)** for PR 21413 at commit [`714ab33`](https://github.com/apache/spark/commit/714ab3338f952c13c1a306b50bb967c605a38076).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Python GBT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21413 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91280/ Test PASSed.
[GitHub] spark issue #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Python GBT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21413 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3687/ Test PASSed.
[GitHub] spark issue #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Python GBT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21413 Merged build finished. Test PASSed.
[GitHub] spark issue #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Python GBT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21413 **[Test build #91280 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91280/testReport)** for PR 21413 at commit [`714ab33`](https://github.com/apache/spark/commit/714ab3338f952c13c1a306b50bb967c605a38076).
[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21413#discussion_r191602398

--- Diff: python/pyspark/ml/regression.py ---
```
@@ -619,6 +627,22 @@ def getSubsamplingRate(self):
         """
         return self.getOrDefault(self.subsamplingRate)

+    @since("1.4.0")
+    def setFeatureSubsetStrategy(self, value):
+        """
+        Sets the value of :py:attr:`featureSubsetStrategy`.
+
+        .. note:: Deprecated in 2.1.0 and will be removed in 3.0.0.
```
--- End diff --

Fixed. Thanks!
[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...
Github user PenguinToast commented on the issue: https://github.com/apache/spark/pull/21454 Retest this please
[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21454 **[Test build #91279 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91279/testReport)** for PR 21454 at commit [`badbf0e`](https://github.com/apache/spark/commit/badbf0e6766a99565e061063041f231d119d6d3a).
[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21346 So, one thing I was thinking about is whether it would be worth it to make error handling a little better here. I think this is no worse than the current status quo, and looking at the related PR I'm not sure how much better this would make things, but... The current implementation sends a "header" message + the streamed payload as a single RPC, so there's a single opportunity for the receiver to return an error. That means that if, for example, the receiver does not have enough space to store a block that is being uploaded, it can return an error, but the sender will still try to send all the block data to the receiver (which will just ignore it). I'm wondering if it would be worth trying to implement this as a couple of "chained RPCs": one that sends the metadata and a second one that streams the data. That way the receiver can error out on the first RPC and the sender can just throw away the second RPC, instead of having to transfer everything. It might create the "some state needs to be stored somewhere" problem on the receiver side, though. Haven't really thought that far yet.
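The "chained RPCs" idea can be sketched as a toy two-phase exchange: a metadata-only first call that lets the receiver reject early, and a data-streaming second call that is only issued on acceptance. This is an illustrative model only, not Spark's network layer; `Receiver` and `replicate` are invented names.

```python
class Receiver:
    """Toy receiver for the two-phase block-upload sketch."""

    def __init__(self, free_bytes):
        self.free_bytes = free_bytes
        self.accepted = {}  # block_id -> declared size

    def offer_block(self, block_id, size):
        # First RPC: metadata only. The receiver can error out here,
        # before any block data has moved over the wire.
        if size > self.free_bytes:
            return False
        self.accepted[block_id] = size
        return True

    def stream_block(self, block_id, chunks):
        # Second RPC: only reached when the offer was accepted, so the
        # receiver already holds the state from the first call.
        assert block_id in self.accepted, "no prior metadata RPC"
        return sum(len(c) for c in chunks)

def replicate(receiver, block_id, payload, chunk_size=4):
    # Sender side: abandon the upload if the metadata RPC is rejected,
    # instead of streaming data the receiver would just ignore.
    if not receiver.offer_block(block_id, len(payload)):
        return None
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)]
    return receiver.stream_block(block_id, chunks)
```

The `accepted` dict is exactly the "some state needs to be stored somewhere" cost the comment anticipates: the receiver must remember offers between the two calls.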
[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21437 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3686/ Test PASSed.
[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21437 Merged build finished. Test PASSed.
[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21437 **[Test build #91277 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91277/testReport)** for PR 21437 at commit [`2ea9cbc`](https://github.com/apache/spark/commit/2ea9cbc80787f1417fa4502c3c2b9b89f46d0632).
[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle writer fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21428 **[Test build #91278 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91278/testReport)** for PR 21428 at commit [`65837ac`](https://github.com/apache/spark/commit/65837ac611991f2db7710d0657e56b222a2f5c74).
[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21454 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91270/ Test FAILed.
[GitHub] spark pull request #21428: [SPARK-24235][SS] Implement continuous shuffle wr...
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21428#discussion_r191596882

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/shuffle/ContinuousShuffleSuite.scala ---
```
@@ -40,22 +60,129 @@ class ContinuousShuffleReadSuite extends StreamTest {
     messages.foreach(endpoint.askSync[Unit](_))
   }

-  // In this unit test, we emulate that we're in the task thread where
-  // ContinuousShuffleReadRDD.compute() will be evaluated. This requires a task context
-  // thread local to be set.
-  var ctx: TaskContextImpl = _
+  private def readRDDEndpoint(rdd: ContinuousShuffleReadRDD) = {
+    rdd.partitions(0).asInstanceOf[ContinuousShuffleReadPartition].endpoint
+  }

-  override def beforeEach(): Unit = {
-    super.beforeEach()
-    ctx = TaskContext.empty()
-    TaskContext.setTaskContext(ctx)
+  private def readEpoch(rdd: ContinuousShuffleReadRDD) = {
+    rdd.compute(rdd.partitions(0), ctx).toSeq.map(_.getInt(0))
   }

-  override def afterEach(): Unit = {
-    ctx.markTaskCompleted(None)
-    TaskContext.unset()
-    ctx = null
-    super.afterEach()
+  test("one epoch") {
```
--- End diff --

Reordered.
[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21454 Merged build finished. Test FAILed.
[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r191596607

--- Diff: python/pyspark/taskcontext.py ---
```
@@ -88,3 +89,9 @@ def taskAttemptId(self):
         TaskAttemptID.
         """
         return self._taskAttemptId
+
+    def getLocalProperty(self, key):
+        """
+        Get a local property set upstream in the driver, or None if it is missing.
```
--- End diff --

Right.
[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21454 **[Test build #91270 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91270/testReport)** for PR 21454 at commit [`f198f28`](https://github.com/apache/spark/commit/f198f28b1a3d7380a09e5687438a264101cc6965).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF shou...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21427#discussion_r191596459

--- Diff: python/pyspark/worker.py ---
```
@@ -111,9 +114,16 @@ def wrapped(key_series, value_series):
                 "Number of columns of the returned pandas.DataFrame "
                 "doesn't match specified schema. "
                 "Expected: {} Actual: {}".format(len(return_type), len(result.columns)))
-        arrow_return_types = (to_arrow_type(field.dataType) for field in return_type)
-        return [(result[result.columns[i]], arrow_type)
-                for i, arrow_type in enumerate(arrow_return_types)]
+        try:
+            # Assign result columns by schema name
+            return [(result[field.name], to_arrow_type(field.dataType))
+                    for field in return_type]
+        except KeyError:
+            if all(not isinstance(name, basestring) for name in result.columns):
+                # Assign result columns by position if they are not named with strings
+                return [(result[result.columns[i]], to_arrow_type(field.dataType))
+                        for i, field in enumerate(return_type)]
+            else:
+                raise
```
--- End diff --

Ah, I saw you added documentation for this behavior. Looks good.
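The name-first, position-fallback lookup in the diff above can be condensed into a small stand-alone sketch. This is a hypothetical helper operating on plain dicts rather than a `pandas.DataFrame`, so the shape of the fallback logic is the point, not the pyspark API.

```python
def assign_columns(result, schema_fields):
    # result: mapping of column label -> column data (stand-in for a
    # pandas.DataFrame); schema_fields: ordered schema field names.
    try:
        # Prefer assignment by schema name.
        return [result[name] for name in schema_fields]
    except KeyError:
        if all(not isinstance(label, str) for label in result):
            # No string labels at all (e.g. default integer columns):
            # fall back to positional assignment.
            return list(result.values())[:len(schema_fields)]
        # Partially named columns are a genuine mismatch: re-raise.
        raise
```

As in the PR, a frame whose columns are all unnamed (integer-labelled) still works positionally, while a frame with some wrong string names fails loudly instead of silently reordering data.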
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191594326

--- Diff: python/pyspark/sql/dataframe.py ---
```
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
         else:
             print(self._jdf.showString(n, int(truncate), vertical))

+    def _get_repl_config(self):
+        """Return the configs for eager evaluation each time when __repr__ or
+        _repr_html_ called by user or notebook.
+        """
+        eager_eval = self.sql_ctx.getConf(
+            "spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+        console_row = int(self.sql_ctx.getConf(
+            "spark.sql.repl.eagerEval.maxNumRows", u"20"))
```
--- End diff --

Do we need `u` here?
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191594348

--- Diff: python/pyspark/sql/dataframe.py ---
```
+        console_truncate = int(self.sql_ctx.getConf(
+            "spark.sql.repl.eagerEval.truncate", u"20"))
```
--- End diff --

ditto.
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191593987

--- Diff: python/pyspark/sql/dataframe.py ---
```
+    def _repr_html_(self):
+        """Returns a dataframe with html code when you enabled eager evaluation
+        by 'spark.sql.repl.eagerEval.enabled', this only called by REPL you're
+        using support eager evaluation with HTML.
+        """
+        import cgi
+        if not self._support_repr_html:
+            self._support_repr_html = True
+        (eager_eval, console_row, console_truncate) = self._get_repl_config()
+        if eager_eval:
+            with SCCallSiteSync(self._sc) as css:
+                vertical = False
+                sock_info = self._jdf.getRowsToPython(
+                    console_row, console_truncate, vertical)
```
--- End diff --

ditto.
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191591921

--- Diff: python/pyspark/sql/dataframe.py ---
```
+    def _get_repl_config(self):
+        """Return the configs for eager evaluation each time when __repr__ or
+        _repr_html_ called by user or notebook.
+        """
+        eager_eval = self.sql_ctx.getConf(
+            "spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+        console_row = int(self.sql_ctx.getConf(
+            "spark.sql.repl.eagerEval.maxNumRows", u"20"))
+        console_truncate = int(self.sql_ctx.getConf(
+            "spark.sql.repl.eagerEval.truncate", u"20"))
+        return (eager_eval, console_row, console_truncate)
```
--- End diff --

How about declaring those as `@property`?
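The reviewer's `@property` suggestion could look roughly like this. The class name and the plain-dict stand-in for `sql_ctx.getConf` are assumptions for illustration, not the actual `DataFrame` code; each config becomes a lazily-read property instead of a tuple-returning helper.

```python
class EagerEvalConfig:
    """Sketch of the @property refactor for the eager-eval configs."""

    def __init__(self, conf):
        # conf: mapping used as a stand-in for self.sql_ctx.getConf
        self._conf = conf

    def _get(self, key, default):
        return self._conf.get(key, default)

    @property
    def eager_eval(self):
        # Re-reads the config on each access, matching the original intent
        # of fetching fresh values every time __repr__ is called.
        return self._get(
            "spark.sql.repl.eagerEval.enabled", "false").lower() == "true"

    @property
    def max_num_rows(self):
        return int(self._get("spark.sql.repl.eagerEval.maxNumRows", "20"))

    @property
    def truncate(self):
        return int(self._get("spark.sql.repl.eagerEval.truncate", "20"))
```

Call sites then read `self.eager_eval` directly instead of unpacking a `(eager_eval, console_row, console_truncate)` tuple.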
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191591799 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +def _get_repl_config(self): +"""Return the configs for eager evaluation each time when __repr__ or +_repr_html_ called by user or notebook. +""" +eager_eval = self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" +console_row = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", u"20")) +console_truncate = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", u"20")) +return (eager_eval, console_row, console_truncate) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +(eager_eval, console_row, console_truncate) = self._get_repl_config() +if not self._support_repr_html and eager_eval: --- End diff -- What's `_support_repr_html` for? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191593927 --- Diff: python/pyspark/sql/dataframe.py --- @@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False): else: print(self._jdf.showString(n, int(truncate), vertical)) +def _get_repl_config(self): +"""Return the configs for eager evaluation each time when __repr__ or +_repr_html_ called by user or notebook. +""" +eager_eval = self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.enabled", "false").lower() == "true" +console_row = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.maxNumRows", u"20")) +console_truncate = int(self.sql_ctx.getConf( +"spark.sql.repl.eagerEval.truncate", u"20")) +return (eager_eval, console_row, console_truncate) + def __repr__(self): -return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes)) +(eager_eval, console_row, console_truncate) = self._get_repl_config() +if not self._support_repr_html and eager_eval: +vertical = False +return self._jdf.showString( +console_row, console_truncate, vertical) --- End diff -- I guess ```python return self._jdf.showString( console_row, console_truncate, vertical=False) ``` should work without `vertical` variable. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191591455 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -291,37 +289,57 @@ class Dataset[T] private[sql]( } } + rows = rows.map { +_.zipWithIndex.map { case (cell, i) => + if (truncate > 0) { +StringUtils.leftPad(cell, colWidths(i)) + } else { +StringUtils.rightPad(cell, colWidths(i)) + } +} + } --- End diff -- We should do this in `showString`? And we can move `minimumColWidth` into the `showString` in that case? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r191595442 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -231,16 +234,17 @@ class Dataset[T] private[sql]( } /** - * Compose the string representing rows for output + * Get rows represented in Sequence by specific truncate and vertical requirement. * - * @param _numRows Number of rows to show + * @param numRows Number of rows to return * @param truncate If set to more than 0, truncates strings to `truncate` characters and * all cells will be aligned right. - * @param vertical If set to true, prints output rows vertically (one line per column value). + * @param vertical If set to true, the rows to return don't need truncate. */ - private[sql] def showString( - _numRows: Int, truncate: Int = 20, vertical: Boolean = false): String = { -val numRows = _numRows.max(0).min(Int.MaxValue - 1) --- End diff -- Don't we need to check the `numRows` range when called from `getRowsToPython`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
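The range check ueshin asks about is the `_numRows.max(0).min(Int.MaxValue - 1)` clamp being removed from `showString`. Its effect, sketched in Python (the helper name is mine):

```python
INT_MAX = 2**31 - 1  # JVM Int.MaxValue

def clamp_num_rows(num_rows):
    # Mirrors Scala's `_numRows.max(0).min(Int.MaxValue - 1)`:
    # negative inputs become 0, and the upper bound stays one below
    # Int.MaxValue so `numRows + 1` cannot overflow.
    return max(0, min(num_rows, INT_MAX - 1))

print(clamp_num_rows(-5), clamp_num_rows(20), clamp_num_rows(INT_MAX))
# 0 20 2147483646
```

If `getRowsToPython` bypasses this guard, a negative or maximal `numRows` from the Python side would reach the row-collection code unchecked, which is the concern raised here.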
[GitHub] spark pull request #21442: [SPARK-24402] [SQL] Optimize `In` expression when...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21442#discussion_r191595951 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -219,10 +219,15 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] { object OptimizeIn extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan transform { case q: LogicalPlan => q transformExpressionsDown { - case In(v, list) if list.isEmpty && !v.nullable => FalseLiteral + case In(v, list) if list.isEmpty => +// When v is not nullable, the following expression will be optimized +// to FalseLiteral which is tested in OptimizeInSuite.scala +If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType)) case expr @ In(v, list) if expr.inSetConvertible => val newList = ExpressionSet(list).toSeq -if (newList.size > SQLConf.get.optimizerInSetConversionThreshold) { +if (newList.length == 1) { --- End diff -- Sounds good. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
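The semantics the rewritten rule must preserve can be checked with a small Python model of SQL's three-valued `IN` (ignoring NULLs inside the value list; the function name is mine):

```python
def eval_in(v, values):
    # SQL three-valued semantics that OptimizeIn must preserve:
    # `v IN ()` is FALSE for non-null v but NULL when v is NULL, which
    # is why the empty-list case is rewritten to
    # If(IsNotNull(v), false, null) rather than a bare false literal.
    if v is None:
        return None       # NULL propagates
    if not values:
        return False      # non-null value, empty list
    return v in values    # single-element lists behave like EqualTo

print(eval_in(1, []), eval_in(None, []), eval_in(2, [2]))
# False None True
```

The single-element branch (`newList.length == 1` becoming `EqualTo(v, newList.head)`) is sound for the same reason: membership in a one-element set is exactly equality, NULLs included.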
[GitHub] spark issue #21450: [SPARK-24319][SPARK SUBMIT] Fix spark-submit execution w...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21450 This doesn't seem to be addressing the issue reported in the bug. The exact same error happens with your patch: ``` $ ./bin/run-example Exception in thread "main" java.lang.IllegalArgumentException: Missing application resource. at org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241) at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:185) at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:300) at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:166) at org.apache.spark.launcher.Main.main(Main.java:86) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21450: [SPARK-24319][SPARK SUBMIT] Fix spark-submit execution w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21450 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21450: [SPARK-24319][SPARK SUBMIT] Fix spark-submit execution w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21450 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91267/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r191589537 --- Diff: python/pyspark/taskcontext.py --- @@ -88,3 +89,9 @@ def taskAttemptId(self): TaskAttemptID. """ return self._taskAttemptId + +def getLocalProperty(self, key): +""" +Get a local property set upstream in the driver, or None if it is missing. --- End diff -- If it's missing it will result in a `KeyError`, maybe you want `return self._localProperties.get(key)` which returns `None` as the default? That seems better to me too, although you might want to add an optional `default` value. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
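The distinction BryanCutler draws is standard dict behaviour: indexing a missing key raises `KeyError`, while `.get()` returns a default. A stand-alone illustration with a hypothetical `TaskContextSketch` (not the real class):

```python
class TaskContextSketch:
    def __init__(self, local_properties):
        self._localProperties = local_properties

    def getLocalProperty(self, key, default=None):
        # `self._localProperties[key]` would raise KeyError for a missing
        # key; .get() returns `default` (None unless overridden) instead.
        return self._localProperties.get(key, default)


ctx = TaskContextSketch({"spark.job.description": "etl"})
print(ctx.getLocalProperty("spark.job.description"))  # etl
print(ctx.getLocalProperty("missing"))                # None
print(ctx.getLocalProperty("missing", "fallback"))    # fallback
```

The optional `default` parameter shown here is the extension suggested in the comment, not something the PR necessarily includes.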
[GitHub] spark issue #21450: [SPARK-24319][SPARK SUBMIT] Fix spark-submit execution w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21450 **[Test build #91267 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91267/testReport)** for PR 21450 at commit [`a69850b`](https://github.com/apache/spark/commit/a69850b6fdcbe2e234e70a597d9ad6beae6a6937). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21457: [SPARK-24414][ui] Calculate the correct number of tasks ...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21457 +1 pending sparkQa, changes look good, and I manually verified against both the jira use cases. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91275/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21427 **[Test build #91275 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91275/testReport)** for PR 21427 at commit [`e322e1a`](https://github.com/apache/spark/commit/e322e1a1caa6cf422ed8b33244656345e4c13bb3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21442: [SPARK-24402] [SQL] Optimize `In` expression when...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21442#discussion_r191585661 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -219,10 +219,15 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] { object OptimizeIn extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan transform { case q: LogicalPlan => q transformExpressionsDown { - case In(v, list) if list.isEmpty && !v.nullable => FalseLiteral + case In(v, list) if list.isEmpty => +// When v is not nullable, the following expression will be optimized +// to FalseLiteral which is tested in OptimizeInSuite.scala +If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType)) case expr @ In(v, list) if expr.inSetConvertible => val newList = ExpressionSet(list).toSeq -if (newList.size > SQLConf.get.optimizerInSetConversionThreshold) { +if (newList.length == 1) { + EqualTo(v, newList.head) +} else if (newList.size > SQLConf.get.optimizerInSetConversionThreshold) { val hSet = newList.map(e => e.eval(EmptyRow)) InSet(v, HashSet() ++ hSet) } else if (newList.size < list.size) { --- End diff -- nit: In line 235 the comment ```// newList.length == list.length``` can be updated as ```// newList.length == list.length && newList.length > 1``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/21454 IIUC this PR prints the config key in the error message if the config value (either the default or the one read from the configMap) can't be cast properly. Personally I think it adds some value to include this change. I only have some nits. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21442: [SPARK-24402] [SQL] Optimize `In` expression when...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21442#discussion_r191585050 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -219,10 +219,15 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] { object OptimizeIn extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan transform { case q: LogicalPlan => q transformExpressionsDown { - case In(v, list) if list.isEmpty && !v.nullable => FalseLiteral + case In(v, list) if list.isEmpty => +// When v is not nullable, the following expression will be optimized +// to FalseLiteral which is tested in OptimizeInSuite.scala +If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType)) case expr @ In(v, list) if expr.inSetConvertible => val newList = ExpressionSet(list).toSeq -if (newList.size > SQLConf.get.optimizerInSetConversionThreshold) { +if (newList.length == 1) { + EqualTo(v, newList.head) +} else if (newList.size > SQLConf.get.optimizerInSetConversionThreshold) { --- End diff -- nit: size => length because we use `length` in the previous `if` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21454: [SPARK-24337][Core] Improve error messages for Sp...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21454#discussion_r191584812 --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala --- @@ -448,6 +473,20 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria */ private[spark] def getenv(name: String): String = System.getenv(name) + /** + * Wrapper method for get*() methods which require some specific value format. This catches + * any [[NumberFormatException]] or [[IllegalArgumentException]] and re-raises it with the + * incorrectly configured key in the exception message. + */ + private def catchIllegalArgument[T](key: String)(getValue: => T): T = { --- End diff -- According to what it actually does, `catchIllegalArgument` doesn't seem to be a great name for this function; maybe `catchIllegalValue`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
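The wrapper under discussion is easy to mirror in Python; a hedged sketch of the same pattern (the names and the toy `conf` dict are mine, not Spark's, and `ValueError` plays the role of `NumberFormatException`):

```python
def catch_illegal_value(key, get_value):
    """Run get_value() and re-raise conversion errors with the offending key."""
    try:
        return get_value()
    except ValueError as e:
        raise ValueError(
            "Illegal value for config key %s: %s" % (key, e)) from e


conf = {"spark.executor.cores": "four"}  # deliberately malformed value

def get_int(key, default):
    return catch_illegal_value(
        key, lambda: int(conf[key]) if key in conf else default)

print(get_int("spark.driver.cores", 1))  # 1 (falls back to the default)
try:
    get_int("spark.executor.cores", 1)
except ValueError as e:
    print(e)  # error message now names the offending key
```

The point of the wrapper is exactly what the error message above shows: without it, the user sees only the raw conversion failure with no hint of which config key was misconfigured.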
[GitHub] spark pull request #21454: [SPARK-24337][Core] Improve error messages for Sp...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21454#discussion_r191582665 --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala --- @@ -394,23 +407,35 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria } - /** Get a parameter as an integer, falling back to a default if not set */ - def getInt(key: String, defaultValue: Int): Int = { + /** + * Get a parameter as an integer, falling back to a default if not set + * @throws IllegalArgumentException If the value can't be interpreted as an integer + */ + def getInt(key: String, defaultValue: Int): Int = catchIllegalArgument(key) { getOption(key).map(_.toInt).getOrElse(defaultValue) } - /** Get a parameter as a long, falling back to a default if not set */ - def getLong(key: String, defaultValue: Long): Long = { + /** + * Get a parameter as a long, falling back to a default if not set + * @throws IllegalArgumentException If the value can't be interpreted as an long + */ + def getLong(key: String, defaultValue: Long): Long = catchIllegalArgument(key) { getOption(key).map(_.toLong).getOrElse(defaultValue) } - /** Get a parameter as a double, falling back to a default if not set */ - def getDouble(key: String, defaultValue: Double): Double = { + /** + * Get a parameter as a double, falling back to a default if not ste + * @throws IllegalArgumentException If the value can't be interpreted as an double + */ + def getDouble(key: String, defaultValue: Double): Double = catchIllegalArgument(key) { getOption(key).map(_.toDouble).getOrElse(defaultValue) } - /** Get a parameter as a boolean, falling back to a default if not set */ - def getBoolean(key: String, defaultValue: Boolean): Boolean = { + /** + * Get a parameter as a boolean, falling back to a default if not set + * @throws IllegalArgumentException If the value can't be interpreted as an boolean --- End diff -- nit: `an boolean` -> `a boolean` --- - To unsubscribe, e-mail: 
reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21454: [SPARK-24337][Core] Improve error messages for Sp...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21454#discussion_r191582611 --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala --- @@ -394,23 +407,35 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria } - /** Get a parameter as an integer, falling back to a default if not set */ - def getInt(key: String, defaultValue: Int): Int = { + /** + * Get a parameter as an integer, falling back to a default if not set + * @throws IllegalArgumentException If the value can't be interpreted as an integer + */ + def getInt(key: String, defaultValue: Int): Int = catchIllegalArgument(key) { getOption(key).map(_.toInt).getOrElse(defaultValue) } - /** Get a parameter as a long, falling back to a default if not set */ - def getLong(key: String, defaultValue: Long): Long = { + /** + * Get a parameter as a long, falling back to a default if not set + * @throws IllegalArgumentException If the value can't be interpreted as an long + */ + def getLong(key: String, defaultValue: Long): Long = catchIllegalArgument(key) { getOption(key).map(_.toLong).getOrElse(defaultValue) } - /** Get a parameter as a double, falling back to a default if not set */ - def getDouble(key: String, defaultValue: Double): Double = { + /** + * Get a parameter as a double, falling back to a default if not ste + * @throws IllegalArgumentException If the value can't be interpreted as an double --- End diff -- nit: `an double` -> `a double` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21454: [SPARK-24337][Core] Improve error messages for Sp...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21454#discussion_r191582499 --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala --- @@ -394,23 +407,35 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria } - /** Get a parameter as an integer, falling back to a default if not set */ - def getInt(key: String, defaultValue: Int): Int = { + /** + * Get a parameter as an integer, falling back to a default if not set + * @throws IllegalArgumentException If the value can't be interpreted as an integer + */ + def getInt(key: String, defaultValue: Int): Int = catchIllegalArgument(key) { getOption(key).map(_.toInt).getOrElse(defaultValue) } - /** Get a parameter as a long, falling back to a default if not set */ - def getLong(key: String, defaultValue: Long): Long = { + /** + * Get a parameter as a long, falling back to a default if not set + * @throws IllegalArgumentException If the value can't be interpreted as an long --- End diff -- nit: `an long` -> `a long` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21457: [SPARK-24414][ui] Calculate the correct number of tasks ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21457 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3685/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21457: [SPARK-24414][ui] Calculate the correct number of tasks ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21457 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/21413#discussion_r191581932 --- Diff: python/pyspark/ml/regression.py --- @@ -619,6 +627,22 @@ def getSubsamplingRate(self): """ return self.getOrDefault(self.subsamplingRate) +@since("1.4.0") +def setFeatureSubsetStrategy(self, value): +""" +Sets the value of :py:attr:`featureSubsetStrategy`. + +.. note:: Deprecated in 2.1.0 and will be removed in 3.0.0. --- End diff -- This should technically be marked as deprecated in 2.4.0, even though the Scala version was before --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21409: [SPARK-24365][SQL] Add Data Source write benchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21409 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91269/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21409: [SPARK-24365][SQL] Add Data Source write benchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21409 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21409: [SPARK-24365][SQL] Add Data Source write benchmark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21409 **[Test build #91269 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91269/testReport)** for PR 21409 at commit [`e90fa00`](https://github.com/apache/spark/commit/e90fa00e8963eb985bdd30d9a262c61f6ca1ce61). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21457: [SPARK-24414][ui] Calculate the correct number of tasks ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21457 **[Test build #91276 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91276/testReport)** for PR 21457 at commit [`40b6cb7`](https://github.com/apache/spark/commit/40b6cb7117598560d91bf6efb148c482eadd8daf). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21457: [SPARK-24414][ui] Calculate the correct number of...
GitHub user vanzin opened a pull request: https://github.com/apache/spark/pull/21457 [SPARK-24414][ui] Calculate the correct number of tasks for a stage. This change takes into account all non-pending tasks when calculating the number of tasks to be shown. This also means that when the stage is pending, the task table (or, in fact, most of the data in the stage page) will not be rendered. I also fixed the label when the known number of tasks is larger than the recorded number of tasks (it was inverted). You can merge this pull request into a Git repository by running: $ git pull https://github.com/vanzin/spark SPARK-24414 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21457.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21457 commit 40b6cb7117598560d91bf6efb148c482eadd8daf Author: Marcelo Vanzin Date: 2018-05-29T21:12:12Z [SPARK-24414][ui] Calculate the correct number of tasks for a stage. This change takes into account all non-pending tasks when calculating the number of tasks to be shown. This also means that when the stage is pending, the task table (or, in fact, most of the data in the stage page) will not be rendered. I also fixed the label when the known number of tasks is larger than the recorded number of tasks (it was inverted). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3684/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21453 Here is the issue on the Scala side. https://github.com/scala/bug/issues/10913 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21427 **[Test build #91275 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91275/testReport)** for PR 21427 at commit [`e322e1a`](https://github.com/apache/spark/commit/e322e1a1caa6cf422ed8b33244656345e4c13bb3). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21453 I'm also looking at this issue. The challenge is that one of the hacks we use to initialize the Spark before REPL sees any files was removed in Scala 2.11.12. https://github.com/apache/spark/blob/master/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala#L109 We might need to work with Scala team to upgrade our Scala version. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...
Github user daniel-shields commented on the issue: https://github.com/apache/spark/pull/21449 This case can also occur when the datasets are different but share a common lineage. Consider the following:

```python
df = spark.range(10)
df1 = df.groupby('id').count()
df2 = df.groupby('id').sum('id')
df1.join(df2, df2['id'].eqNullSafe(df1['id'])).collect()
```

This currently fails with eqNullSafe, but works with ==. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/21390 Are there any other concerns over this PR? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21437 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3683/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21437 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3549/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21455: [SPARK-24093][DStream][Minor]Make some fields of KafkaSt...
Github user merlintang commented on the issue: https://github.com/apache/spark/pull/21455 @jerryshao can you review this minor update? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/20697#discussion_r191567638 --- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/IntegrationTestBackend.scala --- @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.k8s.integrationtest.backend + +import io.fabric8.kubernetes.client.DefaultKubernetesClient + +import org.apache.spark.deploy.k8s.integrationtest.backend.minikube.MinikubeTestBackend + +private[spark] trait IntegrationTestBackend { + def initialize(): Unit + def getKubernetesClient: DefaultKubernetesClient + def cleanUp(): Unit = {} +} + +private[spark] object IntegrationTestBackendFactory { + val DeployModeConfigKey = "spark.kubernetes.test.deployMode" --- End diff -- nit: lower case `d` in the var name --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/20697#discussion_r191568423

--- Diff: resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh ---
@@ -0,0 +1,91 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+TEST_ROOT_DIR=$(git rev-parse --show-toplevel)
+UNPACKED_SPARK_TGZ="$TEST_ROOT_DIR/target/spark-dist-unpacked"
+IMAGE_TAG_OUTPUT_FILE="$TEST_ROOT_DIR/target/image-tag.txt"
+DEPLOY_MODE="minikube"
+IMAGE_REPO="docker.io/kubespark"
+IMAGE_TAG="N/A"
+SPARK_TGZ="N/A"
+
+# Parse arguments
+while (( "$#" )); do
+  case $1 in
+    --unpacked-spark-tgz)
+      UNPACKED_SPARK_TGZ="$2"
+      shift
+      ;;
+    --image-repo)
+      IMAGE_REPO="$2"
+      shift
+      ;;
+    --image-tag)
+      IMAGE_TAG="$2"
+      shift
+      ;;
+    --image-tag-output-file)
+      IMAGE_TAG_OUTPUT_FILE="$2"
+      shift
+      ;;
+    --deploy-mode)
+      DEPLOY_MODE="$2"
+      shift
+      ;;
+    --spark-tgz)
+      SPARK_TGZ="$2"
+      shift
+      ;;
+    *)
+      break
+      ;;
+  esac
+  shift
+done
+
+if [[ $SPARK_TGZ == "N/A" ]];
+then
+  echo "Must specify a Spark tarball to build Docker images against with --spark-tgz." && exit 1;
--- End diff --

Can we just use the repository and not require a tarball?
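The flag-parsing loop in the script above uses a common bash idiom: loop while positional parameters remain, consume each flag's value with one `shift` inside the `case`, and consume the flag itself with the `shift` after `esac`. A minimal self-contained sketch of the same pattern (the demo `set --` argument list and the two flags are illustrative, not the full script):

```shell
#!/usr/bin/env bash
# Sketch of the "shift twice per --flag value" parsing loop.
IMAGE_TAG="N/A"
DEPLOY_MODE="minikube"

set -- --image-tag v1 --deploy-mode cloud  # demo arguments for illustration

while (( "$#" )); do
  case $1 in
    --image-tag)
      IMAGE_TAG="$2"
      shift          # consume the value; the shift after esac consumes the flag
      ;;
    --deploy-mode)
      DEPLOY_MODE="$2"
      shift
      ;;
    *)
      break          # stop at the first unrecognized argument
      ;;
  esac
  shift
done

echo "$IMAGE_TAG $DEPLOY_MODE"  # prints: v1 cloud
```

Note that a flag given without a value would silently swallow the next flag as its value; the original script has the same property, which is typical for simple test-harness scripts.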
[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21437 **[Test build #91274 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91274/testReport)** for PR 21437 at commit [`9d95c12`](https://github.com/apache/spark/commit/9d95c12a0ada0520f426723406a7d99aada2760d).
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3549/
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3682/ Test PASSed.
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Merged build finished. Test PASSed.
[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21449 @daniel-shields in that case you have 2 different datasets, `df1` and `df2`. So they are 2 distinct attributes and the check `a.sameRef(b)` would return false. This is applied only in the case of self-joins, i.e. when you have the same dataset on both sides.
[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21403 @juliuszsompolski yes, you're right: SPARK-24395 uses literals, not subqueries. Sorry for the confusion.
[GitHub] spark issue #21450: [SPARK-24319][SPARK SUBMIT] Fix spark-submit execution w...
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/21450 cc @vanzin