[GitHub] spark issue #22246: [SPARK-25235] [SHELL] Merge the REPL code in Scala 2.11 ...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22246 Ping @srowen --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22246: [SPARK-25235] [SHELL] Merge the REPL code in Scal...
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/22246 [SPARK-25235] [SHELL] Merge the REPL code in Scala 2.11 and 2.12 branches ## What changes were proposed in this pull request? Using some reflection tricks to merge the Scala 2.11 and 2.12 codebases. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dbtsai/spark repl Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22246.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22246 commit 8669c21e1dc97e660a13b1cad598d1dbe8e44731 Author: DB Tsai Date: 2018-08-24T00:39:07Z Consolidated Scala 2.11 and 2.12 branches commit 3808f02fdc2d914f7a022d00884034d8d8ceb19f Author: Liang-Chi Hsieh Date: 2018-08-27T11:26:29Z Get static loader object and invoke method on it. commit 075ca4a0c25503e4df4bc880f6ea58ead2eabcbe Author: DB Tsai Date: 2018-08-27T17:54:50Z Changed message
[GitHub] spark issue #22244: [WIP][SPARK-24721][SPARK-25213][SQL] extract python UDF ...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22244 @cloud-fan Thanks! I will take a look later today and incorporate this with my patch.
[GitHub] spark issue #22208: [SPARK-25216][SQL] Improve error message when a column c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22208 Merged build finished. Test PASSed.
[GitHub] spark issue #22208: [SPARK-25216][SQL] Improve error message when a column c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22208 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2580/ Test PASSed.
[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22233#discussion_r213057049 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand( val value = ExternalCatalogUtils.unescapePathName(ps(1)) if (resolver(columnName, partitionNames.head)) { scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value), -partitionNames.drop(1), threshold, resolver) +partitionNames.drop(1), threshold, resolver, listFilesInParallel = false) --- End diff -- cc @zsxwing, who had a few offline comments about the original PR for `parmap`. He will post them soon.
[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22173 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95300/ Test FAILed.
[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22173 **[Test build #95300 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95300/testReport)** for PR 22173 at commit [`d86503c`](https://github.com/apache/spark/commit/d86503cf34f66d7082df8677e78f5f793e1064a0). * This patch **fails Java style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22173 Merged build finished. Test FAILed.
[GitHub] spark pull request #21976: [SPARK-24909][core] Always unregister pending par...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/21976#discussion_r213056190 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -2474,19 +2478,21 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi runEvent(makeCompletionEvent( taskSets(3).tasks(0), Success, makeMapStatus("hostB", 2))) -// There should be no new attempt of stage submitted, -// because task(stageId=1, stageAttempt=1, partitionId=1) is still running in -// the current attempt (and hasn't completed successfully in any earlier attempts). -assert(taskSets.size === 4) +// At this point there should be no active task set for stageId=1 and we need +// to resubmit because the output from (stageId=1, stageAttemptId=0, partitionId=1) +// was ignored due to executor failure +assert(taskSets.size === 5) +assert(taskSets(4).stageId === 1 && taskSets(4).stageAttemptId === 2 + && taskSets(4).tasks.size === 1) -// Complete task(stageId=1, stageAttempt=1, partitionId=1) successfully. +// Complete task(stageId=1, stageAttempt=2, partitionId=1) successfully. runEvent(makeCompletionEvent( - taskSets(3).tasks(1), Success, makeMapStatus("hostB", 2))) + taskSets(4).tasks(0), Success, makeMapStatus("hostB", 2))) --- End diff -- Yes it will; marking either of these successful will work, but the assumption on line 2469 is that it got marked completed there by the TaskSetManager. So we don't want to send success for taskSet(3).task(1), as it should have already been marked success. Unfortunately you can't test those interactions in this unit test; that is why I'm working on another scheduler integration test, but I was going to do that under a separate JIRA.
[GitHub] spark issue #22208: [SPARK-25216][SQL] Improve error message when a column c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22208 **[Test build #95301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95301/testReport)** for PR 22208 at commit [`01f9cd5`](https://github.com/apache/spark/commit/01f9cd5c0450ce35f7e91ebe7328cdee3e911441).
[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22173 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95299/ Test FAILed.
[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22173 Merged build finished. Test FAILed.
[GitHub] spark issue #22245: [SPARK-24882][FOLLOWUP] Fix flaky synchronization in Kaf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22245 Merged build finished. Test PASSed.
[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22173 **[Test build #95299 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95299/testReport)** for PR 22173 at commit [`50258f7`](https://github.com/apache/spark/commit/50258f7595a49373d64d8831ff3ce410eef6e0cf). * This patch **fails Java style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22245: [SPARK-24882][FOLLOWUP] Fix flaky synchronization in Kaf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22245 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95298/ Test PASSed.
[GitHub] spark issue #22245: [SPARK-24882][FOLLOWUP] Fix flaky synchronization in Kaf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22245 **[Test build #95298 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95298/testReport)** for PR 22245 at commit [`93c7bd9`](https://github.com/apache/spark/commit/93c7bd93f5dbec41a0fd4d6b5ef0bfe0bfdc235c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22226: [SPARK-25252][SQL] Support arrays of any types by to_jso...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22226 > Probably, you'd be better to file separate jira for each function. > +1 for separate JIRA. I created the JIRA ticket: https://issues.apache.org/jira/browse/SPARK-25252
[GitHub] spark issue #22042: [SPARK-25005][SS]Support non-consecutive offsets for Kaf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22042 **[Test build #95297 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95297/testReport)** for PR 22042 at commit [`7a02921`](https://github.com/apache/spark/commit/7a02921950cda865e3cd45f1d1635212c2f707c0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22042: [SPARK-25005][SS]Support non-consecutive offsets for Kaf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22042 Merged build finished. Test PASSed.
[GitHub] spark issue #22042: [SPARK-25005][SS]Support non-consecutive offsets for Kaf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22042 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95297/ Test PASSed.
[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22173 **[Test build #95300 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95300/testReport)** for PR 22173 at commit [`d86503c`](https://github.com/apache/spark/commit/d86503cf34f66d7082df8677e78f5f793e1064a0).
[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22233#discussion_r213050406 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand( val value = ExternalCatalogUtils.unescapePathName(ps(1)) if (resolver(columnName, partitionNames.head)) { scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value), -partitionNames.drop(1), threshold, resolver) +partitionNames.drop(1), threshold, resolver, listFilesInParallel = false) --- End diff -- I think the root cause is clear: a fixed thread pool plus submitting and waiting for a future inside another future on the same thread pool. @gatorsmile I will revert to the parallel collection here if you don't mind, since there is no reason for `parmap` in this place.
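The deadlock pattern MaxGekk describes can be reproduced outside Spark. The following is an illustrative sketch, not the Spark code itself: a bounded pool whose only thread blocks waiting for a task that can never be scheduled on that same pool.

```scala
import java.util.concurrent.Executors
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}

object DeadlockSketch extends App {
  // A fixed pool with a single thread stands in for any bounded pool.
  implicit val ec: ExecutionContext =
    ExecutionContext.fromExecutor(Executors.newFixedThreadPool(1))

  val outer = Future {
    // The inner future can never run: the pool's only thread is the one
    // blocked right here waiting for it.
    val inner = Future { 42 }
    Await.result(inner, Duration.Inf) // deadlocks by design
  }
  Await.result(outer, Duration.Inf)
}
```

With a pool of N threads the same hang appears as soon as N outer tasks each wait on an inner task, which is why switching to a plain parallel collection (or never blocking inside pool tasks) avoids it.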
[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r213050049 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -180,7 +188,73 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( dataOut.writeInt(partitionIndex) // Python version of driver PythonRDD.writeUTF(pythonVer, dataOut) +// Init a ServerSocket to accept method calls from Python side. +val isBarrier = context.isInstanceOf[BarrierTaskContext] +if (isBarrier) { + serverSocket = Some(new ServerSocket(/* port */ 0, +/* backlog */ 1, +InetAddress.getByName("localhost"))) + // A call to accept() for ServerSocket shall block infinitely. + serverSocket.map(_.setSoTimeout(0)) + new Thread("accept-connections") { +setDaemon(true) + +override def run(): Unit = { + while (!serverSocket.get.isClosed()) { +var sock: Socket = null +try { + sock = serverSocket.get.accept() + // Wait for function call from python side. + sock.setSoTimeout(1) + val input = new DataInputStream(sock.getInputStream()) --- End diff -- Thanks for catching this, yea I agree it would be better to move the authentication before recognising functions.
[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/22238#discussion_r213049895 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -2812,6 +2812,12 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f # Additional Information +**Gotchas** --- End diff -- IMO, it would be better to keep it here as well as in the code; we may not be able to surface it in the right API docs, and there's a chance users will ignore it. @HeartSaVioR, maybe add an example here to illustrate how to use `coalesce`?
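A hedged sketch of the kind of `coalesce` example being requested (the paths and the `rate` test source here are placeholders chosen for illustration, not content from the PR):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only. `coalesce(1)` bounds each micro-batch to a single
// writing task, and hence a single output file, at the cost of parallelism.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("coalesce-demo")
  .getOrCreate()

// The built-in "rate" source generates test rows continuously.
val df = spark.readStream.format("rate").load()

val query = df.coalesce(1)
  .writeStream
  .format("parquet")
  .option("path", "/tmp/coalesce-demo/out")                 // placeholder path
  .option("checkpointLocation", "/tmp/coalesce-demo/ckpt")  // placeholder path
  .start()
```

The trade-off to call out in the docs: fewer output files per trigger, but also fewer concurrent write tasks.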
[GitHub] spark pull request #22213: [SPARK-25221][DEPLOY] Consistent trailing whitesp...
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/22213#discussion_r213049701 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2062,8 +2062,10 @@ private[spark] object Utils extends Logging { try { val properties = new Properties() properties.load(inReader) - properties.stringPropertyNames().asScala.map( -k => (k, properties.getProperty(k).trim)).toMap + properties.stringPropertyNames().asScala +.map(k => (k, properties.getProperty(k))) --- End diff -- @jerryshao `trim` removes leading spaces as well that are totally legit. I also need more info regarding what you mean by ASCII in this context.
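gerashegalov's distinction can be illustrated with a small sketch (the property value is a made-up example, not from the PR): `trim` strips both ends, while a trailing-whitespace regex preserves legitimate leading spaces.

```scala
val raw = "  -Dfile.encoding=UTF-8 \t"

// trim drops leading AND trailing whitespace:
raw.trim                        // "-Dfile.encoding=UTF-8"

// A trailing-only strip keeps the leading spaces intact:
raw.replaceAll("""\s+$""", "")  // "  -Dfile.encoding=UTF-8"
```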
[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22173 **[Test build #95299 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95299/testReport)** for PR 22173 at commit [`50258f7`](https://github.com/apache/spark/commit/50258f7595a49373d64d8831ff3ce410eef6e0cf).
[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/22146#discussion_r213047538 --- Diff: docs/running-on-kubernetes.md --- @@ -185,6 +185,21 @@ To use a secret through an environment variable use the following options to the --conf spark.kubernetes.executor.secretKeyRef.ENV_NAME=name:key ``` +## Pod Template +Kubernetes allows defining pods from [template files](https://kubernetes.io/docs/concepts/workloads/pods/pod-overview/#pod-templates). +Spark users can similarly use template files to define the driver or executor pod configurations that Spark configurations do not support. +To do so, specify the spark properties `spark.kubernetes.driver.podTemplateFile` and `spark.kubernetes.executor.podTemplateFile` +to point to local files accessible to the `spark-submit` process. To allow the driver pod access the executor pod template +file, the file will be automatically mounted onto a volume in the driver pod when it's created. + +It is important to note that Spark is opinionated about certain pod configurations so there are values in the +pod template that will always be overwritten by Spark. Therefore, users of this feature should note that specifying +the pod template file only lets Spark start with a template pod instead of an empty pod during the pod-building process. +For details, see the [full list](#pod-template-properties) of pod template values that will be overwritten by spark. + +Pod template files can also define multiple containers. In such cases, Spark will always assume that the first container in +the list will be the driver or executor container. --- End diff -- Is it possible to use only extra containers and not Spark-specific ones? Could we have a naming convention or a less error-prone convention?
[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18447 Yea I'd probably reject this for now, until we see bigger needs for it.
[GitHub] spark pull request #22241: [SPARK-25249][CORE][TEST]add a unit test for Open...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22241
[GitHub] spark issue #22245: [SPARK-24882][FOLLOWUP] Fix flaky synchronization in Kaf...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/22245 LGTM pending tests
[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/22192#discussion_r213045752 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -130,6 +130,16 @@ private[spark] class Executor( private val urlClassLoader = createClassLoader() private val replClassLoader = addReplClassLoaderIfNeeded(urlClassLoader) + // One thread will handle loading all of the plugins on this executor --- End diff -- I guess it does depend on what the intended use is here. If we have it in the same thread, it has the issue that it could block the executor or take too long and things start timing out. It can have a more direct impact on the executor code itself, whereas a separate thread isolates it more. But like you say, if it's not here and we don't wait for it, then we could have ordering issues if certain plugins have to be initialized before other things happen. I can see both arguments as well. So perhaps the API needs an init-type function that can be called more inline with a timeout to prevent it from taking too long, with the main part of the plugin called in a separate thread?
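The split tgravescs suggests could be sketched roughly as below. Everything here is illustrative: the `SketchPlugin` trait, `startPlugin`, and the timeout are assumed names, not the API merged in the PR.

```scala
import java.util.concurrent.Executors
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}

// Hypothetical plugin shape: a fast init plus a long-running body.
trait SketchPlugin {
  def init(): Unit   // called inline during executor startup, bounded by a timeout
  def run(): Unit    // called afterwards on its own daemon thread
}

def startPlugin(plugin: SketchPlugin, initTimeout: FiniteDuration): Unit = {
  implicit val ec: ExecutionContext =
    ExecutionContext.fromExecutor(Executors.newCachedThreadPool())

  // Inline init with a timeout, so ordering-sensitive setup happens before
  // the executor proceeds but a misbehaving plugin cannot stall startup
  // forever (Await.result throws TimeoutException on expiry).
  Await.result(Future(plugin.init()), initTimeout)

  // The main body is isolated on a separate daemon thread, so a slow or
  // blocking plugin cannot hold up executor code paths.
  val t = new Thread(() => plugin.run(), s"plugin-${plugin.getClass.getName}")
  t.setDaemon(true)
  t.start()
}
```

This captures the compromise in the comment: synchronous, bounded initialization for ordering guarantees, asynchronous execution for isolation.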
[GitHub] spark issue #22245: [SPARK-24882][FOLLOWUP] Fix flaky synchronization in Kaf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22245 **[Test build #95298 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95298/testReport)** for PR 22245 at commit [`93c7bd9`](https://github.com/apache/spark/commit/93c7bd93f5dbec41a0fd4d6b5ef0bfe0bfdc235c).
[GitHub] spark issue #22245: [SPARK-24882][FOLLOWUP] Fix flaky synchronization in Kaf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22245 Can one of the admins verify this patch?
[GitHub] spark issue #22223: [SPARK-25233][Streaming] Give the user the option of spe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22223 **[Test build #4296 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4296/testReport)** for PR 22223 at commit [`85ece1c`](https://github.com/apache/spark/commit/85ece1c0866164a3f5a260b6e226b01c1fd1dd81). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22245: [SPARK-24882][FOLLOWUP] Fix flaky synchronization in Kaf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22245 Can one of the admins verify this patch?
[GitHub] spark pull request #22245: [SPARK-24882][FOLLOWUP] Fix flaky synchronization...
GitHub user jose-torres opened a pull request: https://github.com/apache/spark/pull/22245 [SPARK-24882][FOLLOWUP] Fix flaky synchronization in Kafka tests. ## What changes were proposed in this pull request? Fix flaky synchronization in Kafka tests - we need to use the scan config that was persisted rather than reconstructing it to identify the stream's current configuration. We caught most instances of this in the original PR, but this one slipped through. ## How was this patch tested? n/a You can merge this pull request into a Git repository by running: $ git pull https://github.com/jose-torres/spark fixflake Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22245.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22245 commit 93c7bd93f5dbec41a0fd4d6b5ef0bfe0bfdc235c Author: Jose Torres Date: 2018-08-27T17:03:17Z fix flake
[GitHub] spark issue #22241: [SPARK-25249][CORE][TEST]add a unit test for OpenHashMap
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22241 Merged to master
[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r213043068 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -180,7 +188,73 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( dataOut.writeInt(partitionIndex) // Python version of driver PythonRDD.writeUTF(pythonVer, dataOut) +// Init a ServerSocket to accept method calls from Python side. +val isBarrier = context.isInstanceOf[BarrierTaskContext] +if (isBarrier) { + serverSocket = Some(new ServerSocket(/* port */ 0, +/* backlog */ 1, +InetAddress.getByName("localhost"))) + // A call to accept() for ServerSocket shall block infinitely. + serverSocket.map(_.setSoTimeout(0)) + new Thread("accept-connections") { +setDaemon(true) + +override def run(): Unit = { + while (!serverSocket.get.isClosed()) { +var sock: Socket = null +try { + sock = serverSocket.get.accept() + // Wait for function call from python side. + sock.setSoTimeout(1) + val input = new DataInputStream(sock.getInputStream()) --- End diff -- (I'd also like to do some refactoring of the socket setup code in python, and that can go further if we do authentication first here)
[GitHub] spark issue #22042: [SPARK-25005][SS]Support non-consecutive offsets for Kaf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22042 **[Test build #95297 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95297/testReport)** for PR 22042 at commit [`7a02921`](https://github.com/apache/spark/commit/7a02921950cda865e3cd45f1d1635212c2f707c0).
[GitHub] spark issue #22042: [SPARK-25005][SS]Support non-consecutive offsets for Kaf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22042 Merged build finished. Test PASSed.
[GitHub] spark issue #22042: [SPARK-25005][SS]Support non-consecutive offsets for Kaf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22042 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2579/ Test PASSed.
[GitHub] spark issue #21330: [SPARK-22234] Support distinct window functions
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21330 cc @jiangxb1987
[GitHub] spark pull request #21976: [SPARK-24909][core] Always unregister pending par...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21976#discussion_r213042176 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -2474,19 +2478,21 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi runEvent(makeCompletionEvent( taskSets(3).tasks(0), Success, makeMapStatus("hostB", 2))) -// There should be no new attempt of stage submitted, -// because task(stageId=1, stageAttempt=1, partitionId=1) is still running in -// the current attempt (and hasn't completed successfully in any earlier attempts). -assert(taskSets.size === 4) +// At this point there should be no active task set for stageId=1 and we need +// to resubmit because the output from (stageId=1, stageAttemptId=0, partitionId=1) +// was ignored due to executor failure +assert(taskSets.size === 5) +assert(taskSets(4).stageId === 1 && taskSets(4).stageAttemptId === 2 + && taskSets(4).tasks.size === 1) -// Complete task(stageId=1, stageAttempt=1, partitionId=1) successfully. +// Complete task(stageId=1, stageAttempt=2, partitionId=1) successfully. runEvent(makeCompletionEvent( - taskSets(3).tasks(1), Success, makeMapStatus("hostB", 2))) + taskSets(4).tasks(0), Success, makeMapStatus("hostB", 2))) --- End diff -- IIUC the test case shall still pass without changing this line, right?
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #95296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95296/testReport)** for PR 21546 at commit [`2fe46f8`](https://github.com/apache/spark/commit/2fe46f82dc38af972bc0974aca1fd846bcb483e5).
[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22233#discussion_r213041684 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand( val value = ExternalCatalogUtils.unescapePathName(ps(1)) if (resolver(columnName, partitionNames.head)) { scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value), -partitionNames.drop(1), threshold, resolver) +partitionNames.drop(1), threshold, resolver, listFilesInParallel = false) --- End diff -- @kiszk Thanks for the investigation! Please take a look at the root cause. If we are unable to figure it out, we need to revert it back to `.par`. Thanks!
[GitHub] spark issue #22042: [SPARK-25005][SS]Support non-consecutive offsets for Kaf...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/22042 retest this please
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2578/ Test PASSed.
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed.
[GitHub] spark issue #22223: [SPARK-25233][Streaming] Give the user the option of spe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22223 **[Test build #4296 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4296/testReport)** for PR 22223 at commit [`85ece1c`](https://github.com/apache/spark/commit/85ece1c0866164a3f5a260b6e226b01c1fd1dd81).
[GitHub] spark issue #22197: [SPARK-25207][SQL] Case-insensitve field resolution for ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22197 Thanks. I got it. It's definitely unrelated to this, and an intentional regression due to that revert.
[GitHub] spark pull request #22024: [SPARK-25034][CORE] Remove allocations in onBlock...
Github user vincent-grosbois commented on a diff in the pull request: https://github.com/apache/spark/pull/22024#discussion_r213037747 --- Diff: core/src/main/scala/org/apache/spark/network/BlockTransferService.scala --- @@ -101,15 +101,7 @@ abstract class BlockTransferService extends ShuffleClient with Closeable with Lo result.failure(exception) } override def onBlockFetchSuccess(blockId: String, data: ManagedBuffer): Unit = { - data match { -case f: FileSegmentManagedBuffer => - result.success(f) -case _ => - val ret = ByteBuffer.allocate(data.size.toInt) --- End diff -- I don't really understand the point of this initial commit, tbh; was there ever a rationale for it? (I can't find any comments.) I made sure it works by testing it on our dataset (it will indeed crash if the ref count is not incremented). All 69f5d0a does is transform a ManagedBuffer (abstract) into the concrete sub-type NioManagedBuffer. There is no real reason to copy the data, as long as you keep track of the reference count
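The reference-counting argument above can be illustrated with a toy model. This is purely illustrative: Spark's `ManagedBuffer` delegates to Netty's retain/release machinery, and the class and method names below are stand-ins, not the actual API.

```python
class RefCountedBuffer:
    # Toy stand-in for a reference-counted buffer; the semantics sketched
    # here (retain/release, free at zero) mirror the general pattern only.
    def __init__(self, data):
        self.data = data
        self.refs = 1          # the creator holds the initial reference
        self.freed = False

    def retain(self):
        assert not self.freed, "use after free"
        self.refs += 1
        return self

    def release(self):
        assert self.refs > 0, "double release"
        self.refs -= 1
        if self.refs == 0:
            self.freed = True  # underlying memory returned to the pool

buf = RefCountedBuffer(b"block-data")
buf.retain()                     # callback keeps the buffer instead of copying
buf.release()                    # the transfer layer drops its own reference
alive_after_fetch = not buf.freed  # still alive: the retain keeps it valid
buf.release()                    # the consumer is done with it
```

The point of the thread is captured by `alive_after_fetch`: as long as the consumer retains before the fetch layer releases, the data stays valid and no defensive copy is needed.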
[GitHub] spark issue #22236: [SPARK-10697][ML] Add lift to Association rules
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22236 **[Test build #95294 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95294/testReport)** for PR 22236 at commit [`957a6a2`](https://github.com/apache/spark/commit/957a6a2cf0e05f01c2c2d602944b8da8cfb1b426).
[GitHub] spark issue #21638: [SPARK-22357][CORE] SparkContext.binaryFiles ignore minP...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21638 **[Test build #95295 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95295/testReport)** for PR 21638 at commit [`5e46efb`](https://github.com/apache/spark/commit/5e46efb5f5ce86297c4aeb23bf934fd9942de3de).
[GitHub] spark issue #21638: [SPARK-22357][CORE] SparkContext.binaryFiles ignore minP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21638 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2577/ Test PASSed.
[GitHub] spark issue #21638: [SPARK-22357][CORE] SparkContext.binaryFiles ignore minP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21638 Merged build finished. Test PASSed.
[GitHub] spark issue #22236: [SPARK-10697][ML] Add lift to Association rules
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22236 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2576/ Test PASSed.
[GitHub] spark issue #22236: [SPARK-10697][ML] Add lift to Association rules
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22236 Merged build finished. Test PASSed.
[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22173 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95293/ Test FAILed.
[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22173 Merged build finished. Test FAILed.
[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22173 **[Test build #95293 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95293/testReport)** for PR 22173 at commit [`6580ff1`](https://github.com/apache/spark/commit/6580ff1abec42f640c3090edfa32466f8f5b5212). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.me...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r213035238 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -91,6 +91,13 @@ private[spark] class Client( private val executorMemoryOverhead = sparkConf.get(EXECUTOR_MEMORY_OVERHEAD).getOrElse( math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toLong, MEMORY_OVERHEAD_MIN)).toInt + private val isPython = sparkConf.get(IS_PYTHON_APP) --- End diff -- @holdenk, can you point me to that repo? I'd love to have a look at how you do mixed pipelines.
[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22173 **[Test build #95293 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95293/testReport)** for PR 22173 at commit [`6580ff1`](https://github.com/apache/spark/commit/6580ff1abec42f640c3090edfa32466f8f5b5212).
[GitHub] spark issue #22236: [SPARK-10697][ML] Add lift to Association rules
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22236 Yeah, I like that idea. Just compute it on initializing the model.
[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r213032992 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -180,7 +188,73 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( dataOut.writeInt(partitionIndex) // Python version of driver PythonRDD.writeUTF(pythonVer, dataOut) +// Init a ServerSocket to accept method calls from Python side. +val isBarrier = context.isInstanceOf[BarrierTaskContext] +if (isBarrier) { + serverSocket = Some(new ServerSocket(/* port */ 0, +/* backlog */ 1, +InetAddress.getByName("localhost"))) + // A call to accept() for ServerSocket shall block infinitely. + serverSocket.map(_.setSoTimeout(0)) + new Thread("accept-connections") { +setDaemon(true) + +override def run(): Unit = { + while (!serverSocket.get.isClosed()) { +var sock: Socket = null +try { + sock = serverSocket.get.accept() + // Wait for function call from python side. + sock.setSoTimeout(1) + val input = new DataInputStream(sock.getInputStream()) --- End diff -- why isn't authentication the first thing that happens on this connection? I don't think anything bad can happen in this case, but it just makes it more likely we leave a security hole here later on.
[GitHub] spark pull request #22243: [MINOR] Avoid code duplication for nullable in Hi...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22243#discussion_r213029487 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -155,6 +155,8 @@ trait HigherOrderFunction extends Expression with ExpectsInputTypes { */ trait SimpleHigherOrderFunction extends HigherOrderFunction { + override def nullable: Boolean = argument.nullable --- End diff -- this works too IMO, if others agree I'll update with this suggestion, thanks.
[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22162#discussion_r213026874 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -815,6 +815,24 @@ class Dataset[T] private[sql]( println(showString(numRows, truncate, vertical)) // scalastyle:on println + /** + * Returns the default number of rows to show when the show function is called without + * a user specified max number of rows. + * @since 2.3.0 + */ + private def numberOfRowsToShow(): Int = { --- End diff -- we shouldn't be adding methods here
[GitHub] spark issue #22244: [WIP][SPARK-24721][SPARK-25213][SQL] extract python UDF ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22244 Merged build finished. Test FAILed.
[GitHub] spark issue #22244: [WIP][SPARK-24721][SPARK-25213][SQL] extract python UDF ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22244 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95292/ Test FAILed.
[GitHub] spark issue #22244: [WIP][SPARK-24721][SPARK-25213][SQL] extract python UDF ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22244 **[Test build #95292 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95292/testReport)** for PR 22244 at commit [`f0e547c`](https://github.com/apache/spark/commit/f0e547c971f854b8a238baaebff8103036567223). * This patch **fails to build**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class ArrowEvalPython(udfs: Seq[PythonUDF], output: Seq[Attribute], child: LogicalPlan)` * `case class BatchEvalPython(udfs: Seq[PythonUDF], output: Seq[Attribute], child: LogicalPlan)`
[GitHub] spark pull request #22243: [MINOR] Avoid code duplication for nullable in Hi...
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/22243#discussion_r213022884 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -155,6 +155,8 @@ trait HigherOrderFunction extends Expression with ExpectsInputTypes { */ trait SimpleHigherOrderFunction extends HigherOrderFunction { + override def nullable: Boolean = argument.nullable --- End diff -- If we moved the definition of ```nullable``` straight to ```HigherOrderFunction``` as ```arguments.exists(_.nullable)```, we could also avoid the duplication in ```ZipWith``` and ```MapZipWith```. WDYT?
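The suggestion above, hoisting a shared `nullable` definition into the base trait so subclasses stop repeating it, can be sketched in Python. The class names mirror the Scala traits, but the code is only an analogy, not the actual Catalyst hierarchy:

```python
class Expr:
    nullable = False  # leaf expressions declare their own nullability

class HigherOrderFunction(Expr):
    # Shared default: nullable if any argument is nullable, mirroring the
    # proposed Scala `arguments.exists(_.nullable)`.
    def __init__(self, *arguments):
        self.arguments = arguments

    @property
    def nullable(self):
        return any(a.nullable for a in self.arguments)

class ZipWith(HigherOrderFunction):
    pass  # no per-class override needed once the base supplies the default

class Literal(Expr):
    def __init__(self, value):
        self.value = value
        self.nullable = value is None

maybe_null = ZipWith(Literal(1), Literal(None)).nullable  # a None argument
never_null = ZipWith(Literal(1), Literal(2)).nullable     # no nullable args
```

Subclasses with genuinely different semantics can still override, so the refactor removes boilerplate without removing flexibility.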
[GitHub] spark issue #22244: [WIP][SPARK-24721][SPARK-25213][SQL] extract python UDF ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22244 **[Test build #95292 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95292/testReport)** for PR 22244 at commit [`f0e547c`](https://github.com/apache/spark/commit/f0e547c971f854b8a238baaebff8103036567223).
[GitHub] spark issue #22244: [WIP][SPARK-24721][SPARK-25213][SQL] extract python UDF ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22244 Merged build finished. Test PASSed.
[GitHub] spark issue #22244: [WIP][SPARK-24721][SPARK-25213][SQL] extract python UDF ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22244 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2575/ Test PASSed.
[GitHub] spark issue #22244: [WIP][SPARK-24721][SPARK-25213][SQL] extract python UDF ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22244 cc @icexelloss @HyukjinKwon @rdblue @icexelloss feel free to take this over and verify if it can pass the tests you added in #22104 , thanks!
[GitHub] spark pull request #22244: [WIP][SPARK-24721][SPARK-25213][SQL] extract pyth...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/22244 [WIP][SPARK-24721][SPARK-25213][SQL] extract python UDF at the end of optimizer ## What changes were proposed in this pull request? In https://github.com/apache/spark/pull/12127 , we moved the `ExtractPythonUDFs` rule to the physical phase, while there was another option: do `ExtractPythonUDFs` at the end of the optimizer. Currently we hit 2 issues when extracting python UDFs at the physical phase: 1. it happens after the data source v2 strategy, so the data source v2 strategy needs to deal with python udfs carefully and adds a project to produce unsafe rows for the python udf. See https://github.com/apache/spark/pull/22206 2. it happens after the file source strategy, so we may keep a Python UDF as a data filter in `FileSourceScanExec` and fail the planner when trying to extract it later. See https://github.com/apache/spark/pull/22104 This PR proposes to move `ExtractPythonUDFs` to the end of the optimizer. ## How was this patch tested? TODO You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark python Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22244.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22244 commit f0e547c971f854b8a238baaebff8103036567223 Author: Wenchen Fan Date: 2018-08-27T15:40:18Z extract python UDF at the end of optimizer
[GitHub] spark issue #21976: [SPARK-24909][core] Always unregister pending partition ...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21976 Just an FYI, the other JIRA is https://issues.apache.org/jira/browse/SPARK-25250; it's related to a race with SPARK-23433
[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark
Github user RussellSpitzer commented on the issue: https://github.com/apache/spark/pull/21990 What I wanted was to just call the Scala methods, instead of having half the code in Scala and half in Python, but we create the JVM in the SparkContext creation code, so this ends up not being a good approach, I think. We could just translate the rest of GetOrCreate into Python, but then every time the Scala code is patched, it would need a Python modification as well.
[GitHub] spark pull request #22184: [SPARK-25132][SQL][DOC] Add migration doc for cas...
Github user seancxmao commented on a diff in the pull request: https://github.com/apache/spark/pull/22184#discussion_r213020789 --- Diff: docs/sql-programming-guide.md --- @@ -1895,6 +1895,10 @@ working with timestamps in `pandas_udf`s to get the best performance, see - Since Spark 2.4, File listing for compute statistics is done in parallel by default. This can be disabled by setting `spark.sql.parallelFileListingInStatsComputation.enabled` to `False`. - Since Spark 2.4, Metadata files (e.g. Parquet summary files) and temporary files are not counted as data files when calculating table size during Statistics computation. +## Upgrading From Spark SQL 2.3.1 to 2.3.2 and above + + - In version 2.3.1 and earlier, when reading from a Parquet table, Spark always returns null for any column whose column names in Hive metastore schema and Parquet schema are in different letter cases, no matter whether `spark.sql.caseSensitive` is set to true or false. Since 2.3.2, when `spark.sql.caseSensitive` is set to false, Spark does case insensitive column name resolution between Hive metastore schema and Parquet schema, so even column names are in different letter cases, Spark returns corresponding column values. An exception is thrown if there is ambiguity, i.e. more than one Parquet column is matched. --- End diff -- As a followup to cloud-fan's point, I did a deep dive into read path of parquet hive serde table. 
Following is a rough invocation chain:

```
org.apache.spark.sql.hive.execution.HiveTableScanExec
org.apache.spark.sql.hive.HadoopTableReader (extends org.apache.spark.sql.hive.TableReader)
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat (extends org.apache.hadoop.mapred.FileInputFormat)
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper (extends org.apache.hadoop.mapred.RecordReader)
parquet.hadoop.ParquetRecordReader
parquet.hadoop.InternalParquetRecordReader
org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport (extends parquet.hadoop.api.ReadSupport)
```

Finally, `DataWritableReadSupport#getFieldTypeIgnoreCase` is invoked. https://github.com/JoshRosen/hive/blob/release-1.2.1-spark2/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java#L79-L95 This is why parquet hive serde tables always do case-insensitive field resolution. However, this is a class inside `org.spark-project.hive:hive-exec:1.2.1.spark2`. I also found the related Hive JIRA ticket: [HIVE-7554: Parquet Hive should resolve column names in case insensitive manner](https://issues.apache.org/jira/browse/HIVE-7554) BTW:
* org.apache.hadoop.hive.ql = org.spark-project.hive:hive-exec:1.2.1.spark2
* parquet.hadoop = com.twitter:parquet-hadoop-bundle:1.6.0
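The resolution rule described in the migration note can be sketched as follows. This is a conceptual model only, not Spark's or Hive's actual code, and the function name is made up:

```python
def resolve_field(parquet_fields, name, case_sensitive):
    # Case-sensitive mode matches exactly; case-insensitive mode matches on
    # the lower-cased name and raises when more than one Parquet field
    # matches (the ambiguity rule from the migration note).
    if case_sensitive:
        return name if name in parquet_fields else None
    matches = [f for f in parquet_fields if f.lower() == name.lower()]
    if len(matches) > 1:
        raise ValueError("ambiguous field %r: %s" % (name, matches))
    return matches[0] if matches else None

# The Hive metastore lower-cases column names while Parquet keeps the
# original case, so case-insensitive mode recovers the column:
resolved = resolve_field(["ID", "name"], "id", case_sensitive=False)   # "ID"
strict = resolve_field(["ID", "name"], "id", case_sensitive=True)      # None
```

The `None` result in strict mode corresponds to the pre-2.3.2 behavior of returning null for a column whose case differs between the two schemas.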
[GitHub] spark issue #22241: [SPARK-25249][CORE][TEST]add a unit test for OpenHashMap
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22241 @kiszk I guess it's because in this case the underlying value type is a primitive like int or long, so null can't be returned?
[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22162 sure, no worries @kiszk, I can take it if needed. Thanks.
[GitHub] spark pull request #22112: [SPARK-23243][Core] Fix RDD.repartition() data co...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/22112#discussion_r213009399 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -33,6 +33,9 @@ import org.apache.spark.util.random.SamplingUtils /** * An object that defines how the elements in a key-value pair RDD are partitioned by key. * Maps each key to a partition ID, from 0 to `numPartitions - 1`. + * + * Note that, partitioner must be idempotent, i.e. it must return the same partition id given the --- End diff -- I think you mean deterministic, not idempotent (which would mean that `partition(key) == partition(partition(key))`)
[GitHub] spark pull request #22112: [SPARK-23243][Core] Fix RDD.repartition() data co...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/22112#discussion_r213017779 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1865,6 +1871,62 @@ abstract class RDD[T: ClassTag]( // RDD chain. @transient protected lazy val isBarrier_ : Boolean = dependencies.filter(!_.isInstanceOf[ShuffleDependency[_, _, _]]).exists(_.rdd.isBarrier()) + + /** + * Returns the random level of this RDD's output. Please refer to [[RandomLevel]] for the + * definition. + * + * By default, an reliably checkpointed RDD, or RDD without parents(root RDD) is IDEMPOTENT. For + * RDDs with parents, we will generate a random level candidate per parent according to the + * dependency. The random level of the current RDD is the random level candidate that is random + * most. Please override [[getOutputRandomLevel]] to provide custom logic of calculating output + * random level. + */ + // TODO: make it public so users can set random level to their custom RDDs. + // TODO: this can be per-partition. e.g. UnionRDD can have different random level for different + // partitions. + private[spark] final lazy val outputRandomLevel: RandomLevel.Value = { +if (checkpointData.exists(_.isInstanceOf[ReliableRDDCheckpointData[_]])) { --- End diff -- hmm, so I took another look at the checkpoint code, and it seems to me like checkpointing won't actually help. IIUC, checkpointing doesn't actually take place until the *job* finishes, not just the stage: https://github.com/apache/spark/blob/6193a202aab0271b4532ee4b740318290f2c44a1/core/src/main/scala/org/apache/spark/SparkContext.scala#L2061-L2063 So when you have a failure in the middle of a job with a long pipeline, when you go back to an earlier stage, you're not actually going back to checkpointed data. But maybe I'm reading this wrong? That doesn't seem like what checkpointing _should_ be doing, actually ...
[GitHub] spark pull request #22112: [SPARK-23243][Core] Fix RDD.repartition() data co...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/22112#discussion_r213010846 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1918,3 +1980,19 @@ object RDD { new DoubleRDDFunctions(rdd.map(x => num.toDouble(x))) } } + +/** + * The random level of RDD's output (i.e. what `RDD#compute` returns), which indicates how the + * output will diff when Spark reruns the tasks for the RDD. There are 3 random levels, ordered + * by the randomness from low to high: + * 1. IDEMPOTENT: The RDD output is always same (including order) when rerun. --- End diff -- here too, idempotent is the wrong word for this ... deterministic? partition-ordered? (I guess "ordered" could make it seem like the entire data is ordered ...)
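The terminology distinction drawn in these review comments can be made concrete with hypothetical partitioners (not Spark's API): deterministic means the same key always maps to the same partition on every call, while idempotent is the different property that applying the function to its own output changes nothing.

```python
import random

def deterministic_partitioner(key, num_partitions=4):
    return hash(key) % num_partitions        # same key, same partition, every call

def nondeterministic_partitioner(key, num_partitions=4):
    return random.randrange(num_partitions)  # may differ on rerun: unsafe for retries

# Deterministic: repeated calls on the same key agree.
stable = all(deterministic_partitioner("k") == deterministic_partitioner("k")
             for _ in range(100))

# Idempotent is a different property: f(f(x)) == f(x). For example,
# mapping an already-assigned partition id to a partition id is a no-op.
def f(p, n=4):
    return p % n

idempotent = all(f(f(p)) == f(p) for p in range(100))
```

What task retry actually needs from a partitioner is the first property: a rerun must route each key to the same partition it got the first time.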
[GitHub] spark issue #22024: [SPARK-25034][CORE] Remove allocations in onBlockFetchSu...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22024 retest this please.
[GitHub] spark pull request #22024: [SPARK-25034][CORE] Remove allocations in onBlock...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/22024#discussion_r213015113 --- Diff: core/src/main/scala/org/apache/spark/network/BlockTransferService.scala --- @@ -101,15 +101,7 @@ abstract class BlockTransferService extends ShuffleClient with Closeable with Lo result.failure(exception) } override def onBlockFetchSuccess(blockId: String, data: ManagedBuffer): Unit = { - data match { -case f: FileSegmentManagedBuffer => - result.success(f) -case _ => - val ret = ByteBuffer.allocate(data.size.toInt) --- End diff -- The copy behavior was introduced by https://github.com/apache/spark/pull/2330/commits/69f5d0a2434396abbbd98886e047bc08a9e65565. How can you make sure this can be replaced by increasing the reference count?
[GitHub] spark pull request #22024: [SPARK-25034][CORE] Remove allocations in onBlock...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/22024#discussion_r213015245 --- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala --- @@ -160,7 +160,13 @@ private[spark] class TorrentBroadcast[T: ClassTag](obj: T, id: Long) releaseLock(pieceId) case None => bm.getRemoteBytes(pieceId) match { -case Some(b) => +case Some(splitB) => + + // Checksum computation and further computations require the data + // from the ChunkedByteBuffer to be merged, so we we merge it now. --- End diff -- nit of the comment.
[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22162 Would someone please take it? I have less bandwidth over the next two days, since I will be in a training session at my office.
[GitHub] spark pull request #20637: [SPARK-23466][SQL] Remove redundant null checks i...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/20637#discussion_r213013507 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -223,8 +223,9 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks with PlanTestBa } } else { val lit = InternalRow(expected, expected) + val dtAsNullable = expression.dataType.asNullable --- End diff -- @ueshin @cloud-fan Thank you for the good summary. I think that this does not reduce test coverage. This `dtAsNullable = expression.dataType.asNullable` is used only for generating `expected`. This `asNullable` does not change the `dataType` of `expression`. Thus, this does not change our optimization assumption.
[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22162#discussion_r213010807 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -815,6 +815,24 @@ class Dataset[T] private[sql]( println(showString(numRows, truncate, vertical)) // scalastyle:on println + /** + * Returns the default number of rows to show when the show function is called without + * a user specified max number of rows. + * @since 2.3.0 + */ + private def numberOfRowsToShow(): Int = { +this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt + } + + /** + * Returns the default max characters per column to show before truncation when + * the show function is called with truncate. + * @since 2.3.0 --- End diff -- ditto
[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22162#discussion_r213010706 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -815,6 +815,24 @@ class Dataset[T] private[sql]( println(showString(numRows, truncate, vertical)) // scalastyle:on println + /** + * Returns the default number of rows to show when the show function is called without + * a user specified max number of rows. + * @since 2.3.0 + */ + private def numberOfRowsToShow(): Int = { +this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt + } + + /** + * Returns the default max characters per column to show before truncation when + * the show function is called with truncate. + * @since 2.3.0 + */ + private def maxCharactersPerColumnToShow(): Int = { --- End diff -- ditto
[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22162#discussion_r213010879 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -815,6 +815,24 @@ class Dataset[T] private[sql]( println(showString(numRows, truncate, vertical)) // scalastyle:on println + /** + * Returns the default number of rows to show when the show function is called without + * a user specified max number of rows. + * @since 2.3.0 --- End diff -- not needed as this is private; moreover, it'd be `@since 2.4.0`
[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22162#discussion_r213010672 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -815,6 +815,24 @@ class Dataset[T] private[sql]( println(showString(numRows, truncate, vertical)) // scalastyle:on println + /** + * Returns the default number of rows to show when the show function is called without + * a user specified max number of rows. + * @since 2.3.0 + */ + private def numberOfRowsToShow(): Int = { --- End diff -- I'd remove the `()`
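The "remove the `()`" suggestion follows the Scala convention that a pure, side-effect-free accessor is declared without parentheses, while `()` signals an effectful call. A minimal sketch of what the revised accessor could look like (the `Map`-backed config here is a stand-in for `SparkSession.conf`; only the config key comes from the diff):

```scala
object ShowConfigDemo {
  // Stand-in for the session configuration: looks up a key with a default,
  // the same shape as conf.get("spark.sql.show.defaultNumRows", "20").toInt.
  class ShowConfig(settings: Map[String, String]) {
    // No parentheses: this is a pure accessor with no side effects.
    def numberOfRowsToShow: Int =
      settings.getOrElse("spark.sql.show.defaultNumRows", "20").toInt
  }

  def main(args: Array[String]): Unit = {
    val defaults = new ShowConfig(Map.empty)
    val custom   = new ShowConfig(Map("spark.sql.show.defaultNumRows" -> "50"))
    println(defaults.numberOfRowsToShow) // falls back to the default of 20
    println(custom.numberOfRowsToShow)   // honors the configured value
  }
}
```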
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r213010408 --- Diff: python/pyspark/sql/session.py --- @@ -218,7 +218,9 @@ def __init__(self, sparkContext, jsparkSession=None): .sparkContext().isStopped(): jsparkSession = self._jvm.SparkSession.getDefaultSession().get() else: -jsparkSession = self._jvm.SparkSession(self._jsc.sc()) +jsparkSession = self._jvm.SparkSession.builder() \ --- End diff -- @RussellSpitzer, have you maybe had a chance to take a look and see if we can deduplicate some logic compared to Scala's `getOrCreate`? I am suggesting this since the code path now duplicates some of that logic.
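The diff above replaces a direct `SparkSession` constructor call with the builder, whose `getOrCreate` reuses an existing default session instead of always constructing a new one. A toy sketch of that pattern (hypothetical `Session` classes, not Spark's implementation):

```scala
object SessionDemo {
  final class Session(val id: Int)

  object Session {
    private var default: Option[Session] = None
    private var nextId = 0

    class Builder {
      // Reuse the default session if one exists; otherwise create and
      // register a new one. This is the logic getOrCreate centralizes.
      def getOrCreate(): Session = Session.synchronized {
        default.getOrElse {
          nextId += 1
          val s = new Session(nextId)
          default = Some(s)
          s
        }
      }
    }

    def builder(): Builder = new Builder
  }

  def main(args: Array[String]): Unit = {
    val a = Session.builder().getOrCreate()
    val b = Session.builder().getOrCreate()
    println(a eq b) // both calls observe the same default session
  }
}
```

Routing session creation through one `getOrCreate` is what lets the Python side stop re-implementing the "reuse or construct" decision.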
[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22162 sure @HyukjinKwon, thanks for pinging me anyway.
[GitHub] spark issue #22223: [SPARK-25233][Streaming] Give the user the option of spe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22223 **[Test build #4295 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4295/testReport)** for PR #22223 at commit [`2b0b1ce`](https://github.com/apache/spark/commit/2b0b1ce3876e2f55807156a98f75068280e03054). * This patch **fails MiMa tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22233#discussion_r213009058 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand( val value = ExternalCatalogUtils.unescapePathName(ps(1)) if (resolver(columnName, partitionNames.head)) { scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value), -partitionNames.drop(1), threshold, resolver) +partitionNames.drop(1), threshold, resolver, listFilesInParallel = false) --- End diff -- Thank you for attaching the stack trace; I have just looked at it. It looks strange to me: every thread is `waiting for`, no blocker is present, and only one `locked` exists. In a typical case, a deadlock occurs because a blocker exists, as in the stack trace attached in #1 I will investigate further tomorrow to decide whether we need this implementation or should revert to the original implementation using a Scala parallel collection.
```
...
- parking to wait for <0x000793c0d610> (a scala.concurrent.impl.Promise$CompletionLatch)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:206)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:222)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
at org.apache.spark.util.ThreadUtils$.parmap(ThreadUtils.scala:317)
at org.apache.spark.sql.execution.command.AlterTableRecoverPartitionsCommand.scanPartitions(ddl.scala:690)
at org.apache.spark.sql.execution.command.AlterTableRecoverPartitionsCommand.run(ddl.scala:626)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
- locked <0x000793b04e88> (a org.apache.spark.sql.execution.command.ExecutedCommandExec)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
...
```
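One common cause of a trace like this, where every thread parks in `Await`/`awaitResult` with no second lock held, is thread-pool starvation rather than a classic two-lock deadlock: a task blocks waiting on a future whose work is queued behind it on the same pool. A minimal, self-contained Scala sketch of that failure mode (toy single-thread pool, not Spark's actual `ThreadUtils.parmap` setup; timeouts added so the demo terminates instead of hanging):

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future, TimeoutException}
import scala.concurrent.duration._

// The outer task occupies the pool's only worker while blocking on an
// inner future that is queued on the same pool, so the inner future can
// never start: the pool is starved, and the Await times out.
object StarvationDemo {
  def demo(): Boolean = {
    val pool = Executors.newFixedThreadPool(1)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    val outer = Future {
      val inner = Future(42)        // queued behind us on the single worker
      Await.result(inner, 1.second) // blocks the only worker; inner never runs
    }
    val starved =
      try { Await.result(outer, 2.seconds); false }
      catch { case _: TimeoutException => true }
    pool.shutdownNow()
    starved
  }

  def main(args: Array[String]): Unit =
    println(s"starved: ${demo()}")
}
```

With no timeout, both waits would park forever, which matches the "everything is `waiting for`, only one `locked`" shape of the attached trace.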
[GitHub] spark issue #22226: [SPARK-24391][SQL] Support arrays of any types by to_jso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22226 **[Test build #95291 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95291/testReport)** for PR #22226 at commit [`906a301`](https://github.com/apache/spark/commit/906a3013e97f8e1d1f8f7e0e335f7404de47b582).
[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22162 @viirya, @kiszk, @mgaido91 and @maropu, would you be interested in taking this over if it stays inactive for a few more days?
[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22162 ping @AndrewKL