[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-04-14 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/20894 jenkins, retest this, please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-04-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20894 **[Test build #89365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89365/testReport)** for PR 20894 at commit

[GitHub] spark pull request #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost

2018-04-14 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20629#discussion_r181547119 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala --- @@ -64,12 +65,12 @@ class ClusteringEvaluator @Since("2.3.0")

[GitHub] spark pull request #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost

2018-04-14 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20629#discussion_r181547107 --- Diff: python/pyspark/ml/clustering.py --- @@ -322,7 +323,11 @@ def computeCost(self, dataset): """ Return the K-means cost

[GitHub] spark issue #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost

2018-04-14 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20629 @holdenk I am not sure whether cluster centers should be required for this metric. On one hand, since the `ClusteringEvaluator` should be a general interface for all clustering models and some of them

[GitHub] spark issue #21067: [SPARK-23980][K8S] Resilient Spark driver on Kubernetes

2018-04-14 Thread stoader
Github user stoader commented on the issue: https://github.com/apache/spark/pull/21067 @mccheah > But whether or not the driver should be relaunchable should be determined by the application submitter, and not necessarily done all the time. Can we make this behavior

[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-04-14 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r181543985 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -304,45 +304,14 @@ case class LoadDataCommand(

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20937 **[Test build #89364 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89364/testReport)** for PR 20937 at commit

[GitHub] spark issue #21056: [SPARK-23849][SQL] Tests for samplingRatio of json datas...

2018-04-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21056 **[Test build #89366 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89366/testReport)** for PR 21056 at commit

[GitHub] spark pull request #21072: [SPARK-23973][SQL] Remove consecutive Sorts

2018-04-14 Thread mgaido91
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/21072 [SPARK-23973][SQL] Remove consecutive Sorts ## What changes were proposed in this pull request? In SPARK-23375 we introduced the ability to remove `Sort` operations during query
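
For readers unfamiliar with the idea in the PR title, here is a toy Python sketch of removing consecutive sorts from a query plan. It is not Catalyst code and not the PR's implementation: the `Scan` and `Sort` classes and the `remove_consecutive_sorts` function are hypothetical names made up for illustration. It assumes that a sort sitting directly on top of another sort makes the inner one redundant, because the outer sort alone determines the final order.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Scan:
    """Hypothetical leaf node: reads a table."""
    table: str


@dataclass(frozen=True)
class Sort:
    """Hypothetical plan node: sorts the child's output by `order`."""
    order: Tuple[str, ...]
    child: "Plan"


Plan = Union[Scan, Sort]


def remove_consecutive_sorts(plan: Plan) -> Plan:
    """Drop any Sort that sits directly under another Sort."""
    if isinstance(plan, Sort):
        child = remove_consecutive_sorts(plan.child)
        # The outer sort fixes the final order, so a sort directly
        # beneath it does no useful work and can be skipped.
        while isinstance(child, Sort):
            child = child.child
        return Sort(plan.order, child)
    return plan


plan = Sort(("a",), Sort(("b",), Scan("t")))
print(remove_consecutive_sorts(plan))
```

A real optimizer rule would also have to decide which operators may sit between the two sorts without changing the result; this sketch only handles directly nested sorts.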

[GitHub] spark issue #21072: [SPARK-23973][SQL] Remove consecutive Sorts

2018-04-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21072 **[Test build #89367 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89367/testReport)** for PR 21072 at commit

[GitHub] spark issue #21072: [SPARK-23973][SQL] Remove consecutive Sorts

2018-04-14 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21072 cc @cloud-fan @henryr

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-04-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20894 **[Test build #89365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89365/testReport)** for PR 20894 at commit

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20894 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89365/ Test FAILed.

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20894 Merged build finished. Test FAILed.

[GitHub] spark issue #21072: [SPARK-23973][SQL] Remove consecutive Sorts

2018-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21072 Merged build finished. Test PASSed.

[GitHub] spark issue #21072: [SPARK-23973][SQL] Remove consecutive Sorts

2018-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21072 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2329/

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20937 **[Test build #89364 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89364/testReport)** for PR 20937 at commit

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20937 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89364/ Test PASSed.

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20937 Merged build finished. Test PASSed.

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-04-14 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/20894 jenkins, retest this, please

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21060 Merged to branch-2.3. Thanks for reviewing this @BryanCutler.

[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r181552961 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -304,45 +304,14 @@ case class LoadDataCommand(

[GitHub] spark issue #21057: 2 Improvements to Pyspark docs

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21057 ok to test

[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21057#discussion_r181553378 --- Diff: python/pyspark/streaming/kafka.py --- @@ -104,7 +104,7 @@ def createDirectStream(ssc, topics, kafkaParams, fromOffsets=None,

[GitHub] spark issue #21056: [SPARK-23849][SQL] Tests for samplingRatio of json datas...

2018-04-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21056 **[Test build #89366 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89366/testReport)** for PR 21056 at commit

[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21004 (next time, let's avoid a PR description that just says "improvement")

[GitHub] spark issue #21070: SPARK-23972: Update Parquet to 1.10.0.

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21070 @rdblue, BTW mind fixing the title to `[SPARK-23972][...] ...`? It's actually written in the guide.
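
The title convention requested above can be checked mechanically. The following is a minimal Python sketch, assuming the convention is a leading `[SPARK-NNNNN]` or `[MINOR]` tag, then one or more bracketed tags such as `[SQL]` or `[BRANCH-2.3]`, then the title text. The regular expression and the function name `is_conventional_title` are illustrative assumptions inferred from the titles in this listing, not an official rule from the contribution guide.

```python
import re

# Assumed pattern: [SPARK-12345] or [MINOR], then at least one more
# bracketed tag (component, branch, WIP, further JIRA ids), then text.
TITLE_PATTERN = re.compile(
    r"^\[(?:SPARK-\d+|MINOR)\](?:\[[A-Z0-9.\- ]+\])+\s*\S.*$"
)


def is_conventional_title(title: str) -> bool:
    """Return True if the PR title follows the assumed bracket convention."""
    return TITLE_PATTERN.match(title) is not None


for title in (
    "[SPARK-23973][SQL] Remove consecutive Sorts",
    "SPARK-23972: Update Parquet to 1.10.0.",
    "2 Improvements to Pyspark docs",
):
    print(is_conventional_title(title), title)
```

Under these assumptions, the first title is accepted and the other two, which are the ones reviewers asked to rename in this thread, are rejected.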

[GitHub] spark pull request #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes coll...

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/21060

[GitHub] spark issue #21057: 2 Improvements to Pyspark docs

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21057 BTW, mind fixing the PR title to `[MINOR][PYTHON] ... ` and making the title more descriptive? Not a big deal, but it's good to match it with other PRs.

[GitHub] spark issue #21057: 2 Improvements to Pyspark docs

2018-04-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21057 **[Test build #89368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89368/testReport)** for PR 21057 at commit

[GitHub] spark issue #21057: 2 Improvements to Pyspark docs

2018-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21057 Merged build finished. Test FAILed.

[GitHub] spark issue #21057: 2 Improvements to Pyspark docs

2018-04-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21057 **[Test build #89368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89368/testReport)** for PR 21057 at commit

[GitHub] spark issue #21057: 2 Improvements to Pyspark docs

2018-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21057 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89368/ Test FAILed.

[GitHub] spark issue #21056: [SPARK-23849][SQL] Tests for samplingRatio of json datas...

2018-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21056 Merged build finished. Test PASSed.

[GitHub] spark issue #21056: [SPARK-23849][SQL] Tests for samplingRatio of json datas...

2018-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21056 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89366/ Test PASSed.

[GitHub] spark issue #21072: [SPARK-23973][SQL] Remove consecutive Sorts

2018-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21072 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89367/ Test PASSed.

[GitHub] spark issue #21072: [SPARK-23973][SQL] Remove consecutive Sorts

2018-04-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21072 **[Test build #89367 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89367/testReport)** for PR 21072 at commit

[GitHub] spark issue #21072: [SPARK-23973][SQL] Remove consecutive Sorts

2018-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21072 Merged build finished. Test PASSed.

[GitHub] spark pull request #21072: [SPARK-23973][SQL] Remove consecutive Sorts

2018-04-14 Thread henryr
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21072#discussion_r181563918 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -736,12 +736,15 @@ object EliminateSorts extends

[GitHub] spark pull request #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpa...

2018-04-14 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21007#discussion_r181569225 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -3189,10 +3189,10 @@ class Dataset[T] private[sql]( private[sql]

[GitHub] spark pull request #21070: SPARK-23972: Update Parquet to 1.10.0.

2018-04-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/21070#discussion_r181563672 --- Diff: pom.xml --- @@ -129,7 +129,7 @@ 1.2.1 10.12.1.1 -1.8.2 +1.10.0 --- End diff --

[GitHub] spark pull request #21070: SPARK-23972: Update Parquet to 1.10.0.

2018-04-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/21070#discussion_r181564129 --- Diff: pom.xml --- @@ -129,7 +129,7 @@ 1.2.1 10.12.1.1 -1.8.2 +1.10.0 --- End diff --

[GitHub] spark pull request #18378: [SPARK-21163][SQL] DataFrame.toPandas should resp...

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18378#discussion_r181567408 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1750,6 +1761,24 @@ def _to_scala_map(sc, jm): return sc._jvm.PythonUtils.toScalaMap(jm)

[GitHub] spark pull request #18378: [SPARK-21163][SQL] DataFrame.toPandas should resp...

2018-04-14 Thread edlee123
Github user edlee123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18378#discussion_r181565606 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1750,6 +1761,24 @@ def _to_scala_map(sc, jm): return sc._jvm.PythonUtils.toScalaMap(jm)

[GitHub] spark pull request #18378: [SPARK-21163][SQL] DataFrame.toPandas should resp...

2018-04-14 Thread edlee123
Github user edlee123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18378#discussion_r181567770 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1750,6 +1761,24 @@ def _to_scala_map(sc, jm): return sc._jvm.PythonUtils.toScalaMap(jm)

[GitHub] spark issue #21073: [SPARK-23936][SQL][WIP] Implement map_concat

2018-04-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21073 **[Test build #89369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89369/testReport)** for PR 21073 at commit

[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs

2018-04-14 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21057#discussion_r181568263 --- Diff: python/pyspark/streaming/listener.py --- @@ -22,6 +22,10 @@ class StreamingListener(object): def __init__(self): pass

[GitHub] spark issue #21057: 2 Improvements to Pyspark docs

2018-04-14 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21057 I think you may create a minor JIRA ticket for this.

[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...

2018-04-14 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21007 @HyukjinKwon @BryanCutler @viirya @felixcheung The first sentence of this PR really scares me. After reading the PR description. Since the PR description will be part of our change log. Please

[GitHub] spark pull request #21073: [SPARK-23936][SQL][WIP] Implement map_concat

2018-04-14 Thread bersprockets
GitHub user bersprockets opened a pull request: https://github.com/apache/spark/pull/21073 [SPARK-23936][SQL][WIP] Implement map_concat ## What changes were proposed in this pull request? Implement the map_concat higher-order function. This is a work in progress.

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21060 Since this is not a bug fix, I plan to revert this PR. WDYT? @HyukjinKwon @BryanCutler

[GitHub] spark issue #21073: [SPARK-23936][SQL][WIP] Implement map_concat

2018-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21073 Can one of the admins verify this patch?

[GitHub] spark issue #21057: 2 Improvements to Pyspark docs

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21057 +1 for ^.

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21060 I guess the behaviour change here is that a custom query execution listener can now recognise the action `collect` in PySpark, which other APIs have already detected. Mind explaining how it breaks

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21060 I agree that it's better to avoid a behaviour change, but this one is clearly a bug and the fix is straightforward. I am puzzled why this specifically prompted you. I wouldn't revert if

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21060 I am a bit puzzled because `QueryExecutionListener` should call the callback for actions, and `collect` triggers it in Scala and R but not in PySpark specifically. It sounds like a bug and

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21060 This is just the basic backport rule we follow for each PR. We should not make an exception for this PR.

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21060 How about we formally document that in the guide? I have always put more importance on practice, and I personally think we are fine to make a backport if it's a bug and the fix

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21060 Hm, I would say it's a bug, since the action, which is supposed to call the callback, is not detected. The test is a bit complicated but the fix is relatively straightforward.

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21060 This will introduce a behavior change, and it is not a regression. The changes we made in this PR could break external apps. We should not do that in a maintenance release.

[GitHub] spark issue #20451: [SPARK-23146][WIP] Support client mode for Kubernetes cl...

2018-04-14 Thread echarles
Github user echarles commented on the issue: https://github.com/apache/spark/pull/20451 Now that #20910 has been merged, I will update this PR to take into account the refactoring. @inetfuture Once these changes are pushed, there is the review process which needs to occur, so difficult

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21060 If this can be treated as a bug to backport, we have many behavior change PRs that can be backported. We are building system software. We have to be more principled.

[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21007 What's wrong with the description and PR title, and what should be documented? Do you mean the first sentence `This PR proposes to add collect to a query executor as an action.` is wrong because this

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21060 User apps should not be blamed in this case. If they want this change, they should upgrade to the newer release. Basically, we should not introduce any external behavior change in the

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21060 Yup, that should reduce some overhead like this. I would like to hear what you guys think. cc @srowenn, @vanzin, @felixcheung, @holdenk too.

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21060 I do think we should clearly document the rule for what we can backport. I do not think we should make an exception for this PR. cc @rxin @marmbrus @yhuai @cloud-fan @ueshin