[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-04-15 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r181575614 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -151,6 +151,9 @@ abstract class

[GitHub] spark issue #21073: [SPARK-23936][SQL][WIP] Implement map_concat

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21073 **[Test build #89369 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89369/testReport)** for PR 21073 at commit

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19868 **[Test build #89370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89370/testReport)** for PR 19868 at commit

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-15 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19868 @cloud-fan @jiangxb1987 I updated the PR and added a config `spark.files.ignoreMissingFiles`. It works for HadoopRDD and NewHadoopRDD in two cases: 1. "file not found" when `getPartitions` 2.

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19868 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19868 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2331/

[GitHub] spark pull request #19868: [SPARK-22676] Avoid iterating all partition paths...

2018-04-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19868#discussion_r181571713 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -197,17 +200,24 @@ class HadoopRDD[K, V]( val jobConf = getJobConf()

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19868 **[Test build #89371 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89371/testReport)** for PR 19868 at commit

[GitHub] spark issue #21073: [SPARK-23936][SQL][WIP] Implement map_concat

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21073 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89369/ Test FAILed.

[GitHub] spark issue #21073: [SPARK-23936][SQL][WIP] Implement map_concat

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21073 Merged build finished. Test FAILed.

[GitHub] spark pull request #18378: [SPARK-21163][SQL] DataFrame.toPandas should resp...

2018-04-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18378#discussion_r181573285 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1750,6 +1761,24 @@ def _to_scala_map(sc, jm): return sc._jvm.PythonUtils.toScalaMap(jm)

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20937 @MaxGekk, BTW is https://github.com/apache/spark/pull/20937#discussion_r180050283 true? I think it's a problem if performance decreases by 20% because of the cost of wrapping `ByteArrayInputStream`.

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19868 retest this please

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19868 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2330/

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19868 Merged build finished. Test PASSed.

[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21068 **[Test build #89373 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89373/testReport)** for PR 21068 at commit

[GitHub] spark pull request #19868: [SPARK-22676] Avoid iterating all partition paths...

2018-04-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19868#discussion_r181571746 --- Diff: core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala --- @@ -124,17 +126,25 @@ class NewHadoopRDD[K, V](

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19868 Merged build finished. Test FAILed.

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19868 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89370/ Test FAILed.

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19868 Merged build finished. Test FAILed.

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19868 **[Test build #89371 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89371/testReport)** for PR 19868 at commit

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19868 **[Test build #89370 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89370/testReport)** for PR 19868 at commit

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19868 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89371/ Test FAILed.

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19868 **[Test build #89372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89372/testReport)** for PR 19868 at commit

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/20937 @HyukjinKwon Actually, performance degrades because of `InputStreamReader`. The cost of `ByteArrayInputStream` is relatively small. As you can see in the screenshot below `InputStreamReader`

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/21060 This certainly looks like a bug fix. I don't know this area well, but I don't see an argument here that the current behavior is correct. Right? When we say we don't back-port behavior

[GitHub] spark pull request #21074: [SPARK-21811][SQL] Fix the inconsistency behavior...

2018-04-15 Thread jiangxb1987
GitHub user jiangxb1987 opened a pull request: https://github.com/apache/spark/pull/21074 [SPARK-21811][SQL] Fix the inconsistency behavior when finding the widest common type ## What changes were proposed in this pull request? Currently we find the wider common type by

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21060 > This case specifically collect in PySpark doesn't work alone whereas all other actions like foreach, show and other cases in other languages works in all other APIs. Also, that's what a query

[GitHub] spark issue #21034: [SPARK-23926][SQL] Extending reverse function to support...

2018-04-15 Thread mn-mikke
Github user mn-mikke commented on the issue: https://github.com/apache/spark/pull/21034 Any other comments?

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20937 **[Test build #89375 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89375/testReport)** for PR 20937 at commit

[GitHub] spark issue #21036: [SPARK-23958][CORE] HadoopRdd filters empty files to avo...

2018-04-15 Thread jiangxb1987
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/21036 @guoxiaolongzte Have you tried the config `spark.hadoopRDD.ignoreEmptySplits`?

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21060 `withCallback` was added in Spark 1.6 release https://issues.apache.org/jira/browse/SPARK-11068 Since then, my understanding is we never clearly define which should be part of `withCallback`.

[GitHub] spark pull request #21070: SPARK-23972: Update Parquet to 1.10.0.

2018-04-15 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21070#discussion_r181589911 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java --- @@ -63,115 +58,139 @@ public final

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20937 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89375/ Test PASSed.

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20937 Merged build finished. Test PASSed.

[GitHub] spark pull request #20938: [SPARK-23821][SQL] Collection function: flatten

2018-04-15 Thread mn-mikke
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/20938#discussion_r181593152 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +289,160 @@ case class

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20937 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89374/ Test FAILed.

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20937 Merged build finished. Test FAILed.

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20937 **[Test build #89374 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89374/testReport)** for PR 20937 at commit

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20937 **[Test build #89374 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89374/testReport)** for PR 20937 at commit

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20937 **[Test build #89377 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89377/testReport)** for PR 20937 at commit

[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-04-15 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r181590066 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +288,80 @@ case class

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19868 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89372/ Test PASSed.

[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21074 **[Test build #89376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89376/testReport)** for PR 21074 at commit

[GitHub] spark issue #18378: [SPARK-21163][SQL] DataFrame.toPandas should respect the...

2018-04-15 Thread edlee123
Github user edlee123 commented on the issue: https://github.com/apache/spark/pull/18378 Ok I see, I can see part of the rationale is performance (from discussion of astype above) and consistency with pyarrow https://arrow.apache.org/docs/python/pandas.html I guess without

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21060 > We need to be very careful when backporting the PR with the behavior changes, especially when this is neither a critical issue nor a regression. Thus, I do not think we should backport this

[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21068 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89373/ Test PASSed.

[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21068 **[Test build #89373 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89373/testReport)** for PR 21068 at commit

[GitHub] spark issue #21068: [SPARK-16630][YARN] Blacklist a node if executors won't ...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21068 Merged build finished. Test PASSed.

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20937 **[Test build #89379 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89379/testReport)** for PR 20937 at commit

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19868 **[Test build #89372 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89372/testReport)** for PR 19868 at commit

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19868 Merged build finished. Test PASSed.

[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21074 Merged build finished. Test PASSed.

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21060 > withCallback was added in Spark 1.6 release https://issues.apache.org/jira/browse/SPARK-11068 Since then, my understanding is we never clearly define which should be part of withCallback.

[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21074 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2332/

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21060 > The callback works for collect in R and Scala but Python doesn't. I think we should at least match the behaviour. I wonder why it's hard to say a bug when collect is detected in some APIs but

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21060 > The behavior consistency among Python/Scala/R/JAVA does not mean a bug, right? This case specifically `collect` in PySpark doesn't work alone whereas all other actions like

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21060 This is not a new feature addition .. this fixes an existing functionality to work as expected and consistently .. Sure, that'd be great. Will join in the discussion.

[GitHub] spark issue #21073: [SPARK-23936][SQL][WIP] Implement map_concat

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21073 **[Test build #89378 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89378/testReport)** for PR 21073 at commit

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20937 **[Test build #89375 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89375/testReport)** for PR 20937 at commit

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21060 Fixing API inconsistency should not be treated as a bug fix. Please give me a few days. I need to summarize the Spark 2.3 release and list all the PRs that were backported to the

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20611 **[Test build #89380 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89380/testReport)** for PR 20611 at commit

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21060 cc @rdblue and @steveloughran too who I guess should be interested in setting up a backporting policy.

[GitHub] spark issue #18378: [SPARK-21163][SQL] DataFrame.toPandas should respect the...

2018-04-15 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18378 It's pretty natural to convert integer type to int32. Although Spark tries its best to avoid behavior changes, it's allowed to fix some wrong behaviors in new releases, and I believe it's well

[GitHub] spark pull request #19868: [SPARK-22676] Avoid iterating all partition paths...

2018-04-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19868#discussion_r181607863 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -279,6 +293,10 @@ class HadoopRDD[K, V]( case e: IOException if

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20937 Merged build finished. Test PASSed.

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20937 **[Test build #89381 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89381/testReport)** for PR 20937 at commit

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20937 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89381/ Test PASSed.

[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21074 Merged build finished. Test PASSed.

[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21074 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89376/ Test PASSed.

[GitHub] spark pull request #21075: [SPARK-23988][MESOS] Improve handling of appResou...

2018-04-15 Thread pmackles
GitHub user pmackles opened a pull request: https://github.com/apache/spark/pull/21075 [SPARK-23988][MESOS] Improve handling of appResource in mesos dispatcher when using Docker Improve/fix handling of appResource for mesos dispatcher when using docker Tested with new unit

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21060 Like what I said above, we need to be very careful when backporting the PR with the behavior changes, especially when this is **neither a critical issue nor a regression**. Even if this is a bug

[GitHub] spark issue #21073: [SPARK-23936][SQL][WIP] Implement map_concat

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21073 Merged build finished. Test PASSed.

[GitHub] spark issue #21073: [SPARK-23936][SQL][WIP] Implement map_concat

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21073 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89378/ Test PASSed.

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20937 **[Test build #89379 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89379/testReport)** for PR 20937 at commit

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20937 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89379/ Test PASSed.

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20937 Merged build finished. Test PASSed.

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20611 **[Test build #89380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89380/testReport)** for PR 20611 at commit

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21060 This is not just about inconsistency but a bug. The previous behaviour doesn't make sense. Sure, no need to rush.

[GitHub] spark issue #21075: [SPARK-23988][MESOS] Improve handling of appResource in ...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21075 Can one of the admins verify this patch?

[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21060 I am not saying we shouldn't be careful. I am trying to be careful when I backport. So, your reasons are: - any behaviour changes shouldn't be backported and it's the basic backport

[GitHub] spark pull request #19868: [SPARK-22676] Avoid iterating all partition paths...

2018-04-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19868#discussion_r181607797 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -260,6 +270,10 @@ class HadoopRDD[K, V]( logWarning(s"Skipped

[GitHub] spark pull request #19868: [SPARK-22676] Avoid iterating all partition paths...

2018-04-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19868#discussion_r181607746 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -197,17 +200,24 @@ class HadoopRDD[K, V]( val jobConf = getJobConf()

[GitHub] spark pull request #19526: [SPARK-22014][SQL] removed TypeCheckFailure: slid...

2018-04-15 Thread SimonUzL
Github user SimonUzL closed the pull request at: https://github.com/apache/spark/pull/19526

[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21074 **[Test build #89376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89376/testReport)** for PR 21074 at commit

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20937 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89377/ Test PASSed.

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20937 **[Test build #89377 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89377/testReport)** for PR 20937 at commit

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20937 Merged build finished. Test PASSed.

[GitHub] spark issue #21073: [SPARK-23936][SQL][WIP] Implement map_concat

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21073 **[Test build #89378 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89378/testReport)** for PR 21073 at commit

[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20937 **[Test build #89381 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89381/testReport)** for PR 20937 at commit

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20611 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89380/ Test PASSed.

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20611 Merged build finished. Test PASSed.

[GitHub] spark issue #21075: [SPARK-23988][MESOS] Improve handling of appResource in ...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21075 Can one of the admins verify this patch?

[GitHub] spark pull request #19868: [SPARK-22676] Avoid iterating all partition paths...

2018-04-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19868#discussion_r181607994 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala --- @@ -70,6 +71,45 @@ class QueryPartitionSuite extends QueryTest

[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...

2018-04-15 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20535 retest this please

[GitHub] spark issue #18378: [SPARK-21163][SQL] DataFrame.toPandas should respect the...

2018-04-15 Thread edlee123
Github user edlee123 commented on the issue: https://github.com/apache/spark/pull/18378 I see the rationale now, thank you everyone

[GitHub] spark issue #21053: [SPARK-23924][SQL] Add element_at function

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21053 Merged build finished. Test PASSed.

[GitHub] spark issue #21047: [SPARK-23956][YARN] Use effective RPC port in AM registr...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21047 Merged build finished. Test PASSed.

[GitHub] spark issue #21047: [SPARK-23956][YARN] Use effective RPC port in AM registr...

2018-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21047 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89384/ Test PASSed.

[GitHub] spark issue #21047: [SPARK-23956][YARN] Use effective RPC port in AM registr...

2018-04-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21047 **[Test build #89384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89384/testReport)** for PR 21047 at commit
