[GitHub] spark pull request #22133: [SPARK-25129][SQL]Make the mapping of com.databri...

2018-08-20 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22133#discussion_r211485248 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -626,6 +626,7 @@ object DataSource extends

[GitHub] spark issue #22158: [SPARK-25161][Core] Fix several bugs in failure handling...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22158 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #22158: [SPARK-25161][Core] Fix several bugs in failure handling...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22158 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

2018-08-20 Thread seancxmao
Github user seancxmao commented on the issue: https://github.com/apache/spark/pull/22148 Thanks!

[GitHub] spark issue #22158: [SPARK-25161][Core] Fix several bugs in failure handling...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22158 **[Test build #94998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94998/testReport)** for PR 22158 at commit

[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22153 Merged build finished. Test PASSed.

[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22153 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #22158: [SPARK-25161][Core] Fix several bugs in failure handling...

2018-08-20 Thread jiangxb1987
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/22158 retest this please

[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22153 **[Test build #94997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94997/testReport)** for PR 22153 at commit

[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22112 > always return the same result with same order when rerun.. maybe the word "idempotent" is not that accurate. Spark doesn't really care about the order, so the requirement is, for the

[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-08-20 Thread cclauss
Github user cclauss commented on the issue: https://github.com/apache/spark/pull/20838 Thanks massively for this. I doubt that I _ever_ would have gotten to that on my own. This is a test so my proposal would be that _you create a separate PR_ so that we are all assured that it

[GitHub] spark pull request #22133: [SPARK-25129][SQL]Make the mapping of com.databri...

2018-08-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22133#discussion_r211482746 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -626,6 +626,7 @@ object DataSource extends

[GitHub] spark issue #22133: [SPARK-25129][SQL]Make the mapping of com.databricks.spa...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22133 **[Test build #94996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94996/testReport)** for PR 22133 at commit

[GitHub] spark issue #22133: [SPARK-25129][SQL]Make the mapping of com.databricks.spa...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22133 Merged build finished. Test PASSed.

[GitHub] spark issue #22133: [SPARK-25129][SQL]Make the mapping of com.databricks.spa...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22133 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #22165: [SPARK-25017][Core] Add test suite for BarrierCoordinato...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22165 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request #22133: [SPARK-25129][SQL]Make the mapping of com.databri...

2018-08-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22133#discussion_r211482547 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -593,7 +592,6 @@ object DataSource extends

[GitHub] spark issue #22165: [SPARK-25017][Core] Add test suite for BarrierCoordinato...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22165 Merged build finished. Test PASSed.

[GitHub] spark pull request #22133: [SPARK-25129][SQL]Make the mapping of com.databri...

2018-08-20 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22133#discussion_r211482461 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -626,6 +626,7 @@ object DataSource extends

[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22153 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94995/ Test FAILed.

[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22153 **[Test build #94995 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94995/testReport)** for PR 22153 at commit

[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22153 Merged build finished. Test FAILed.

[GitHub] spark pull request #22133: [SPARK-25129][SQL]Make the mapping of com.databri...

2018-08-20 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22133#discussion_r211482149 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -593,7 +592,6 @@ object DataSource extends

[GitHub] spark issue #22165: [SPARK-25017][Core] Add test suite for BarrierCoordinato...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22165 **[Test build #94994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94994/testReport)** for PR 22165 at commit

[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22153 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22153 **[Test build #94995 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94995/testReport)** for PR 22153 at commit

[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22153 Merged build finished. Test PASSed.

[GitHub] spark pull request #22165: [SPARK-25017][Core] Add test suite for BarrierCoo...

2018-08-20 Thread xuanyuanking
GitHub user xuanyuanking opened a pull request: https://github.com/apache/spark/pull/22165 [SPARK-25017][Core] Add test suite for BarrierCoordinator and ContextBarrierState ## What changes were proposed in this pull request? Currently `ContextBarrierState` and

[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-08-20 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20838 Hi @cclauss , sorry for the frustration. I looked into the test, and it was kind of a pain to get it working right - which is probably why it wasn't done in the first place ;) Here

[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22153 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94986/ Test FAILed.

[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22153 Merged build finished. Test FAILed.

[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22153 **[Test build #94986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94986/testReport)** for PR 22153 at commit

[GitHub] spark issue #22161: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...

2018-08-20 Thread dilipbiswal
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/22161 @HyukjinKwon Done. [SPARK-25167](https://issues.apache.org/jira/browse/SPARK-25167)

[GitHub] spark issue #22164: [SPARK-23679][YARN] Fix AmIpFilter cannot work in RM HA ...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22164 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94991/ Test PASSed.

[GitHub] spark issue #22164: [SPARK-23679][YARN] Fix AmIpFilter cannot work in RM HA ...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22164 Merged build finished. Test PASSed.

[GitHub] spark issue #22164: [SPARK-23679][YARN] Fix AmIpFilter cannot work in RM HA ...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22164 **[Test build #94991 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94991/testReport)** for PR 22164 at commit

[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22163 **[Test build #94993 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94993/testReport)** for PR 22163 at commit

[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22163 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22163 Merged build finished. Test PASSed.

[GitHub] spark issue #22154: [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22154 **[Test build #94992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94992/testReport)** for PR 22154 at commit

[GitHub] spark issue #22154: [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22154 Merged build finished. Test PASSed.

[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-20 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22112 If "always return the same result with same order when rerun." is the definition of "idempotent", then yes, MLlib RDD closures always return the same result if the input doesn't change. We use

[GitHub] spark issue #22154: [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22154 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #22154: [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions...

2018-08-20 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22154 retest this please.

[GitHub] spark issue #22154: [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22154 Merged build finished. Test FAILed.

[GitHub] spark issue #22154: [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22154 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94985/ Test FAILed.

[GitHub] spark issue #22154: [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22154 **[Test build #94985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94985/testReport)** for PR 22154 at commit

[GitHub] spark issue #22164: [SPARK-23679][YARN] Fix AmIpFilter cannot work in RM HA ...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22164 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #22154: [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22154 Merged build finished. Test PASSed.

[GitHub] spark issue #22154: [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22154 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94983/ Test PASSed.

[GitHub] spark issue #22164: [SPARK-23679][YARN] Fix AmIpFilter cannot work in RM HA ...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22164 Merged build finished. Test PASSed.

[GitHub] spark issue #22154: [SPARK-23711][SPARK-25140][SQL] Catch correct exceptions...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22154 **[Test build #94983 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94983/testReport)** for PR 22154 at commit

[GitHub] spark issue #22164: [SPARK-23679][YARN] Fix AmIpFilter cannot work in RM HA ...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22164 **[Test build #94991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94991/testReport)** for PR 22164 at commit

[GitHub] spark pull request #22164: [SPARK-23679][YARN] Fix AmIpFilter cannot work in...

2018-08-20 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/22164 [SPARK-23679][YARN] Fix AmIpFilter cannot work in RM HA scenario ## What changes were proposed in this pull request? YARN `AmIpFilter` adds a new parameter "RM_HA_URLS" to support RM HA,

[GitHub] spark issue #22156: [SPARK-25144][SQL][TEST][BRANCH-2.2] Free aggregate map ...

2018-08-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22156 Thank you, @HyukjinKwon.

[GitHub] spark pull request #22156: [SPARK-25144][SQL][TEST][BRANCH-2.2] Free aggrega...

2018-08-20 Thread dongjoon-hyun
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/22156

[GitHub] spark issue #22155: [SPARK-25144][SQL][TEST] Free aggregate map when task en...

2018-08-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22155 Thank you!

[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...

2018-08-20 Thread 10110346
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/22163 The current buffer is `writeBuffer`. I mean copying `writeBuffer` to `diskWriteBuffer` or another buffer.

[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...

2018-08-20 Thread 10110346
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/22163 The current buffer is `writeBuffer`. I mean copying `writeBuffer` to `diskWriteBuffer` or another buffer.

[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21859 If this optimization is done more generally, will the implicitly cached data cause memory pressure on the driver, since it seems we have no way to release it?

[GitHub] spark pull request #20637: [SPARK-23466][SQL] Remove redundant null checks i...

2018-08-20 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/20637#discussion_r211468226 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala --- @@ -43,25 +45,30 @@ object

[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...

2018-08-20 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22163 What do you mean by `only one record is written to a buffer each time`? Isn't the amount of data written each time controlled by `diskWriteBufferSize`?

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Merged build finished. Test FAILed.

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94984/ Test FAILed.

[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20345 **[Test build #94984 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94984/testReport)** for PR 20345 at commit

[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-08-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22140 cc @BryanCutler as well since we discussed an issue about this code path before.

[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread sddyljsx
Github user sddyljsx commented on the issue: https://github.com/apache/spark/pull/21859 'The ShuffleWriter should treat RangePartitioner specially and consume the sampled data in RangePartitioner instead of the input iterator.' This idea is good, maybe we can cache both the K and V

[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22163 **[Test build #94989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94989/testReport)** for PR 22163 at commit

[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22163 Merged build finished. Test FAILed.

[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22163 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94989/ Test FAILed.

[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21859 **[Test build #94990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94990/testReport)** for PR 21859 at commit

[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread sddyljsx
Github user sddyljsx commented on the issue: https://github.com/apache/spark/pull/21859 I read the source code again. The RangePartitioner[K, V] in ShuffleExchangeExec is an instance of RangePartitioner[InternalRow, Null]. RangePartitioner only samples K for getting the

[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...

2018-08-20 Thread 10110346
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/22065 This is an end-to-end performance improvement, although our data is very small.

[GitHub] spark issue #22157: [SPARK-25126] Avoid creating Reader for all orc files

2018-08-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22157 > Do we have a similar issue for Parquet? It seems not, since we explicitly pick one file before reading in schema inference:

[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22163 **[Test build #94989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94989/testReport)** for PR 22163 at commit

[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22163 Merged build finished. Test PASSed.

[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22163 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22112 So there are 2 options: 1. ask the RDD closure to be idempotent. I'm not sure if it's OK for MLlib, cc @mengxr @WeichenXu123 @yanboliang 2. ask the output committer to be able

[GitHub] spark pull request #22163: [SPARK-25166][CORE]Reduce the number of write ope...

2018-08-20 Thread 10110346
GitHub user 10110346 opened a pull request: https://github.com/apache/spark/pull/22163 [SPARK-25166][CORE]Reduce the number of write operations for shuffle write. ## What changes were proposed in this pull request? Currently, only one record is written to a buffer each

[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22112 **[Test build #94988 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94988/testReport)** for PR 22112 at commit

[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22112 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22112 Merged build finished. Test PASSed.

[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-20 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22138 @koeninger Yeah I see what you're saying, then IMHO isolating consumers with query sounds better than others. Adding next offset to the cache key would make consumer moving bucket in cache

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21306 Merged build finished. Test PASSed.

[GitHub] spark issue #22161: [SPARKR][TEST][MINOR] Minor fixes for R sql tests

2018-08-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22161 Eh, @dilipbiswal, actually can we file a JIRA?

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21306 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94982/ Test PASSed.

[GitHub] spark pull request #22148: [SPARK-25132][SQL] Case-insensitive field resolut...

2018-08-20 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22148

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21306 **[Test build #94982 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94982/testReport)** for PR 21306 at commit

[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

2018-08-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22148 Merged to master.

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21320 **[Test build #94987 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94987/testReport)** for PR 21320 at commit

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21320 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21320 Merged build finished. Test PASSed.

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21320 retest this please

[GitHub] spark pull request #22123: [SPARK-25134][SQL] Csv column pruning with checki...

2018-08-20 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22123

[GitHub] spark issue #22123: [SPARK-25134][SQL] Csv column pruning with checking of h...

2018-08-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22123 Merged to master.

[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21669 Merged build finished. Test PASSed.

[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...

2018-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21669 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94979/ Test PASSed.

[GitHub] spark issue #22133: [SPARK-25129][SQL]Make the mapping of com.databricks.spa...

2018-08-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22133 Seems fine otherwise.

[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...

2018-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21669 **[Test build #94979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94979/testReport)** for PR 21669 at commit

[GitHub] spark pull request #22133: [SPARK-25129][SQL]Make the mapping of com.databri...

2018-08-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22133#discussion_r211460921 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -626,6 +626,7 @@ object DataSource extends
