[GitHub] [spark] viirya commented on a change in pull request #29385: [SPARK-32191][PySpark][DOC] Migration Guide for PySpark docs

2020-08-07 Thread GitBox
viirya commented on a change in pull request #29385: URL: https://github.com/apache/spark/pull/29385#discussion_r467142492 ## File path: python/docs/source/migration_guide/index.rst ## @@ -20,3 +20,14 @@ Migration Guide === +Migration Guide: PySpark (Python on

[GitHub] [spark] viirya opened a new pull request #29385: [SPARK-32191] Migration Guide for PySpark docs

2020-08-07 Thread GitBox
viirya opened a new pull request #29385: URL: https://github.com/apache/spark/pull/29385 ### What changes were proposed in this pull request? This proposes to port the old PySpark migration guide to the new PySpark docs. ### Why are the changes needed? Better

[GitHub] [spark] viirya closed pull request #29325: [WIP][SPARK-32502][BUILD] Upgrade Guava to 27.0-jre

2020-08-07 Thread GitBox
viirya closed pull request #29325: URL: https://github.com/apache/spark/pull/29325 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] viirya commented on pull request #29325: [WIP][SPARK-32502][BUILD] Upgrade Guava to 27.0-jre

2020-08-07 Thread GitBox
viirya commented on pull request #29325: URL: https://github.com/apache/spark/pull/29325#issuecomment-670607835 Thanks @dongjoon-hyun This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] tinhto-000 commented on a change in pull request #29383: [SPARK-31703][SQL] Parquet RLE float/double are read incorrectly on big endian platforms

2020-08-07 Thread GitBox
tinhto-000 commented on a change in pull request #29383: URL: https://github.com/apache/spark/pull/29383#discussion_r467154975 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java ## @@ -425,6 +425,19 @@ public void

[GitHub] [spark] AmplabJenkins commented on pull request #29385: [SPARK-32191][PySpark][DOC] Migration Guide for PySpark docs

2020-08-07 Thread GitBox
AmplabJenkins commented on pull request #29385: URL: https://github.com/apache/spark/pull/29385#issuecomment-670612996 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] SparkQA removed a comment on pull request #29385: [SPARK-32191][PySpark][DOC] Migration Guide for PySpark docs

2020-08-07 Thread GitBox
SparkQA removed a comment on pull request #29385: URL: https://github.com/apache/spark/pull/29385#issuecomment-670600497 **[Test build #127209 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127209/testReport)** for PR 29385 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29385: [SPARK-32191][PySpark][DOC] Migration Guide for PySpark docs

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #29385: URL: https://github.com/apache/spark/pull/29385#issuecomment-670612996 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] SparkQA commented on pull request #28846: [SPARK-32012][SQL] Incrementally create and materialize query stage to avoid unnecessary local shuffle

2020-08-07 Thread GitBox
SparkQA commented on pull request #28846: URL: https://github.com/apache/spark/pull/28846#issuecomment-670620880 **[Test build #127207 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127207/testReport)** for PR 28846 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #28846: [SPARK-32012][SQL] Incrementally create and materialize query stage to avoid unnecessary local shuffle

2020-08-07 Thread GitBox
AmplabJenkins commented on pull request #28846: URL: https://github.com/apache/spark/pull/28846#issuecomment-670620998 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28846: [SPARK-32012][SQL] Incrementally create and materialize query stage to avoid unnecessary local shuffle

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #28846: URL: https://github.com/apache/spark/pull/28846#issuecomment-670620998 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To

[GitHub] [spark] SparkQA commented on pull request #29034: [SPARK-32219][SQL] Add SHOW CACHED TABLES Command

2020-08-07 Thread GitBox
SparkQA commented on pull request #29034: URL: https://github.com/apache/spark/pull/29034#issuecomment-670621484 **[Test build #127206 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127206/testReport)** for PR 29034 at commit

[GitHub] [spark] SparkQA commented on pull request #29384: [SPARK-32564][SQL][TEST] Inject data statistics to simulate plan generation on actual TPCDS data

2020-08-07 Thread GitBox
SparkQA commented on pull request #29384: URL: https://github.com/apache/spark/pull/29384#issuecomment-670624164 **[Test build #127205 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127205/testReport)** for PR 29384 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #29384: [SPARK-32564][SQL][TEST] Inject data statistics to simulate plan generation on actual TPCDS data

2020-08-07 Thread GitBox
AmplabJenkins commented on pull request #29384: URL: https://github.com/apache/spark/pull/29384#issuecomment-670624957 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] karuppayya commented on a change in pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
karuppayya commented on a change in pull request #28804: URL: https://github.com/apache/spark/pull/28804#discussion_r467172474 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ## @@ -680,6 +688,16 @@ case class

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670630622 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] SparkQA removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
SparkQA removed a comment on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670629807 **[Test build #127210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127210/testReport)** for PR 28804 at commit

[GitHub] [spark] c21 commented on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox
c21 commented on pull request #29342: URL: https://github.com/apache/spark/pull/29342#issuecomment-670633470 @cloud-fan, @maropu, @agrawaldevesh - addressed all comments, and the PR is ready for review again. Thanks. This

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670646425 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29074: [SPARK-32282][SQL] Improve EnsureRquirement.reorderJoinKeys to handle more scenarios such as PartitioningCollection

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #29074: URL: https://github.com/apache/spark/pull/29074#issuecomment-670646303 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins commented on pull request #29074: [SPARK-32282][SQL] Improve EnsureRquirement.reorderJoinKeys to handle more scenarios such as PartitioningCollection

2020-08-07 Thread GitBox
AmplabJenkins commented on pull request #29074: URL: https://github.com/apache/spark/pull/29074#issuecomment-670646303 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] SparkQA commented on pull request #29386: [SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0

2020-08-07 Thread GitBox
SparkQA commented on pull request #29386: URL: https://github.com/apache/spark/pull/29386#issuecomment-670654894 **[Test build #127215 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127215/testReport)** for PR 29386 at commit

[GitHub] [spark] SparkQA commented on pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3

2020-08-07 Thread GitBox
SparkQA commented on pull request #29331: URL: https://github.com/apache/spark/pull/29331#issuecomment-670658671 **[Test build #127208 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127208/testReport)** for PR 29331 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29034: [SPARK-32219][SQL] Add SHOW CACHED TABLES Command

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #29034: URL: https://github.com/apache/spark/pull/29034#issuecomment-670621801 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] SparkQA removed a comment on pull request #29034: [SPARK-32219][SQL] Add SHOW CACHED TABLES Command

2020-08-07 Thread GitBox
SparkQA removed a comment on pull request #29034: URL: https://github.com/apache/spark/pull/29034#issuecomment-670560686 **[Test build #127206 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127206/testReport)** for PR 29034 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #29034: [SPARK-32219][SQL] Add SHOW CACHED TABLES Command

2020-08-07 Thread GitBox
AmplabJenkins commented on pull request #29034: URL: https://github.com/apache/spark/pull/29034#issuecomment-670621795 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29034: [SPARK-32219][SQL] Add SHOW CACHED TABLES Command

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #29034: URL: https://github.com/apache/spark/pull/29034#issuecomment-670621795 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To

[GitHub] [spark] SparkQA removed a comment on pull request #28846: [SPARK-32012][SQL] Incrementally create and materialize query stage to avoid unnecessary local shuffle

2020-08-07 Thread GitBox
SparkQA removed a comment on pull request #28846: URL: https://github.com/apache/spark/pull/28846#issuecomment-670575829 **[Test build #127207 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127207/testReport)** for PR 28846 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28846: [SPARK-32012][SQL] Incrementally create and materialize query stage to avoid unnecessary local shuffle

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #28846: URL: https://github.com/apache/spark/pull/28846#issuecomment-670621001 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] karuppayya commented on a change in pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
karuppayya commented on a change in pull request #28804: URL: https://github.com/apache/spark/pull/28804#discussion_r467172000 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ## @@ -353,4 +353,8 @@ object AggUtils {

[GitHub] [spark] karuppayya commented on a change in pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
karuppayya commented on a change in pull request #28804: URL: https://github.com/apache/spark/pull/28804#discussion_r467171812 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ## @@ -409,6 +411,12 @@ case class

[GitHub] [spark] SparkQA removed a comment on pull request #29384: [SPARK-32564][SQL][TEST] Inject data statistics to simulate plan generation on actual TPCDS data

2020-08-07 Thread GitBox
SparkQA removed a comment on pull request #29384: URL: https://github.com/apache/spark/pull/29384#issuecomment-670500338 **[Test build #127205 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127205/testReport)** for PR 29384 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox
AmplabJenkins commented on pull request #29342: URL: https://github.com/apache/spark/pull/29342#issuecomment-670630348 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] AmplabJenkins commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
AmplabJenkins commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670630615 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670630439 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
AmplabJenkins commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670630439 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #29342: URL: https://github.com/apache/spark/pull/29342#issuecomment-670630348 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] SparkQA commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
SparkQA commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670630600 **[Test build #127210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127210/testReport)** for PR 28804 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670630615 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To

[GitHub] [spark] c21 commented on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox
c21 commented on pull request #29342: URL: https://github.com/apache/spark/pull/29342#issuecomment-670632581 @agrawaldevesh - thank you for the warm welcome; I am excited to discuss and collaborate again here! > I am curious if the approach of storing the 'matched rows' out of band was

[GitHub] [spark] SparkQA commented on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox
SparkQA commented on pull request #29342: URL: https://github.com/apache/spark/pull/29342#issuecomment-670632896 **[Test build #127211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127211/testReport)** for PR 29342 at commit

[GitHub] [spark] SparkQA commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-07 Thread GitBox
SparkQA commented on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-670642619 **[Test build #127212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127212/testReport)** for PR 29031 at commit

[GitHub] [spark] viirya commented on a change in pull request #29328: [SPARK-32516][SQL] 'path' option cannot co-exist with load()'s path parameters

2020-08-07 Thread GitBox
viirya commented on a change in pull request #29328: URL: https://github.com/apache/spark/pull/29328#discussion_r467199754 ## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ## @@ -229,6 +229,7 @@ class DataFrameReader private[sql](sparkSession:

[GitHub] [spark] AmplabJenkins commented on pull request #29386: [SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0

2020-08-07 Thread GitBox
AmplabJenkins commented on pull request #29386: URL: https://github.com/apache/spark/pull/29386#issuecomment-670655440 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29386: [SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #29386: URL: https://github.com/apache/spark/pull/29386#issuecomment-670655440 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] karuppayya commented on a change in pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
karuppayya commented on a change in pull request #28804: URL: https://github.com/apache/spark/pull/28804#discussion_r467173370 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ## @@ -353,4 +353,8 @@ object AggUtils {

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29384: [SPARK-32564][SQL][TEST] Inject data statistics to simulate plan generation on actual TPCDS data

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #29384: URL: https://github.com/apache/spark/pull/29384#issuecomment-670624957 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] karuppayya commented on a change in pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
karuppayya commented on a change in pull request #28804: URL: https://github.com/apache/spark/pull/28804#discussion_r467175169 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ## @@ -680,6 +688,16 @@ case class

[GitHub] [spark] SparkQA commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
SparkQA commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670629807 **[Test build #127210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127210/testReport)** for PR 28804 at commit

[GitHub] [spark] srowen commented on a change in pull request #29383: [SPARK-31703][SQL] Parquet RLE float/double are read incorrectly on big endian platforms

2020-08-07 Thread GitBox
srowen commented on a change in pull request #29383: URL: https://github.com/apache/spark/pull/29383#discussion_r467178364 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java ## @@ -425,6 +425,19 @@ public void putFloats(int

[GitHub] [spark] c21 commented on a change in pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox
c21 commented on a change in pull request #29342: URL: https://github.com/apache/spark/pull/29342#discussion_r467157300 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ## @@ -97,7 +102,9 @@ private[execution] object

[GitHub] [spark] c21 edited a comment on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox
c21 edited a comment on pull request #29342: URL: https://github.com/apache/spark/pull/29342#issuecomment-670632581 @agrawaldevesh - thank you for the warm welcome; I am excited to discuss and collaborate again here! > I am curious if the approach of storing the 'matched rows' out of

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-670643137 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-07 Thread GitBox
AmplabJenkins commented on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-670643137 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] SparkQA commented on pull request #29074: [SPARK-32282][SQL] Improve EnsureRquirement.reorderJoinKeys to handle more scenarios such as PartitioningCollection

2020-08-07 Thread GitBox
SparkQA commented on pull request #29074: URL: https://github.com/apache/spark/pull/29074#issuecomment-670645783 **[Test build #127213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127213/testReport)** for PR 29074 at commit

[GitHub] [spark] SparkQA commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
SparkQA commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670645866 **[Test build #127214 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127214/testReport)** for PR 28804 at commit

[GitHub] [spark] viirya commented on pull request #29385: [SPARK-32191][PySpark][DOC] Migration Guide for PySpark docs

2020-08-07 Thread GitBox
viirya commented on pull request #29385: URL: https://github.com/apache/spark/pull/29385#issuecomment-670646380 cc @HyukjinKwon This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] AmplabJenkins commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
AmplabJenkins commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670646425 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] dongjoon-hyun opened a new pull request #29386: [SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0

2020-08-07 Thread GitBox
dongjoon-hyun opened a new pull request #29386: URL: https://github.com/apache/spark/pull/29386 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] dongjoon-hyun commented on pull request #29326: [WIP][SPARK-32502][BUILD] Upgrade Guava to 27.0-jre and Hadoop to 3.2.1

2020-08-07 Thread GitBox
dongjoon-hyun commented on pull request #29326: URL: https://github.com/apache/spark/pull/29326#issuecomment-670656860 Thank you so much. Yes. I'm looking forward to seeing that~ This is an automated message from the Apache

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29386: [SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0

2020-08-07 Thread GitBox
dongjoon-hyun commented on a change in pull request #29386: URL: https://github.com/apache/spark/pull/29386#discussion_r467209211 ## File path: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala ## @@ -395,7 +395,7 @@ class

[GitHub] [spark] AmplabJenkins commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
AmplabJenkins commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670659760 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] AmplabJenkins commented on pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3

2020-08-07 Thread GitBox
AmplabJenkins commented on pull request #29331: URL: https://github.com/apache/spark/pull/29331#issuecomment-670659548 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] SparkQA commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
SparkQA commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670659564 **[Test build #127214 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127214/testReport)** for PR 28804 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670659760 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To

[GitHub] [spark] SparkQA removed a comment on pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3

2020-08-07 Thread GitBox
SparkQA removed a comment on pull request #29331: URL: https://github.com/apache/spark/pull/29331#issuecomment-670596918 **[Test build #127208 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127208/testReport)** for PR 29331 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #29331: URL: https://github.com/apache/spark/pull/29331#issuecomment-670659548 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670659771 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] SparkQA removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
SparkQA removed a comment on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670645866 **[Test build #127214 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127214/testReport)** for PR 28804 at commit

[GitHub] [spark] viirya commented on a change in pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox
viirya commented on a change in pull request #29342: URL: https://github.com/apache/spark/pull/29342#discussion_r467244022 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ## @@ -314,7 +343,13 @@ private[joins] object

[GitHub] [spark] Fokko commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox
Fokko commented on pull request #29121: URL: https://github.com/apache/spark/pull/29121#issuecomment-670690427 Would it be possible to move this forward? :) This is an automated message from the Apache Git Service. To

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670694952 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
AmplabJenkins commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670694952 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] viirya commented on a change in pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox
viirya commented on a change in pull request #29342: URL: https://github.com/apache/spark/pull/29342#discussion_r467252553 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala ## @@ -71,8 +88,122 @@ case class

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #29121: URL: https://github.com/apache/spark/pull/29121#issuecomment-670705663 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
AmplabJenkins commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670707728 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670707728 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670707738 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] SparkQA commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
SparkQA commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670707640 **[Test build #127216 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127216/testReport)** for PR 28804 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
SparkQA removed a comment on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670694343 **[Test build #127216 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127216/testReport)** for PR 28804 at commit

[GitHub] [spark] SparkQA commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox
SparkQA commented on pull request #29121: URL: https://github.com/apache/spark/pull/29121#issuecomment-670707796 **[Test build #127217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127217/testReport)** for PR 29121 at commit

[GitHub] [spark] SparkQA commented on pull request #29386: [SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0

2020-08-07 Thread GitBox
SparkQA commented on pull request #29386: URL: https://github.com/apache/spark/pull/29386#issuecomment-670714635 **[Test build #127215 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127215/testReport)** for PR 29386 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #29386: [SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0

2020-08-07 Thread GitBox
SparkQA removed a comment on pull request #29386: URL: https://github.com/apache/spark/pull/29386#issuecomment-670654894 **[Test build #127215 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127215/testReport)** for PR 29386 at commit

[GitHub] [spark] SparkQA commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
SparkQA commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-670694343 **[Test build #127216 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127216/testReport)** for PR 28804 at commit

[GitHub] [spark] karuppayya commented on a change in pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox
karuppayya commented on a change in pull request #28804: URL: https://github.com/apache/spark/pull/28804#discussion_r467173370 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala ## @@ -353,4 +353,8 @@ object AggUtils {

[GitHub] [spark] viirya commented on a change in pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox
viirya commented on a change in pull request #29342: URL: https://github.com/apache/spark/pull/29342#discussion_r467250293 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala ## @@ -71,8 +88,122 @@ case class

[GitHub] [spark] viirya commented on a change in pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox
viirya commented on a change in pull request #29342: URL: https://github.com/apache/spark/pull/29342#discussion_r467250468 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala ## @@ -71,8 +88,122 @@ case class

[GitHub] [spark] viirya commented on a change in pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox
viirya commented on a change in pull request #29342: URL: https://github.com/apache/spark/pull/29342#discussion_r467251404 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala ## @@ -71,8 +88,122 @@ case class

[GitHub] [spark] allisonwang-db commented on pull request #29137: [SPARK-32337][SQL] Show initial plan in AQE plan tree string

2020-08-07 Thread GitBox
allisonwang-db commented on pull request #29137: URL: https://github.com/apache/spark/pull/29137#issuecomment-670699341 I've updated the PR description. Please let me know if it makes sense. This is an automated message from

[GitHub] [spark] dongjoon-hyun commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox
dongjoon-hyun commented on pull request #29121: URL: https://github.com/apache/spark/pull/29121#issuecomment-670705302 Retest this please. This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] AmplabJenkins commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox
AmplabJenkins commented on pull request #29121: URL: https://github.com/apache/spark/pull/29121#issuecomment-670705663 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] mridulm commented on pull request #24554: [SPARK-27622][Core] Avoiding the network when block manager fetches disk persisted RDD blocks from the same host

2020-08-07 Thread GitBox
mridulm commented on pull request #24554: URL: https://github.com/apache/spark/pull/24554#issuecomment-670710590 Catching up on PRs ... this essentially means all executors on the same host have effectively the same preferred locality (modulo concurrent block removal) - did we update the

[GitHub] [spark] srowen commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox
srowen commented on pull request #29121: URL: https://github.com/apache/spark/pull/29121#issuecomment-670711092 My last comment was: why do we need to add the rule and then a ton of exclusions? Just remove the unused imports. That's a much narrower change

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29386: [SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #29386: URL: https://github.com/apache/spark/pull/29386#issuecomment-670715442 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins commented on pull request #29386: [SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0

2020-08-07 Thread GitBox
AmplabJenkins commented on pull request #29386: URL: https://github.com/apache/spark/pull/29386#issuecomment-670715442 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] SparkQA removed a comment on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-07 Thread GitBox
SparkQA removed a comment on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-670642619 **[Test build #127212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127212/testReport)** for PR 29031 at commit

[GitHub] [spark] SparkQA commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-07 Thread GitBox
SparkQA commented on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-670771944 **[Test build #127212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127212/testReport)** for PR 29031 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-07 Thread GitBox
AmplabJenkins removed a comment on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-670772420 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-07 Thread GitBox
AmplabJenkins commented on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-670772420 This is an automated message from the Apache Git Service. To respond to the message, please log on to
