[GitHub] [spark] AmplabJenkins removed a comment on pull request #29021: [WIP][SPARK-32201][SQL] More general skew join pattern matching

2020-07-16 Thread GitBox
AmplabJenkins removed a comment on pull request #29021: URL: https://github.com/apache/spark/pull/29021#issuecomment-659337425 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins commented on pull request #29021: [SPARK-32201][SQL] More general skew join pattern matching

2020-07-16 Thread GitBox
AmplabJenkins commented on pull request #29021: URL: https://github.com/apache/spark/pull/29021#issuecomment-659337425 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] cloud-fan commented on a change in pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-16 Thread GitBox
cloud-fan commented on a change in pull request #28676: URL: https://github.com/apache/spark/pull/28676#discussion_r455703428 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -2665,6 +2665,15 @@ object SQLConf { .checkValue(_

[GitHub] [spark] cloud-fan commented on a change in pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-16 Thread GitBox
cloud-fan commented on a change in pull request #28676: URL: https://github.com/apache/spark/pull/28676#discussion_r455709347 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala ## @@ -60,6 +63,81 @@ case class

[GitHub] [spark] maropu edited a comment on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-07-16 Thread GitBox
maropu edited a comment on pull request #29067: URL: https://github.com/apache/spark/pull/29067#issuecomment-659340965 > Would that alleviate your concern about SPIP @maropu? Yea, the approach that you suggested sounds reasonable to me. Thanks for summing up, @HyukjinKwon.

[GitHub] [spark] maropu commented on pull request #29126: [SPARK-32324][SQL]Fix error messages during using PIVOT and lateral view

2020-07-16 Thread GitBox
maropu commented on pull request #29126: URL: https://github.com/apache/spark/pull/29126#issuecomment-659356390 > We have the following defined in spark/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4. but it seems pivotClause and lateralView need to replace

[GitHub] [spark] cloud-fan commented on a change in pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-16 Thread GitBox
cloud-fan commented on a change in pull request #29079: URL: https://github.com/apache/spark/pull/29079#discussion_r455724248 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoinSuite.scala ## @@ -103,46 +119,69 @@ class

[GitHub] [spark] SparkQA commented on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-16 Thread GitBox
SparkQA commented on pull request #29101: URL: https://github.com/apache/spark/pull/29101#issuecomment-659366838 **[Test build #125952 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125952/testReport)** for PR 29101 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-16 Thread GitBox
AmplabJenkins removed a comment on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-659365254 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29064: [SPARK-32272][SQL] Add SQL standard command SET TIME ZONE

2020-07-16 Thread GitBox
AmplabJenkins removed a comment on pull request #29064: URL: https://github.com/apache/spark/pull/29064#issuecomment-659380036 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29064: [SPARK-32272][SQL] Add SQL standard command SET TIME ZONE

2020-07-16 Thread GitBox
AmplabJenkins removed a comment on pull request #29064: URL: https://github.com/apache/spark/pull/29064#issuecomment-659380022 Merged build finished. Test FAILed.

[GitHub] [spark] AmplabJenkins commented on pull request #29021: [WIP][SPARK-32201][SQL] More general skew join pattern matching

2020-07-16 Thread GitBox
AmplabJenkins commented on pull request #29021: URL: https://github.com/apache/spark/pull/29021#issuecomment-659384990 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29021: [WIP][SPARK-32201][SQL] More general skew join pattern matching

2020-07-16 Thread GitBox
AmplabJenkins removed a comment on pull request #29021: URL: https://github.com/apache/spark/pull/29021#issuecomment-659384990 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] SparkQA commented on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-16 Thread GitBox
SparkQA commented on pull request #29101: URL: https://github.com/apache/spark/pull/29101#issuecomment-659385530 **[Test build #125954 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125954/testReport)** for PR 29101 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-16 Thread GitBox
AmplabJenkins commented on pull request #28676: URL: https://github.com/apache/spark/pull/28676#issuecomment-659390968 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] AmplabJenkins commented on pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files

2020-07-16 Thread GitBox
AmplabJenkins commented on pull request #28422: URL: https://github.com/apache/spark/pull/28422#issuecomment-659390949 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659390541 Merged build finished. Test FAILed.

[GitHub] [spark] SparkQA removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox
SparkQA removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659380293 **[Test build #125970 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125970/testReport)** for PR 29117 at commit

[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659390505 **[Test build #125970 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125970/testReport)** for PR 29117 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-16 Thread GitBox
SparkQA removed a comment on pull request #28676: URL: https://github.com/apache/spark/pull/28676#issuecomment-659203533 **[Test build #125947 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125947/testReport)** for PR 28676 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659390541 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] AmplabJenkins commented on pull request #29064: [SPARK-32272][SQL] Add SQL standard command SET TIME ZONE

2020-07-16 Thread GitBox
AmplabJenkins commented on pull request #29064: URL: https://github.com/apache/spark/pull/29064#issuecomment-659391012 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] AmplabJenkins commented on pull request #27030: [SPARK-30244][SQL] Emit pre/post events for "Partition" methods in ExternalCatalogWithListener

2020-07-16 Thread GitBox
AmplabJenkins commented on pull request #27030: URL: https://github.com/apache/spark/pull/27030#issuecomment-659398213 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659413183 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] AmplabJenkins commented on pull request #29132: [SPARK-32331][SQL] Keep advanced statistics when pruning partitions

2020-07-16 Thread GitBox
AmplabJenkins commented on pull request #29132: URL: https://github.com/apache/spark/pull/29132#issuecomment-659413396 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659417333 **[Test build #125980 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125980/testReport)** for PR 29117 at commit

[GitHub] [spark] tgravescs commented on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-07-16 Thread GitBox
tgravescs commented on pull request #29067: URL: https://github.com/apache/spark/pull/29067#issuecomment-659430254 thanks everyone for the feedback. @HyukjinKwon I assume you and @maropu will file those jiras as it appears you have the most context there?

[GitHub] [spark] gaborgsomogyi commented on pull request #29131: [SPARK-32321][SS] Remove KAFKA-7703 workaround

2020-07-16 Thread GitBox
gaborgsomogyi commented on pull request #29131: URL: https://github.com/apache/spark/pull/29131#issuecomment-659430435 > we cannot reasonably support users who downgrade the kafka client Definitely not since it may bring in old issues.

[GitHub] [spark] tgravescs commented on a change in pull request #28874: [SPARK-32036] Replace references to blacklist/whitelist language with more appropriate terminology, excluding the blacklisting

2020-07-16 Thread GitBox
tgravescs commented on a change in pull request #28874: URL: https://github.com/apache/spark/pull/28874#discussion_r455808220 ## File path: python/pyspark/cloudpickle.py ## @@ -87,8 +87,8 @@ PY2 = True PY2_WRAPPER_DESCRIPTOR_TYPE = type(object.__init__)

[GitHub] [spark] AmplabJenkins commented on pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES

2020-07-16 Thread GitBox
AmplabJenkins commented on pull request #29128: URL: https://github.com/apache/spark/pull/29128#issuecomment-659432193 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] revans2 commented on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-07-16 Thread GitBox
revans2 commented on pull request #29067: URL: https://github.com/apache/spark/pull/29067#issuecomment-659444289 I know that the arrow work was kind of dropped. Please let me know what you want to hand off to me, or if we should have a meeting and figure out what we want to do. I have a

[GitHub] [spark] AmplabJenkins commented on pull request #29021: [WIP][SPARK-32201][SQL] More general skew join pattern matching

2020-07-16 Thread GitBox
AmplabJenkins commented on pull request #29021: URL: https://github.com/apache/spark/pull/29021#issuecomment-659340770 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] ulysses-you commented on a change in pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command

2020-07-16 Thread GitBox
ulysses-you commented on a change in pull request #28840: URL: https://github.com/apache/spark/pull/28840#discussion_r455706552 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala ## @@ -236,6 +236,46 @@ case class

[GitHub] [spark] maropu edited a comment on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-07-16 Thread GitBox
maropu edited a comment on pull request #29067: URL: https://github.com/apache/spark/pull/29067#issuecomment-659340965 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] maropu edited a comment on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-07-16 Thread GitBox
maropu edited a comment on pull request #29067: URL: https://github.com/apache/spark/pull/29067#issuecomment-659340965 > Would that alleviate your concern about SPIP @maropu? Yea, the decision that @HyukjinKwon suggested sounds reasonable to me.

[GitHub] [spark] maropu edited a comment on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-07-16 Thread GitBox
maropu edited a comment on pull request #29067: URL: https://github.com/apache/spark/pull/29067#issuecomment-659340965 > Would that alleviate your concern about SPIP @maropu? Yea, the approach that you suggested sounds reasonable to me.

[GitHub] [spark] cloud-fan commented on a change in pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-16 Thread GitBox
cloud-fan commented on a change in pull request #29079: URL: https://github.com/apache/spark/pull/29079#discussion_r455723207 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoinSuite.scala ## @@ -19,17 +19,21 @@ package

[GitHub] [spark] HyukjinKwon commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-16 Thread GitBox
HyukjinKwon commented on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-659363377 @BryanCutler, @viirya, @ueshin can you take a look when you're available?

[GitHub] [spark] HyukjinKwon commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-16 Thread GitBox
HyukjinKwon commented on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-659363180 I just did a very quick test. Seems like if your function to serialize is big, it can benefit best from it up to 1250%: ```python >>> from pyspark import

[GitHub] [spark] SparkQA removed a comment on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue

2020-07-16 Thread GitBox
SparkQA removed a comment on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-659210212 **[Test build #125956 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125956/testReport)** for PR 28904 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue

2020-07-16 Thread GitBox
AmplabJenkins commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-659372181 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] SparkQA commented on pull request #29021: [WIP][SPARK-32201][SQL] More general skew join pattern matching

2020-07-16 Thread GitBox
SparkQA commented on pull request #29021: URL: https://github.com/apache/spark/pull/29021#issuecomment-659372384 **[Test build #125967 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125967/testReport)** for PR 29021 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue

2020-07-16 Thread GitBox
AmplabJenkins removed a comment on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-659372181 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] GuoPhilipse commented on pull request #29126: [SPARK-32324][SQL]Fix error messages during using PIVOT and lateral view

2020-07-16 Thread GitBox
GuoPhilipse commented on pull request #29126: URL: https://github.com/apache/spark/pull/29126#issuecomment-659373259 > > We have the following defined in spark/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4. but it seems pivotClause and lateralView need to

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-16 Thread GitBox
AmplabJenkins removed a comment on pull request #29079: URL: https://github.com/apache/spark/pull/29079#issuecomment-659375958 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] SparkQA commented on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-16 Thread GitBox
SparkQA commented on pull request #27366: URL: https://github.com/apache/spark/pull/27366#issuecomment-659377791 **[Test build #125958 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125958/testReport)** for PR 27366 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-16 Thread GitBox
SparkQA removed a comment on pull request #27366: URL: https://github.com/apache/spark/pull/27366#issuecomment-659217990 **[Test build #125958 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125958/testReport)** for PR 27366 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-16 Thread GitBox
AmplabJenkins commented on pull request #29101: URL: https://github.com/apache/spark/pull/29101#issuecomment-659382730 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] SparkQA commented on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-16 Thread GitBox
SparkQA commented on pull request #29101: URL: https://github.com/apache/spark/pull/29101#issuecomment-659381818 **[Test build #125953 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125953/testReport)** for PR 29101 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-16 Thread GitBox
SparkQA removed a comment on pull request #29101: URL: https://github.com/apache/spark/pull/29101#issuecomment-659204337 **[Test build #125953 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125953/testReport)** for PR 29101 at commit

[GitHub] [spark] SparkQA commented on pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files

2020-07-16 Thread GitBox
SparkQA commented on pull request #28422: URL: https://github.com/apache/spark/pull/28422#issuecomment-659390230 **[Test build #125975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125975/testReport)** for PR 28422 at commit

[GitHub] [spark] SparkQA commented on pull request #29064: [SPARK-32272][SQL] Add SQL standard command SET TIME ZONE

2020-07-16 Thread GitBox
SparkQA commented on pull request #29064: URL: https://github.com/apache/spark/pull/29064#issuecomment-659390233 **[Test build #125974 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125974/testReport)** for PR 29064 at commit

[GitHub] [spark] SparkQA commented on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-16 Thread GitBox
SparkQA commented on pull request #28676: URL: https://github.com/apache/spark/pull/28676#issuecomment-659390329 **[Test build #125947 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125947/testReport)** for PR 28676 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659391829 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-16 Thread GitBox
SaurabhChawla100 commented on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-659392227 @cloud-fan / @dongjoon-hyun - The test build is failing with following error ``` Test build #125949 has finished for PR 29045 at commit c0f6209. This

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659395050 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox
AmplabJenkins commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659395050 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] cloud-fan commented on pull request #29064: [SPARK-32272][SQL] Add SQL standard command SET TIME ZONE

2020-07-16 Thread GitBox
cloud-fan commented on pull request #29064: URL: https://github.com/apache/spark/pull/29064#issuecomment-659395173 All github action checks passed, I think we are good to go. Thanks, merging to master!

[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox
SparkQA commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659407645 **[Test build #125976 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125976/testReport)** for PR 29117 at commit

[GitHub] [spark] maropu commented on pull request #29126: [SPARK-32324][SQL]Fix error messages during using PIVOT and lateral view

2020-07-16 Thread GitBox
maropu commented on pull request #29126: URL: https://github.com/apache/spark/pull/29126#issuecomment-659407047 > we may make it more clear in SqlBase.g4 Any idea for that? I personally think that the message update this PR proposes looks okay now.

[GitHub] [spark] SparkQA commented on pull request #29131: [SPARK-32321][SS] Remove KAFKA-7703 workaround

2020-07-16 Thread GitBox
SparkQA commented on pull request #29131: URL: https://github.com/apache/spark/pull/29131#issuecomment-659407740 **[Test build #125978 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125978/testReport)** for PR 29131 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659407719 Merged build finished. Test FAILed.

[GitHub] [spark] SparkQA removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox
SparkQA removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659394297 **[Test build #125976 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125976/testReport)** for PR 29117 at commit

[GitHub] [spark] SparkQA commented on pull request #29132: [SPARK-32331][SQL] Keep advanced statistics when pruning partitions

2020-07-16 Thread GitBox
SparkQA commented on pull request #29132: URL: https://github.com/apache/spark/pull/29132#issuecomment-659412406 **[Test build #125979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125979/testReport)** for PR 29132 at commit

[GitHub] [spark] revans2 commented on a change in pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-07-16 Thread GitBox
revans2 commented on a change in pull request #29067: URL: https://github.com/apache/spark/pull/29067#discussion_r455805241 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ## @@ -19,84 +19,301 @@ package

[GitHub] [spark] Ngone51 commented on a change in pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-16 Thread GitBox
Ngone51 commented on a change in pull request #29014: URL: https://github.com/apache/spark/pull/29014#discussion_r455809271 ## File path: core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala ## @@ -101,7 +101,13 @@ private[spark] trait TaskScheduler { /**

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES

2020-07-16 Thread GitBox
AmplabJenkins removed a comment on pull request #29128: URL: https://github.com/apache/spark/pull/29128#issuecomment-659432193 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29021: [WIP][SPARK-32201][SQL] More general skew join pattern matching

2020-07-16 Thread GitBox
AmplabJenkins removed a comment on pull request #29021: URL: https://github.com/apache/spark/pull/29021#issuecomment-659340770 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command

2020-07-16 Thread GitBox
AmplabJenkins removed a comment on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-659340827 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] cloud-fan commented on a change in pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-16 Thread GitBox
cloud-fan commented on a change in pull request #28676: URL: https://github.com/apache/spark/pull/28676#discussion_r455707587 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala ## @@ -60,6 +63,81 @@ case class

[GitHub] [spark] maropu commented on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-07-16 Thread GitBox
maropu commented on pull request #29067: URL: https://github.com/apache/spark/pull/29067#issuecomment-659340965 > Would that alleviate your concern about SPIP @maropu? Yea, the decision sounds reasonable to me.

[GitHub] [spark] AmplabJenkins commented on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command

2020-07-16 Thread GitBox
AmplabJenkins commented on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-659340827 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] cloud-fan commented on a change in pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-16 Thread GitBox
cloud-fan commented on a change in pull request #28676: URL: https://github.com/apache/spark/pull/28676#discussion_r455713560 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala ## @@ -415,6 +417,216 @@ abstract class

[GitHub] [spark] cloud-fan commented on a change in pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-16 Thread GitBox
cloud-fan commented on a change in pull request #28676: URL: https://github.com/apache/spark/pull/28676#discussion_r455712732 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala ## @@ -415,6 +417,216 @@ abstract class

[GitHub] [spark] cloud-fan commented on a change in pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-16 Thread GitBox
cloud-fan commented on a change in pull request #28676: URL: https://github.com/apache/spark/pull/28676#discussion_r455716738 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala ## @@ -565,7 +565,7 @@ class

[GitHub] [spark] cloud-fan commented on a change in pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-16 Thread GitBox
cloud-fan commented on a change in pull request #29079: URL: https://github.com/apache/spark/pull/29079#discussion_r455720125 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -2659,12 +2660,24 @@ object SQLConf {

[GitHub] [spark] maropu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-16 Thread GitBox
maropu commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r455731070 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/BaseScriptTransformationSuite.scala ## @@ -0,0 +1,227 @@ +/* + * Licensed to

[GitHub] [spark] SparkQA removed a comment on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-16 Thread GitBox
SparkQA removed a comment on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-659203978 **[Test build #125949 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125949/testReport)** for PR 29045 at commit

[GitHub] [spark] SparkQA commented on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value

2020-07-16 Thread GitBox
SparkQA commented on pull request #29125: URL: https://github.com/apache/spark/pull/29125#issuecomment-659365406 **[Test build #125965 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125965/testReport)** for PR 29125 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-16 Thread GitBox
AmplabJenkins commented on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-659365245 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] SparkQA commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-16 Thread GitBox
SparkQA commented on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-659364715 **[Test build #125949 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125949/testReport)** for PR 29045 at commit

[GitHub] [spark] SparkQA commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue

2020-07-16 Thread GitBox
SparkQA commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-659371463 **[Test build #125956 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125956/testReport)** for PR 28904 at commit

[GitHub] [spark] LantaoJin commented on pull request #29123: [SPARK-32283][CORE] Kryo should support multiple user registrators

2020-07-16 Thread GitBox
LantaoJin commented on pull request #29123: URL: https://github.com/apache/spark/pull/29123#issuecomment-659374776 cc @RotemShaul @vanzin @dongjoon-hyun This is an automated message from the Apache Git Service. To respond

[GitHub] [spark] AmplabJenkins removed a comment on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-16 Thread GitBox
AmplabJenkins removed a comment on pull request #27366: URL: https://github.com/apache/spark/pull/27366#issuecomment-659378513 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To

[GitHub] [spark] maropu commented on pull request #29126: [SPARK-32324][SQL]Fix error messages during using PIVOT and lateral view

2020-07-16 Thread GitBox
maropu commented on pull request #29126: URL: https://github.com/apache/spark/pull/29126#issuecomment-659378822 I haven't dug into it, but probably the second case is matched in `querySpecification`:

[GitHub] [spark] SparkQA commented on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command

2020-07-16 Thread GitBox
SparkQA commented on pull request #28840: URL: https://github.com/apache/spark/pull/28840#issuecomment-659378696 **[Test build #125969 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125969/testReport)** for PR 28840 at commit

[GitHub] [spark] HyukjinKwon commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox
HyukjinKwon commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659379654 retest this please This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] AmplabJenkins commented on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-16 Thread GitBox
AmplabJenkins commented on pull request #27366: URL: https://github.com/apache/spark/pull/27366#issuecomment-659378513 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] SparkQA commented on pull request #29021: [WIP][SPARK-32201][SQL] More general skew join pattern matching

2020-07-16 Thread GitBox
SparkQA commented on pull request #29021: URL: https://github.com/apache/spark/pull/29021#issuecomment-659384093 **[Test build #125950 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125950/testReport)** for PR 29021 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #29021: [WIP][SPARK-32201][SQL] More general skew join pattern matching

2020-07-16 Thread GitBox
SparkQA removed a comment on pull request #29021: URL: https://github.com/apache/spark/pull/29021#issuecomment-659203815 **[Test build #125950 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125950/testReport)** for PR 29021 at commit

[GitHub] [spark] HyukjinKwon commented on pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files

2020-07-16 Thread GitBox
HyukjinKwon commented on pull request #28422: URL: https://github.com/apache/spark/pull/28422#issuecomment-659388927 retest this please This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] gaborgsomogyi opened a new pull request #29131: [SPARK-32321][SS] Remove KAFKA-7703 workaround

2020-07-16 Thread GitBox
gaborgsomogyi opened a new pull request #29131: URL: https://github.com/apache/spark/pull/29131 ### What changes were proposed in this pull request? [KAFKA-7703](https://issues.apache.org/jira/browse/KAFKA-7703) has been discovered and a workaround has been added in SPARK-26267. At that

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-16 Thread GitBox
AmplabJenkins removed a comment on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-659413183 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29132: [SPARK-32331][SQL] Keep advanced statistics when pruning partitions

2020-07-16 Thread GitBox
AmplabJenkins removed a comment on pull request #29132: URL: https://github.com/apache/spark/pull/29132#issuecomment-659413396 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] SparkQA commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-16 Thread GitBox
SparkQA commented on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-659414366 **[Test build #125964 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125964/testReport)** for PR 29032 at commit

[GitHub] [spark] revans2 commented on a change in pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-07-16 Thread GitBox
revans2 commented on a change in pull request #29067: URL: https://github.com/apache/spark/pull/29067#discussion_r455803871 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ## @@ -19,84 +19,301 @@ package

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES

2020-07-16 Thread GitBox
AmplabJenkins removed a comment on pull request #29128: URL: https://github.com/apache/spark/pull/29128#issuecomment-659156074 Can one of the admins verify this patch? This is an automated message from the Apache Git

[GitHub] [spark] tgravescs commented on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spark's bl

2020-07-16 Thread GitBox
tgravescs commented on pull request #28287: URL: https://github.com/apache/spark/pull/28287#issuecomment-659428051 Sorry that failure might be related to https://issues.apache.org/jira/browse/SPARK-32287 I kicked it again

[GitHub] [spark] SparkQA commented on pull request #29131: [SPARK-32321][SS] Remove KAFKA-7703 workaround

2020-07-16 Thread GitBox
SparkQA commented on pull request #29131: URL: https://github.com/apache/spark/pull/29131#issuecomment-659435141 **[Test build #125978 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125978/testReport)** for PR 29131 at commit

[GitHub] [spark] LantaoJin removed a comment on pull request #29021: [SPARK-32201][SQL] More general skew join pattern matching

2020-07-16 Thread GitBox
LantaoJin removed a comment on pull request #29021: URL: https://github.com/apache/spark/pull/29021#issuecomment-659245767 I have now added another test case, which is very similar to the user case in the description. I think it's done. Could you review it when you have a chance? @cloud-fan
