[GitHub] [spark] maropu commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression

2020-07-30 Thread GitBox
maropu commented on a change in pull request #29270: URL: https://github.com/apache/spark/pull/29270#discussion_r462681112 ## File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala ## @@ -0,0 +1,306 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] SparkQA removed a comment on pull request #29296: [SPARK-32488][SQL] Use @parser::members and @lexer::members to avoid generating unused code

2020-07-30 Thread GitBox
SparkQA removed a comment on pull request #29296: URL: https://github.com/apache/spark/pull/29296#issuecomment-666055382 **[Test build #126800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126800/testReport)** for PR 29296 at commit

[GitHub] [spark] jiangxb1987 commented on a change in pull request #29276: [SPARK-32470][CORE] Remove task result size check for shuffle map stage

2020-07-30 Thread GitBox
jiangxb1987 commented on a change in pull request #29276: URL: https://github.com/apache/spark/pull/29276#discussion_r462572658 ## File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ## @@ -695,7 +696,7 @@ private[spark] class TaskSetManager( def

[GitHub] [spark] AmplabJenkins commented on pull request #29295: [SPARK-32248][BUILD] Recover Java 11 build in Github Actions

2020-07-30 Thread GitBox
AmplabJenkins commented on pull request #29295: URL: https://github.com/apache/spark/pull/29295#issuecomment-665995022 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] MaxGekk commented on a change in pull request #20176: [SPARK-22981][SQL] Fix incorrect results of Casting Struct to String

2020-07-30 Thread GitBox
MaxGekk commented on a change in pull request #20176: URL: https://github.com/apache/spark/pull/20176#discussion_r462546349 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ## @@ -259,6 +259,29 @@ case class Cast(child:

[GitHub] [spark] yaooqinn commented on pull request #29204: [SPARK-32412][SQL] Unify error handling for spark thrift server operations

2020-07-30 Thread GitBox
yaooqinn commented on pull request #29204: URL: https://github.com/apache/spark/pull/29204#issuecomment-666152601 gentle ping @cloud-fan

[GitHub] [spark] WeichenXu123 commented on a change in pull request #29284: [SPARK-32479][PYSPARK] Fix the slicing logic in createDataFrame when converting pandas dataframe to arrow table

2020-07-30 Thread GitBox
WeichenXu123 commented on a change in pull request #29284: URL: https://github.com/apache/spark/pull/29284#discussion_r462672906 ## File path: python/pyspark/sql/pandas/conversion.py ## @@ -404,8 +404,10 @@ def _create_from_pandas_with_arrow(self, pdf, schema, timezone):

[GitHub] [spark] SparkQA removed a comment on pull request #29278: [SPARK-32160][CORE][PYSPARK] Add configs to switch allow/disallow to create SparkContext in executors.

2020-07-30 Thread GitBox
SparkQA removed a comment on pull request #29278: URL: https://github.com/apache/spark/pull/29278#issuecomment-665970816

[GitHub] [spark] HeartSaVioR commented on pull request #29289: [SPARK-32482][SS][TESTS] Eliminate deprecated poll(long) API calls to avoid infinite wait in tests

2020-07-30 Thread GitBox
HeartSaVioR commented on pull request #29289: URL: https://github.com/apache/spark/pull/29289#issuecomment-665989611 I'll leave the PR open for a day to see whether there is any further input, and merge tomorrow. Please take a look, or leave a comment if you need some time to review this.

[GitHub] [spark] SparkQA commented on pull request #29276: [SPARK-32470][CORE] Remove task result size check for shuffle map stage

2020-07-30 Thread GitBox
SparkQA commented on pull request #29276: URL: https://github.com/apache/spark/pull/29276#issuecomment-665902742 **[Test build #126786 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126786/testReport)** for PR 29276 at commit

[GitHub] [spark] HyukjinKwon commented on pull request #29283: [SPARK-32478][R][SQL] Error message to show the schema mismatch in gapply with Arrow vectorization

2020-07-30 Thread GitBox
HyukjinKwon commented on pull request #29283: URL: https://github.com/apache/spark/pull/29283#issuecomment-666146832 Merged to master and branch-3.0. Thanks @viirya.

[GitHub] [spark] AmplabJenkins commented on pull request #29293: [SPARK-32487][CORE] Remove j.w.r.NotFoundException from `import` in [Stages|OneApplication]Resource

2020-07-30 Thread GitBox
AmplabJenkins commented on pull request #29293: URL: https://github.com/apache/spark/pull/29293#issuecomment-665942782

[GitHub] [spark] HyukjinKwon commented on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode

2020-07-30 Thread GitBox
HyukjinKwon commented on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-666017525 Merged to master.

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29146: [WIP][SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command

2020-07-30 Thread GitBox
AmplabJenkins removed a comment on pull request #29146: URL: https://github.com/apache/spark/pull/29146#issuecomment-665946574

[GitHub] [spark] dongjoon-hyun commented on pull request #29293: [SPARK-32487][CORE] Remove j.w.r.NotFoundException from `import` in [Stages|OneApplication]Resource

2020-07-30 Thread GitBox
dongjoon-hyun commented on pull request #29293: URL: https://github.com/apache/spark/pull/29293#issuecomment-665942644

[GitHub] [spark] viirya commented on a change in pull request #29277: [SPARK-32421][SQL] Add code-gen for shuffled hash join

2020-07-30 Thread GitBox
viirya commented on a change in pull request #29277: URL: https://github.com/apache/spark/pull/29277#discussion_r462685419 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ## @@ -903,6 +904,10 @@ case class

[GitHub] [spark] gemelen commented on pull request #29286: [WIP][SPARK-21708][Build] Migrate build to sbt 1.x

2020-07-30 Thread GitBox
gemelen commented on pull request #29286: URL: https://github.com/apache/spark/pull/29286#issuecomment-666101286

[GitHub] [spark] SparkQA commented on pull request #29293: [SPARK-32487][CORE] Remove j.w.r.NotFoundException from `import` in [Stages|OneApplication]Resource

2020-07-30 Thread GitBox
SparkQA commented on pull request #29293: URL: https://github.com/apache/spark/pull/29293#issuecomment-665941955

[GitHub] [spark] AmplabJenkins commented on pull request #29146: [WIP][SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command

2020-07-30 Thread GitBox
AmplabJenkins commented on pull request #29146: URL: https://github.com/apache/spark/pull/29146#issuecomment-665946574

[GitHub] [spark] AmplabJenkins commented on pull request #29296: [SPARK-32488][SQL] Use @parser::members and @lexer::members to avoid generating unused code

2020-07-30 Thread GitBox
AmplabJenkins commented on pull request #29296: URL: https://github.com/apache/spark/pull/29296#issuecomment-666050019

[GitHub] [spark] SparkQA commented on pull request #29234: [SPARK-32431][SQL] Check duplicate nested columns in read from in-built datasources

2020-07-30 Thread GitBox
SparkQA commented on pull request #29234: URL: https://github.com/apache/spark/pull/29234#issuecomment-665941972

[GitHub] [spark] SparkQA removed a comment on pull request #29294: [SPARK-32160][CORE][PYSPARK][3.0] Add configs to switch allow/disallow to create SparkContext in executors.

2020-07-30 Thread GitBox
SparkQA removed a comment on pull request #29294: URL: https://github.com/apache/spark/pull/29294#issuecomment-665989295 **[Test build #126794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126794/testReport)** for PR 29294 at commit

[GitHub] [spark] HeartSaVioR commented on pull request #29272: [SPARK-32468][SS][TESTS] Fix timeout config issue in Kafka connector tests

2020-07-30 Thread GitBox
HeartSaVioR commented on pull request #29272: URL: https://github.com/apache/spark/pull/29272#issuecomment-666223018 retest this, please

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command

2020-07-30 Thread GitBox
AmplabJenkins removed a comment on pull request #29146: URL: https://github.com/apache/spark/pull/29146#issuecomment-666231820 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] AmplabJenkins commented on pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command

2020-07-30 Thread GitBox
AmplabJenkins commented on pull request #29146: URL: https://github.com/apache/spark/pull/29146#issuecomment-666231820 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126802/

[GitHub] [spark] beliefer commented on pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-30 Thread GitBox
beliefer commented on pull request #29291: URL: https://github.com/apache/spark/pull/29291#issuecomment-666240549 cc @cloud-fan

[GitHub] [spark] itsvikramagr edited a comment on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue

2020-07-30 Thread GitBox
itsvikramagr edited a comment on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-666248758 @HeartSaVioR - This is a much-needed fix. Thanks for it. I have an orthogonal question. Why do we need to worry about file sink metadata files? I can think of

[GitHub] [spark] HyukjinKwon opened a new pull request #29306: [SPARK-32497][INFRA] Installs qpdf package for CRAN check in GitHub Actions

2020-07-30 Thread GitBox
HyukjinKwon opened a new pull request #29306: URL: https://github.com/apache/spark/pull/29306 ### What changes were proposed in this pull request? CRAN check fails due to the size of PDF manual as below: {code} ... WARNING ‘qpdf’ is needed for checks on size

[GitHub] [spark] cloud-fan commented on a change in pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-30 Thread GitBox
cloud-fan commented on a change in pull request #29291: URL: https://github.com/apache/spark/pull/29291#discussion_r462958983 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ## @@ -184,8 +227,8 @@ object

[GitHub] [spark] beliefer commented on pull request #27429: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression

2020-07-30 Thread GitBox
beliefer commented on pull request #27429: URL: https://github.com/apache/spark/pull/27429#issuecomment-666227176 retest this please.

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command

2020-07-30 Thread GitBox
AmplabJenkins removed a comment on pull request #29146: URL: https://github.com/apache/spark/pull/29146#issuecomment-666231024 Merged build finished. Test FAILed.

[GitHub] [spark] AmplabJenkins commented on pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command

2020-07-30 Thread GitBox
AmplabJenkins commented on pull request #29146: URL: https://github.com/apache/spark/pull/29146#issuecomment-666231024 Merged build finished. Test FAILed.

[GitHub] [spark] liangz1 commented on pull request #29284: [SPARK-32479][PYSPARK] Fix the slicing logic in createDataFrame when converting pandas dataframe to arrow table

2020-07-30 Thread GitBox
liangz1 commented on pull request #29284: URL: https://github.com/apache/spark/pull/29284#issuecomment-666238258 This is not a bug. Spark will always create `defaultParallelism` partitions; there could be empty partitions. Closing this PR.

[GitHub] [spark] AmplabJenkins commented on pull request #29296: [SPARK-32488][SQL] Use @parser::members and @lexer::members to avoid generating unused code

2020-07-30 Thread GitBox
AmplabJenkins commented on pull request #29296: URL: https://github.com/apache/spark/pull/29296#issuecomment-666254919 Merged build finished. Test PASSed.

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29296: [SPARK-32488][SQL] Use @parser::members and @lexer::members to avoid generating unused code

2020-07-30 Thread GitBox
AmplabJenkins removed a comment on pull request #29296: URL: https://github.com/apache/spark/pull/29296#issuecomment-666254919 Merged build finished. Test PASSed.

[GitHub] [spark] AmplabJenkins commented on pull request #29296: [SPARK-32488][SQL] Use @parser::members and @lexer::members to avoid generating unused code

2020-07-30 Thread GitBox
AmplabJenkins commented on pull request #29296: URL: https://github.com/apache/spark/pull/29296#issuecomment-666255678 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126800/

[GitHub] [spark] maropu commented on a change in pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command

2020-07-30 Thread GitBox
maropu commented on a change in pull request #29146: URL: https://github.com/apache/spark/pull/29146#discussion_r462884250 ## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ## @@ -244,11 +258,31 @@ statement | SET TIME ZONE

[GitHub] [spark] maropu commented on a change in pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command

2020-07-30 Thread GitBox
maropu commented on a change in pull request #29146: URL: https://github.com/apache/spark/pull/29146#discussion_r462883994 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala ## @@ -61,6 +63,64 @@ class SparkSqlParserSuite extends

[GitHub] [spark] maropu commented on a change in pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command

2020-07-30 Thread GitBox
maropu commented on a change in pull request #29146: URL: https://github.com/apache/spark/pull/29146#discussion_r462884139 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala ## @@ -61,6 +63,64 @@ class SparkSqlParserSuite extends

[GitHub] [spark] maropu commented on a change in pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command

2020-07-30 Thread GitBox
maropu commented on a change in pull request #29146: URL: https://github.com/apache/spark/pull/29146#discussion_r462888722 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ## @@ -962,8 +962,8 @@ class SQLQuerySuite extends QueryTest with

[GitHub] [spark] HyukjinKwon opened a new pull request #29302: [SPARK-32493][INFRA] Manually install R instead of using setup-r in GitHub Actions

2020-07-30 Thread GitBox
HyukjinKwon opened a new pull request #29302: URL: https://github.com/apache/spark/pull/29302 ### What changes were proposed in this pull request? This PR proposes to manually install R instead of using `setup-r` which seems broken. Currently, GitHub Actions uses its default R 3.4.4

[GitHub] [spark] yaooqinn opened a new pull request #29303: [SPARK-32492][SQL] Fulfill missing column meta information for thriftserver client tools

2020-07-30 Thread GitBox
yaooqinn opened a new pull request #29303: URL: https://github.com/apache/spark/pull/29303 ### What changes were proposed in this pull request? This PR fulfills some missing fields for SparkGetColumnsOperation ### Why are the changes needed? make jdbc tools

[GitHub] [spark] cloud-fan commented on pull request #29199: [SPARK-32403][SQL] Refactor current ScriptTransformationExec

2020-07-30 Thread GitBox
cloud-fan commented on pull request #29199: URL: https://github.com/apache/spark/pull/29199#issuecomment-666293972 > add default no serde IO schemas ScriptTransformationIOSchema.defaultIOSchema I think we will have a default native serde. So for now we just need a fake one which

[GitHub] [spark] cloud-fan commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-30 Thread GitBox
cloud-fan commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-666293192 retest this please

[GitHub] [spark] cloud-fan commented on a change in pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-30 Thread GitBox
cloud-fan commented on a change in pull request #29291: URL: https://github.com/apache/spark/pull/29291#discussion_r462957297 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ## @@ -207,13 +256,35 @@ object

[GitHub] [spark] AmplabJenkins commented on pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-30 Thread GitBox
AmplabJenkins commented on pull request #29291: URL: https://github.com/apache/spark/pull/29291#issuecomment-666222739

[GitHub] [spark] leanken commented on pull request #29301: [SPARK-32474][SQL][FOLLOWUP] NullAwareAntiJoin multi-column support

2020-07-30 Thread GitBox
leanken commented on pull request #29301: URL: https://github.com/apache/spark/pull/29301#issuecomment-666202715 @cloud-fan @maropu @agrawaldevesh Could you have a look at this follow-up and see whether it is worth making such a trade-off to support multi-column NAAJ?

[GitHub] [spark] liangz1 closed pull request #29284: [SPARK-32479][PYSPARK] Fix the slicing logic in createDataFrame when converting pandas dataframe to arrow table

2020-07-30 Thread GitBox
liangz1 closed pull request #29284: URL: https://github.com/apache/spark/pull/29284

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29296: [SPARK-32488][SQL] Use @parser::members and @lexer::members to avoid generating unused code

2020-07-30 Thread GitBox
AmplabJenkins removed a comment on pull request #29296: URL: https://github.com/apache/spark/pull/29296#issuecomment-666255678 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] cloud-fan commented on pull request #29301: [SPARK-32474][SQL][FOLLOWUP] NullAwareAntiJoin multi-column support

2020-07-30 Thread GitBox
cloud-fan commented on pull request #29301: URL: https://github.com/apache/spark/pull/29301#issuecomment-666264155 can you create a new jira ticket? It's a major feature that shouldn't be treated as a followup.

[GitHub] [spark] HyukjinKwon commented on pull request #29300: [SPARK-32491][INFRA] Do not install SparkR in test-only mode in testing script

2020-07-30 Thread GitBox
HyukjinKwon commented on pull request #29300: URL: https://github.com/apache/spark/pull/29300#issuecomment-666284461 I am going to merge to unblock other PRs. Jenkins seems down as well.

[GitHub] [spark] uncleGen commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

2020-07-30 Thread GitBox
uncleGen commented on pull request #28781: URL: https://github.com/apache/spark/pull/28781#issuecomment-666284455 retest this please.

[GitHub] [spark] HyukjinKwon commented on pull request #29300: [SPARK-32491][INFRA] Do not install SparkR in test-only mode in testing script

2020-07-30 Thread GitBox
HyukjinKwon commented on pull request #29300: URL: https://github.com/apache/spark/pull/29300#issuecomment-666284569 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #29300: [SPARK-32491][INFRA] Do not install SparkR in test-only mode in testing script

2020-07-30 Thread GitBox
HyukjinKwon closed pull request #29300: URL: https://github.com/apache/spark/pull/29300

[GitHub] [spark] beliefer commented on a change in pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-30 Thread GitBox
beliefer commented on a change in pull request #29291: URL: https://github.com/apache/spark/pull/29291#discussion_r462924626 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ## @@ -144,28 +192,23 @@ import

[GitHub] [spark] HyukjinKwon commented on pull request #29305: [SPARK-32496][INFRA] Include GitHub Action file as the changes in testing

2020-07-30 Thread GitBox
HyukjinKwon commented on pull request #29305: URL: https://github.com/apache/spark/pull/29305#issuecomment-666325261 Seems like CRAN check started to fail due to SPARK-32497. I am going to merge to unblock the PRs - SparkR tests still fail. Let me know if there are any comments

[GitHub] [spark] HyukjinKwon commented on pull request #29305: [SPARK-32496][INFRA] Include GitHub Action file as the changes in testing

2020-07-30 Thread GitBox
HyukjinKwon commented on pull request #29305: URL: https://github.com/apache/spark/pull/29305#issuecomment-666325380 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #29305: [SPARK-32496][INFRA] Include GitHub Action file as the changes in testing

2020-07-30 Thread GitBox
HyukjinKwon closed pull request #29305: URL: https://github.com/apache/spark/pull/29305

[GitHub] [spark] cloud-fan commented on a change in pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-30 Thread GitBox
cloud-fan commented on a change in pull request #29291: URL: https://github.com/apache/spark/pull/29291#discussion_r462958305 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ## @@ -144,28 +192,23 @@ import

[GitHub] [spark] cloud-fan commented on a change in pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-30 Thread GitBox
cloud-fan commented on a change in pull request #29291: URL: https://github.com/apache/spark/pull/29291#discussion_r462957711 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ## @@ -144,28 +192,23 @@ import

[GitHub] [spark] beliefer commented on pull request #27507: [SPARK-24884][SQL] Support regexp function regexp_extract_all

2020-07-30 Thread GitBox
beliefer commented on pull request #27507: URL: https://github.com/apache/spark/pull/27507#issuecomment-666227396 retest this please.

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-30 Thread GitBox
AmplabJenkins removed a comment on pull request #29291: URL: https://github.com/apache/spark/pull/29291#issuecomment-666222739 Merged build finished. Test PASSed.

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29283: [SPARK-32478][R][SQL] Error message to show the schema mismatch in gapply with Arrow vectorization

2020-07-30 Thread GitBox
AmplabJenkins removed a comment on pull request #29283: URL: https://github.com/apache/spark/pull/29283#issuecomment-666233540 Merged build finished. Test PASSed.

[GitHub] [spark] AmplabJenkins commented on pull request #29283: [SPARK-32478][R][SQL] Error message to show the schema mismatch in gapply with Arrow vectorization

2020-07-30 Thread GitBox
AmplabJenkins commented on pull request #29283: URL: https://github.com/apache/spark/pull/29283#issuecomment-666233540 Merged build finished. Test PASSed.

[GitHub] [spark] leanken commented on pull request #29301: [SPARK-32474][SQL][FOLLOWUP] NullAwareAntiJoin multi-column support

2020-07-30 Thread GitBox
leanken commented on pull request #29301: URL: https://github.com/apache/spark/pull/29301#issuecomment-666266144 PS. I need to recreate the JIRA and PR, so I am closing this one in advance.

[GitHub] [spark] leanken closed pull request #29301: [SPARK-32474][SQL][FOLLOWUP] NullAwareAntiJoin multi-column support

2020-07-30 Thread GitBox
leanken closed pull request #29301: URL: https://github.com/apache/spark/pull/29301

[GitHub] [spark] HyukjinKwon commented on pull request #29302: [SPARK-32493][INFRA] Manually install R instead of using setup-r in GitHub Actions

2020-07-30 Thread GitBox
HyukjinKwon commented on pull request #29302: URL: https://github.com/apache/spark/pull/29302#issuecomment-666301002 Thanks @cloud-fan. I am going to merge to unlock other PRs. Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #29302: [SPARK-32493][INFRA] Manually install R instead of using setup-r in GitHub Actions

2020-07-30 Thread GitBox
HyukjinKwon closed pull request #29302: URL: https://github.com/apache/spark/pull/29302

[GitHub] [spark] beliefer commented on a change in pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-30 Thread GitBox
beliefer commented on a change in pull request #29291: URL: https://github.com/apache/spark/pull/29291#discussion_r462925756 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ## @@ -207,13 +256,35 @@ object

[GitHub] [spark] maropu commented on pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command

2020-07-30 Thread GitBox
maropu commented on pull request #29146: URL: https://github.com/apache/spark/pull/29146#issuecomment-666317097 retest this please

[GitHub] [spark] AmplabJenkins commented on pull request #29283: [SPARK-32478][R][SQL] Error message to show the schema mismatch in gapply with Arrow vectorization

2020-07-30 Thread GitBox
AmplabJenkins commented on pull request #29283: URL: https://github.com/apache/spark/pull/29283#issuecomment-666234409 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126798/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29283: [SPARK-32478][R][SQL] Error message to show the schema mismatch in gapply with Arrow vectorization

2020-07-30 Thread GitBox
AmplabJenkins removed a comment on pull request #29283: URL: https://github.com/apache/spark/pull/29283#issuecomment-666234409 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] leanken commented on pull request #29301: [SPARK-32474][SQL][FOLLOWUP] NullAwareAntiJoin multi-column support

2020-07-30 Thread GitBox
leanken commented on pull request #29301: URL: https://github.com/apache/spark/pull/29301#issuecomment-666265832 > can you create a new jira ticket? It's a major feature that shouldn't be treated as a followup. OK, I will re-create JIRA Issue and PR.

[GitHub] [spark] leanken edited a comment on pull request #29301: [SPARK-32474][SQL][FOLLOWUP] NullAwareAntiJoin multi-column support

2020-07-30 Thread GitBox
leanken edited a comment on pull request #29301: URL: https://github.com/apache/spark/pull/29301#issuecomment-666265832 > can you create a new jira ticket? It's a major feature that shouldn't be treated as a followup. OK, I will re-create JIRA Issue and PR. Close this PR first

[GitHub] [spark] maropu commented on a change in pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command

2020-07-30 Thread GitBox
maropu commented on a change in pull request #29146: URL: https://github.com/apache/spark/pull/29146#discussion_r462884875 ## File path: sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfEntrySuite.scala ## @@ -107,7 +107,7 @@ class SQLConfEntrySuite extends

[GitHub] [spark] cloud-fan commented on a change in pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-30 Thread GitBox
cloud-fan commented on a change in pull request #29291: URL: https://github.com/apache/spark/pull/29291#discussion_r462917670 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ## @@ -207,13 +256,35 @@ object

[GitHub] [spark] cloud-fan commented on a change in pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-07-30 Thread GitBox
cloud-fan commented on a change in pull request #29067: URL: https://github.com/apache/spark/pull/29067#discussion_r462960267 ## File path: sql/core/src/main/scala/org/apache/spark/sql/columnar/CachedBatchSerializer.scala ## @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache

[GitHub] [spark] c21 commented on a change in pull request #29277: [SPARK-32421][SQL] Add code-gen for shuffled hash join

2020-07-30 Thread GitBox
c21 commented on a change in pull request #29277: URL: https://github.com/apache/spark/pull/29277#discussion_r462830406 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala ## @@ -70,4 +74,54 @@ case class ShuffledHashJoinExec(

[GitHub] [spark] itsvikramagr commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue

2020-07-30 Thread GitBox
itsvikramagr commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-666248758 @HeartSaVioR - This is a much-needed fix. Thanks for it. I have an orthogonal question. Why do we need to worry about compacting the file sink metadata? I can think

[GitHub] [spark] HyukjinKwon commented on pull request #29300: [SPARK-32491][INFRA] Do not install SparkR in test-only mode in testing script

2020-07-30 Thread GitBox
HyukjinKwon commented on pull request #29300: URL: https://github.com/apache/spark/pull/29300#issuecomment-666268399 retest this please This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] leanken opened a new pull request #29304: [SPARK-32494][SQL] Null Aware Anti Join Optimize Support Multi-Column

2020-07-30 Thread GitBox
leanken opened a new pull request #29304: URL: https://github.com/apache/spark/pull/29304 ### What changes were proposed in this pull request? This PR proposes a trade-off that also supports multi-column hash lookup in buildSide, but requires buildSide with extra

[GitHub] [spark] cloud-fan commented on pull request #29277: [SPARK-32421][SQL] Add code-gen for shuffled hash join

2020-07-30 Thread GitBox
cloud-fan commented on pull request #29277: URL: https://github.com/apache/spark/pull/29277#issuecomment-666290685 retest this please This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] cloud-fan commented on a change in pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-30 Thread GitBox
cloud-fan commented on a change in pull request #29291: URL: https://github.com/apache/spark/pull/29291#discussion_r462915233 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ## @@ -144,28 +192,23 @@ import

[GitHub] [spark] HyukjinKwon opened a new pull request #29305: Include GitHub Action file as the changes in testing

2020-07-30 Thread GitBox
HyukjinKwon opened a new pull request #29305: URL: https://github.com/apache/spark/pull/29305 ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/26556 excluded `.github/workflows/master.yml`. So tests are skipped if the GitHub Actions

[GitHub] [spark] leanken commented on pull request #29304: [SPARK-32494][SQL] Null Aware Anti Join Optimize Support Multi-Column

2020-07-30 Thread GitBox
leanken commented on pull request #29304: URL: https://github.com/apache/spark/pull/29304#issuecomment-666310924 @cloud-fan @maropu @agrawaldevesh This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] leanken edited a comment on pull request #29304: [SPARK-32494][SQL] Null Aware Anti Join Optimize Support Multi-Column

2020-07-30 Thread GitBox
leanken edited a comment on pull request #29304: URL: https://github.com/apache/spark/pull/29304#issuecomment-666310924 @cloud-fan @maropu @agrawaldevesh New JIRA and PR re-created. Many thanks. This is an automated

[GitHub] [spark] HeartSaVioR edited a comment on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue

2020-07-30 Thread GitBox
HeartSaVioR edited a comment on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-666770668 > for exactly-once semantics, we can make changes in ManifestFileCommitter to delete files in the abort function. Or we can come up with some other alternatives.

[GitHub] [spark] HeartSaVioR edited a comment on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue

2020-07-30 Thread GitBox
HeartSaVioR edited a comment on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-666770668 > for exactly-once semantics, we can make changes in ManifestFileCommitter to delete files in the abort function. Or we can come up with some other alternatives.

[GitHub] [spark] SparkQA commented on pull request #29279: [SPARK-31418][CORE][FOLLOW-UP][MINOR] Fix log messages to print stage id instead of the object name

2020-07-30 Thread GitBox
SparkQA commented on pull request #29279: URL: https://github.com/apache/spark/pull/29279#issuecomment-666807231 **[Test build #126811 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126811/testReport)** for PR 29279 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #29146: [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command

2020-07-30 Thread GitBox
AmplabJenkins commented on pull request #29146: URL: https://github.com/apache/spark/pull/29146#issuecomment-666845339 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression

2020-07-30 Thread GitBox
AmplabJenkins removed a comment on pull request #29270: URL: https://github.com/apache/spark/pull/29270#issuecomment-666849196 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] SparkQA removed a comment on pull request #29278: [SPARK-32160][CORE][PYSPARK] Add a config to switch allow/disallow to create SparkContext in executors.

2020-07-30 Thread GitBox
SparkQA removed a comment on pull request #29278: URL: https://github.com/apache/spark/pull/29278#issuecomment-56234 **[Test build #126812 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126812/testReport)** for PR 29278 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-07-30 Thread GitBox
AmplabJenkins removed a comment on pull request #29211: URL: https://github.com/apache/spark/pull/29211#issuecomment-666849810 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To

[GitHub] [spark] AmplabJenkins commented on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-07-30 Thread GitBox
AmplabJenkins commented on pull request #29211: URL: https://github.com/apache/spark/pull/29211#issuecomment-666849810 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] SparkQA commented on pull request #29278: [SPARK-32160][CORE][PYSPARK] Add a config to switch allow/disallow to create SparkContext in executors.

2020-07-30 Thread GitBox
SparkQA commented on pull request #29278: URL: https://github.com/apache/spark/pull/29278#issuecomment-666849737 **[Test build #126812 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126812/testReport)** for PR 29278 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression

2020-07-30 Thread GitBox
AmplabJenkins removed a comment on pull request #29270: URL: https://github.com/apache/spark/pull/29270#issuecomment-666849192 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29313: [DO-NOT-MERGE] Debug the flaky test added in SPARK-32175

2020-07-30 Thread GitBox
AmplabJenkins removed a comment on pull request #29313: URL: https://github.com/apache/spark/pull/29313#issuecomment-666736981 This is an automated message from the Apache Git Service. To respond to the message, please log on
