[GitHub] [spark] cloud-fan commented on a diff in pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-18 Thread GitBox
cloud-fan commented on code in PR #38497: URL: https://github.com/apache/spark/pull/38497#discussion_r1026161789 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala: ## @@ -31,20 +34,35 @@ object EliminateResolvedHint extends

[GitHub] [spark] viirya commented on a diff in pull request #38692: [SPARK-41183][SQL] Add an extension API to do plan normalization for caching

2022-11-18 Thread GitBox
viirya commented on code in PR #38692: URL: https://github.com/apache/spark/pull/38692#discussion_r1026176496 ## sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala: ## @@ -192,6 +192,23 @@ class SparkSessionExtensionSuite extends SparkFunSuite with

[GitHub] [spark] fred-db commented on a diff in pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-18 Thread GitBox
fred-db commented on code in PR #38497: URL: https://github.com/apache/spark/pull/38497#discussion_r1026135835 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala: ## @@ -31,20 +34,35 @@ object EliminateResolvedHint extends

[GitHub] [spark] Yikun closed pull request #38611: [SPARK-41107][PYTHON][INFRA][TESTS] Install memory-profiler in the CI

2022-11-18 Thread GitBox
Yikun closed pull request #38611: [SPARK-41107][PYTHON][INFRA][TESTS] Install memory-profiler in the CI URL: https://github.com/apache/spark/pull/38611 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] viirya commented on a diff in pull request #38692: [SPARK-41183][SQL] Add an extension API to do plan normalization for caching

2022-11-18 Thread GitBox
viirya commented on code in PR #38692: URL: https://github.com/apache/spark/pull/38692#discussion_r1026173073 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala: ## @@ -40,6 +40,7 @@ import org.apache.spark.sql.execution.{ColumnarRule, SparkPlan} *

[GitHub] [spark] viirya commented on a diff in pull request #38692: [SPARK-41183][SQL] Add an extension API to do plan normalization for caching

2022-11-18 Thread GitBox
viirya commented on code in PR #38692: URL: https://github.com/apache/spark/pull/38692#discussion_r1026174652 ## sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala: ## @@ -105,12 +105,30 @@ class QueryExecution( case other => other } + lazy

[GitHub] [spark] fred-db commented on a diff in pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-18 Thread GitBox
fred-db commented on code in PR #38497: URL: https://github.com/apache/spark/pull/38497#discussion_r1026144771 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala: ## @@ -201,15 +204,17 @@ object RewritePredicateSubquery extends

[GitHub] [spark] viirya commented on a diff in pull request #38692: [SPARK-41183][SQL] Add an extension API to do plan normalization for caching

2022-11-18 Thread GitBox
viirya commented on code in PR #38692: URL: https://github.com/apache/spark/pull/38692#discussion_r1026181826 ## sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala: ## @@ -105,12 +105,30 @@ class QueryExecution( case other => other } + lazy

[GitHub] [spark] fred-db commented on a diff in pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-18 Thread GitBox
fred-db commented on code in PR #38497: URL: https://github.com/apache/spark/pull/38497#discussion_r1026142927 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala: ## @@ -201,15 +204,17 @@ object RewritePredicateSubquery extends

[GitHub] [spark] HyukjinKwon closed pull request #38700: [SPARK-41189][PYTHON] Add an environment to switch on and off namedtuple hack

2022-11-18 Thread GitBox
HyukjinKwon closed pull request #38700: [SPARK-41189][PYTHON] Add an environment to switch on and off namedtuple hack URL: https://github.com/apache/spark/pull/38700

[GitHub] [spark] HyukjinKwon commented on pull request #38700: [SPARK-41189][PYTHON] Add an environment to switch on and off namedtuple hack

2022-11-18 Thread GitBox
HyukjinKwon commented on PR #38700: URL: https://github.com/apache/spark/pull/38700#issuecomment-1319719556 Merged to master.

[GitHub] [spark] gaoyajun02 commented on a diff in pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-18 Thread GitBox
gaoyajun02 commented on code in PR #38333: URL: https://github.com/apache/spark/pull/38333#discussion_r1026206654 ## core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala: ## @@ -794,7 +794,18 @@ final class ShuffleBlockFetcherIterator( //

[GitHub] [spark] MaxGekk commented on a diff in pull request #38644: [SPARK-41130][SQL] Rename `OUT_OF_DECIMAL_TYPE_RANGE` to `NUMERIC_OUT_OF_SUPPORTED_RANGE`

2022-11-18 Thread GitBox
MaxGekk commented on code in PR #38644: URL: https://github.com/apache/spark/pull/38644#discussion_r1026251775 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOnSuite.scala: ## @@ -244,7 +244,7 @@ class CastWithAnsiOnSuite extends

[GitHub] [spark] EnricoMi commented on pull request #38676: [SPARK-41162][SQL] Do not push down anti-join predicates that become ambiguous

2022-11-18 Thread GitBox
EnricoMi commented on PR #38676: URL: https://github.com/apache/spark/pull/38676#issuecomment-1319839395 > Could we fix the `DeduplicateRelations`? Interesting, that sounds like a better solution. I'll look into it.

[GitHub] [spark] fred-db commented on a diff in pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-18 Thread GitBox
fred-db commented on code in PR #38497: URL: https://github.com/apache/spark/pull/38497#discussion_r1026198667 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala: ## @@ -31,20 +34,35 @@ object EliminateResolvedHint extends

[GitHub] [spark] MaxGekk opened a new pull request, #38712: [WIP][SQL] Parameterized SQL queries

2022-11-18 Thread GitBox
MaxGekk opened a new pull request, #38712: URL: https://github.com/apache/spark/pull/38712 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-18 Thread GitBox
HyukjinKwon commented on code in PR #38659: URL: https://github.com/apache/spark/pull/38659#discussion_r1026348675 ## sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala: ## @@ -21,24 +21,22 @@ import java.nio.charset.StandardCharsets import

[GitHub] [spark] MaxGekk closed pull request #38644: [SPARK-41130][SQL] Rename `OUT_OF_DECIMAL_TYPE_RANGE` to `NUMERIC_OUT_OF_SUPPORTED_RANGE`

2022-11-18 Thread GitBox
MaxGekk closed pull request #38644: [SPARK-41130][SQL] Rename `OUT_OF_DECIMAL_TYPE_RANGE` to `NUMERIC_OUT_OF_SUPPORTED_RANGE` URL: https://github.com/apache/spark/pull/38644

[GitHub] [spark] MaxGekk commented on pull request #38644: [SPARK-41130][SQL] Rename `OUT_OF_DECIMAL_TYPE_RANGE` to `NUMERIC_OUT_OF_SUPPORTED_RANGE`

2022-11-18 Thread GitBox
MaxGekk commented on PR #38644: URL: https://github.com/apache/spark/pull/38644#issuecomment-1319797393 +1, LGTM. Merging to master. Thank you, @itholic.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38706: [TEST ONLY] Come back to collect.foreach(send)

2022-11-18 Thread GitBox
HyukjinKwon commented on code in PR #38706: URL: https://github.com/apache/spark/pull/38706#discussion_r1026369143 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -57,13 +55,7 @@ class

[GitHub] [spark] cloud-fan closed pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-18 Thread GitBox
cloud-fan closed pull request #38497: [SPARK-40999] Hint propagation to subqueries URL: https://github.com/apache/spark/pull/38497

[GitHub] [spark] wangyum commented on a diff in pull request #38682: [SPARK-41167][SQL] Optimize LikeSimplification rule to improve multi like performance

2022-11-18 Thread GitBox
wangyum commented on code in PR #38682: URL: https://github.com/apache/spark/pull/38682#discussion_r1026470585 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala: ## @@ -117,4 +117,23 @@ object ExprUtils extends QueryErrorsBase {

[GitHub] [spark] pan3793 commented on a diff in pull request #38622: [SPARK-39601][YARN] AllocationFailure should not be treated as exitCausedByApp when driver is shutting down

2022-11-18 Thread GitBox
pan3793 commented on code in PR #38622: URL: https://github.com/apache/spark/pull/38622#discussion_r1026203428 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala: ## @@ -835,6 +839,8 @@ private[yarn] class YarnAllocator( // now I

[GitHub] [spark] MaxGekk commented on a diff in pull request #38664: [SPARK-41147][SQL] Assign a name to the legacy error class `_LEGACY_ERROR_TEMP_1042`

2022-11-18 Thread GitBox
MaxGekk commented on code in PR #38664: URL: https://github.com/apache/spark/pull/38664#discussion_r1026264084 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -146,7 +147,10 @@ object FunctionRegistryBase {

[GitHub] [spark] wangyum commented on a diff in pull request #38682: [SPARK-41167][SQL] Optimize LikeSimplification rule to improve multi like performance

2022-11-18 Thread GitBox
wangyum commented on code in PR #38682: URL: https://github.com/apache/spark/pull/38682#discussion_r1026264942 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -756,16 +771,16 @@ object LikeSimplification extends Rule[LogicalPlan]

[GitHub] [spark] HyukjinKwon closed pull request #38675: [SPARK-41161][BUILD] Upgrade scala-parser-combinators to 2.1.1

2022-11-18 Thread GitBox
HyukjinKwon closed pull request #38675: [SPARK-41161][BUILD] Upgrade scala-parser-combinators to 2.1.1 URL: https://github.com/apache/spark/pull/38675

[GitHub] [spark] HyukjinKwon commented on pull request #38675: [SPARK-41161][BUILD] Upgrade scala-parser-combinators to 2.1.1

2022-11-18 Thread GitBox
HyukjinKwon commented on PR #38675: URL: https://github.com/apache/spark/pull/38675#issuecomment-1319891935 Merged to master.

[GitHub] [spark] LuciferYang commented on pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-18 Thread GitBox
LuciferYang commented on PR #38711: URL: https://github.com/apache/spark/pull/38711#issuecomment-1320030900 cc @mridulm @Ngone51 FYI

[GitHub] [spark] MaxGekk commented on a diff in pull request #38705: [SPARK-41173][SQL] Move `require()` out from the constructors of string expressions

2022-11-18 Thread GitBox
MaxGekk commented on code in PR #38705: URL: https://github.com/apache/spark/pull/38705#discussion_r1026301316 ## sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out: ## @@ -14,7 +29,7 @@ select format_string() struct<> -- !query output

[GitHub] [spark] cloud-fan commented on pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-18 Thread GitBox
cloud-fan commented on PR #38497: URL: https://github.com/apache/spark/pull/38497#issuecomment-1320017014 thanks, merging to master!

[GitHub] [spark] fred-db commented on a diff in pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-18 Thread GitBox
fred-db commented on code in PR #38497: URL: https://github.com/apache/spark/pull/38497#discussion_r1026196114 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala: ## @@ -31,20 +34,35 @@ object EliminateResolvedHint extends

[GitHub] [spark] toujours33 closed pull request #38709: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-18 Thread GitBox
toujours33 closed pull request #38709: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic URL: https://github.com/apache/spark/pull/38709

[GitHub] [spark] toujours33 opened a new pull request, #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-18 Thread GitBox
toujours33 opened a new pull request, #38711: URL: https://github.com/apache/spark/pull/38711 ### What changes were proposed in this pull request? ExecutorAllocationManager only records a count of speculative tasks: `stageAttemptToNumSpeculativeTasks` is incremented when a speculative task is submitted,

[GitHub] [spark] cloud-fan commented on pull request #38687: [SPARK-41154][SQL] Incorrect relation caching for queries with time travel spec

2022-11-18 Thread GitBox
cloud-fan commented on PR #38687: URL: https://github.com/apache/spark/pull/38687#issuecomment-1320010391 there is another cache in `SessionCatalog.tableRelationCache`, shall we update it as well?

[GitHub] [spark] cloud-fan commented on a diff in pull request #38692: [SPARK-41183][SQL] Add an extension API to do plan normalization for caching

2022-11-18 Thread GitBox
cloud-fan commented on code in PR #38692: URL: https://github.com/apache/spark/pull/38692#discussion_r1026457875 ## sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala: ## @@ -192,6 +192,23 @@ class SparkSessionExtensionSuite extends SparkFunSuite

[GitHub] [spark] MaxGekk commented on pull request #38665: [SPARK-41156][SQL] Remove the class `TypeCheckFailure`

2022-11-18 Thread GitBox
MaxGekk commented on PR #38665: URL: https://github.com/apache/spark/pull/38665#issuecomment-1319784931 > There are still some uses in spark-rapids. I haven't found other uses in other famous repositories ok. Let's leave `TypeCheckFailure` as is.

[GitHub] [spark] MaxGekk closed pull request #38665: [SPARK-41156][SQL] Remove the class `TypeCheckFailure`

2022-11-18 Thread GitBox
MaxGekk closed pull request #38665: [SPARK-41156][SQL] Remove the class `TypeCheckFailure` URL: https://github.com/apache/spark/pull/38665

[GitHub] [spark] MaxGekk commented on a diff in pull request #38650: [SPARK-41135][SQL] Rename `UNSUPPORTED_EMPTY_LOCATION` to `INVALID_EMPTY_LOCATION`

2022-11-18 Thread GitBox
MaxGekk commented on code in PR #38650: URL: https://github.com/apache/spark/pull/38650#discussion_r1026258331 ## core/src/main/resources/error/error-classes.json: ## @@ -656,6 +656,11 @@ ], "sqlState" : "42000" }, + "INVALID_EMPTY_LOCATION" : { +"message" : [

[GitHub] [spark] MaxGekk closed pull request #38688: [SPARK-41166][SQL][TESTS] Check errorSubClass of DataTypeMismatch in *ExpressionSuites

2022-11-18 Thread GitBox
MaxGekk closed pull request #38688: [SPARK-41166][SQL][TESTS] Check errorSubClass of DataTypeMismatch in *ExpressionSuites URL: https://github.com/apache/spark/pull/38688

[GitHub] [spark] MaxGekk commented on pull request #38688: [SPARK-41166][TESTS] Check errorSubClass of DataTypeMismatch in *ExpressionSuites

2022-11-18 Thread GitBox
MaxGekk commented on PR #38688: URL: https://github.com/apache/spark/pull/38688#issuecomment-1319815373 +1, LGTM. Merging to master. Thank you, @panbingkun.

[GitHub] [spark] wangyum commented on a diff in pull request #38682: [SPARK-41167][SQL] Optimize LikeSimplification rule to improve multi like performance

2022-11-18 Thread GitBox
wangyum commented on code in PR #38682: URL: https://github.com/apache/spark/pull/38682#discussion_r1026330362 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -743,6 +743,21 @@ object LikeSimplification extends Rule[LogicalPlan] {

[GitHub] [spark] LuciferYang commented on a diff in pull request #38705: [SPARK-41173][SQL] Move `require()` out from the constructors of string expressions

2022-11-18 Thread GitBox
LuciferYang commented on code in PR #38705: URL: https://github.com/apache/spark/pull/38705#discussion_r1026475352 ## sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out: ## @@ -14,7 +29,7 @@ select format_string() struct<> -- !query output

[GitHub] [spark] LuciferYang commented on a diff in pull request #38705: [SPARK-41173][SQL] Move `require()` out from the constructors of string expressions

2022-11-18 Thread GitBox
LuciferYang commented on code in PR #38705: URL: https://github.com/apache/spark/pull/38705#discussion_r1026474420 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -1662,8 +1675,7 @@ case class StringRPad(str: Expression,

[GitHub] [spark] LuciferYang commented on a diff in pull request #38075: [WIP][SPARK-40633][BUILD] Upgrade janino to 3.1.8

2022-11-18 Thread GitBox
LuciferYang commented on code in PR #38075: URL: https://github.com/apache/spark/pull/38075#discussion_r1026497690 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala: ## @@ -1310,7 +1310,7 @@ case class CatalystToExternalMap private(

[GitHub] [spark] srielau commented on a diff in pull request #38713: [SPARK-41195][SQL] Support PIVOT/UNPIVOT with join children

2022-11-18 Thread GitBox
srielau commented on code in PR #38713: URL: https://github.com/apache/spark/pull/38713#discussion_r1026672022 ## sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -697,12 +697,12 @@ setQuantifier ; relation -: LATERAL?

[GitHub] [spark] EnricoMi commented on a diff in pull request #38676: [SPARK-41162][SQL] Do not push down anti-join predicates that become ambiguous

2022-11-18 Thread GitBox
EnricoMi commented on code in PR #38676: URL: https://github.com/apache/spark/pull/38676#discussion_r1026695873 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -1938,7 +1940,10 @@ case class LateralJoin(

[GitHub] [spark] otterc commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-18 Thread GitBox
otterc commented on PR #38333: URL: https://github.com/apache/spark/pull/38333#issuecomment-1320389162 Looks good to me

[GitHub] [spark] cloud-fan commented on pull request #38713: [SPARK-41195][SQL] Support PIVOT/UNPIVOT with join children

2022-11-18 Thread GitBox
cloud-fan commented on PR #38713: URL: https://github.com/apache/spark/pull/38713#issuecomment-1320161201 cc @viirya @MaxGekk

[GitHub] [spark] cloud-fan commented on a diff in pull request #38713: [SPARK-41195][SQL] Support PIVOT/UNPIVOT with join children

2022-11-18 Thread GitBox
cloud-fan commented on code in PR #38713: URL: https://github.com/apache/spark/pull/38713#discussion_r1026557894 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/UnpivotParserSuite.scala: ## @@ -192,4 +193,131 @@ class UnpivotParserSuite extends AnalysisTest

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #38692: [SPARK-41183][SQL] Add an extension API to do plan normalization for caching

2022-11-18 Thread GitBox
ryan-johnson-databricks commented on code in PR #38692: URL: https://github.com/apache/spark/pull/38692#discussion_r1026540525 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala: ## @@ -217,6 +218,22 @@ class SparkSessionExtensions {

[GitHub] [spark] mridulm commented on pull request #38699: [SPARK-41188][CORE][ML] Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM processes

2022-11-18 Thread GitBox
mridulm commented on PR #38699: URL: https://github.com/apache/spark/pull/38699#issuecomment-1320294062 If we are setting it in `SparkContext`, do we want to get rid of this from other places like `PythonRunner.compute` ?

[GitHub] [spark] LuciferYang commented on a diff in pull request #38075: [WIP][SPARK-40633][BUILD] Upgrade janino to 3.1.8

2022-11-18 Thread GitBox
LuciferYang commented on code in PR #38075: URL: https://github.com/apache/spark/pull/38075#discussion_r1026500589 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala: ## @@ -1310,7 +1310,7 @@ case class CatalystToExternalMap private(

[GitHub] [spark] cloud-fan opened a new pull request, #38713: [SPARK-41195][SQL] Support PIVOT/UNPIVOT with join children

2022-11-18 Thread GitBox
cloud-fan opened a new pull request, #38713: URL: https://github.com/apache/spark/pull/38713 ### What changes were proposed in this pull request? Today, our SQL parser only supports PIVOT/UNPIVOT at the end of the FROM clause. This is quite limited and it's better to allow

[GitHub] [spark] AmplabJenkins commented on pull request #38707: [SPARK-41176][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1042

2022-11-18 Thread GitBox
AmplabJenkins commented on PR #38707: URL: https://github.com/apache/spark/pull/38707#issuecomment-1320346254 Can one of the admins verify this patch?

[GitHub] [spark] wangyum commented on a diff in pull request #38682: [SPARK-41167][SQL] Optimize LikeSimplification rule to improve multi like performance

2022-11-18 Thread GitBox
wangyum commented on code in PR #38682: URL: https://github.com/apache/spark/pull/38682#discussion_r1026534660 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/LikeAnyBenchmark.scala: ## @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] AmplabJenkins commented on pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-18 Thread GitBox
AmplabJenkins commented on PR #38711: URL: https://github.com/apache/spark/pull/38711#issuecomment-1320213339 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #38710: [SPARK-41179][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1092

2022-11-18 Thread GitBox
AmplabJenkins commented on PR #38710: URL: https://github.com/apache/spark/pull/38710#issuecomment-1320213406 Can one of the admins verify this patch?

[GitHub] [spark] EnricoMi commented on pull request #38676: [SPARK-41162][SQL] Do not push down anti-join predicates that become ambiguous

2022-11-18 Thread GitBox
EnricoMi commented on PR #38676: URL: https://github.com/apache/spark/pull/38676#issuecomment-1320300231 Problem is that `DeduplicateRelations` is only considering duplicates between left `output` and right `output`, and not duplicates between left `references` and right `output`. I have

[GitHub] [spark] mridulm commented on a diff in pull request #38704: [SPARK-41193][SQL][TESTS] Ignore `collect data with single partition larger than 2GB bytes array limit` in `DatasetLargeResultColle

2022-11-18 Thread GitBox
mridulm commented on code in PR #38704: URL: https://github.com/apache/spark/pull/38704#discussion_r1026679895 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2251,7 +2251,11 @@ class DatasetLargeResultCollectingSuite extends QueryTest with

[GitHub] [spark] wangyum commented on a diff in pull request #38682: [SPARK-41167][SQL] Optimize LikeSimplification rule to improve multi like performance

2022-11-18 Thread GitBox
wangyum commented on code in PR #38682: URL: https://github.com/apache/spark/pull/38682#discussion_r1026531799 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LikeSimplificationSuite.scala: ## @@ -207,11 +207,17 @@ class LikeSimplificationSuite extends

[GitHub] [spark] cloud-fan commented on a diff in pull request #38692: [SPARK-41183][SQL] Add an extension API to do plan normalization for caching

2022-11-18 Thread GitBox
cloud-fan commented on code in PR #38692: URL: https://github.com/apache/spark/pull/38692#discussion_r1026569958 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala: ## @@ -217,6 +218,22 @@ class SparkSessionExtensions { checkRuleBuilders +=

[GitHub] [spark] antonipp commented on a diff in pull request #38376: [SPARK-40817] [Kubernetes] Do not discard remote user-specified files when launching Spark jobs on Kubernetes

2022-11-18 Thread GitBox
antonipp commented on code in PR #38376: URL: https://github.com/apache/spark/pull/38376#discussion_r1026638180 ## core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala: ## @@ -1609,6 +1609,16 @@ class TestFileSystem extends org.apache.hadoop.fs.LocalFileSystem {

[GitHub] [spark] MaxGekk commented on pull request #38705: [SPARK-41173][SQL] Move `require()` out from the constructors of string expressions

2022-11-18 Thread GitBox
MaxGekk commented on PR #38705: URL: https://github.com/apache/spark/pull/38705#issuecomment-1320429350 +1, LGTM. Merging to master. Thank you, @LuciferYang.

[GitHub] [spark] viirya commented on pull request #38716: [SPARK-XXXXX][SS] Use latestCommittedBatchId as currentBatchId when resuming late batch

2022-11-18 Thread GitBox
viirya commented on PR #38716: URL: https://github.com/apache/spark/pull/38716#issuecomment-1320636937 retest this please

[GitHub] [spark] HeartSaVioR opened a new pull request, #38717: [SPARK-41198][SS] Fix metrics in streaming query having CTE and DSv1 streaming source

2022-11-18 Thread GitBox
HeartSaVioR opened a new pull request, #38717: URL: https://github.com/apache/spark/pull/38717 ### What changes were proposed in this pull request? This PR proposes to fix the broken metrics when the streaming query has CTE, via applying InlineCTE manually against analyzed plan when

[GitHub] [spark] amaliujia commented on pull request #38718: [SPARK-41196][CONNECT][FOLLOW-UP] Fix out of sync generated files for Python

2022-11-18 Thread GitBox
amaliujia commented on PR #38718: URL: https://github.com/apache/spark/pull/38718#issuecomment-1320510474 @zhengruifeng @HyukjinKwon

[GitHub] [spark] xkrogen commented on pull request #35969: [SPARK-38651][SQL] Add configuration to support writing out empty schemas in supported filebased datasources

2022-11-18 Thread GitBox
xkrogen commented on PR #35969: URL: https://github.com/apache/spark/pull/35969#issuecomment-1320527156 @cloud-fan , any more concerns on this approach based on what @thejdeep shared?

[GitHub] [spark] HeartSaVioR commented on pull request #38719: [SPARK-41199][SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used

2022-11-18 Thread GitBox
HeartSaVioR commented on PR #38719: URL: https://github.com/apache/spark/pull/38719#issuecomment-1320531883 cc. @zsxwing @viirya @xuanyuanking Please take a look. Thanks in advance!

[GitHub] [spark] HeartSaVioR commented on pull request #38717: [SPARK-41198][SS] Fix metrics in streaming query having CTE and DSv1 streaming source

2022-11-18 Thread GitBox
HeartSaVioR commented on PR #38717: URL: https://github.com/apache/spark/pull/38717#issuecomment-1320531472 cc. @zsxwing @cloud-fan @viirya Please take a look. Thanks in advance!

[GitHub] [spark] liuzqt commented on a diff in pull request #38704: [SPARK-41193][SQL][TESTS] Ignore `collect data with single partition larger than 2GB bytes array limit` in `DatasetLargeResultCollec

2022-11-18 Thread GitBox
liuzqt commented on code in PR #38704: URL: https://github.com/apache/spark/pull/38704#discussion_r1026966457 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2251,7 +2251,11 @@ class DatasetLargeResultCollectingSuite extends QueryTest with

[GitHub] [spark] hvanhovell closed pull request #38693: [SPARK-41196] [CONNECT] Homogenize the protobuf version across the Spark connect server to use the same major version.

2022-11-18 Thread GitBox
hvanhovell closed pull request #38693: [SPARK-41196] [CONNECT] Homogenize the protobuf version across the Spark connect server to use the same major version. URL: https://github.com/apache/spark/pull/38693

[GitHub] [spark] tedyu opened a new pull request, #38715: [SPARK-41197] Upgrade Kafka version to 3.3 release

2022-11-18 Thread GitBox
tedyu opened a new pull request, #38715: URL: https://github.com/apache/spark/pull/38715 ### What changes were proposed in this pull request? This PR upgrades Kafka to the 3.3.0 release. ### Why are the changes needed? The Kafka 3.3.0 release has new features along with bug fixes:

[GitHub] [spark] MaxGekk closed pull request #38705: [SPARK-41173][SQL] Move `require()` out from the constructors of string expressions

2022-11-18 Thread GitBox
MaxGekk closed pull request #38705: [SPARK-41173][SQL] Move `require()` out from the constructors of string expressions URL: https://github.com/apache/spark/pull/38705

[GitHub] [spark] HeartSaVioR opened a new pull request, #38719: [SPARK-41199][SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used

2022-11-18 Thread GitBox
HeartSaVioR opened a new pull request, #38719: URL: https://github.com/apache/spark/pull/38719 ### What changes were proposed in this pull request? This PR proposes to fix the metrics issue for streaming query when DSv1 streaming source and DSv2 streaming source are co-used. If the

[GitHub] [spark] tedyu commented on pull request #38715: [SPARK-41197] Upgrade Kafka version to 3.3 release

2022-11-18 Thread GitBox
tedyu commented on PR #38715: URL: https://github.com/apache/spark/pull/38715#issuecomment-1320568475 @HeartSaVioR Can you take a look?

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #38692: [SPARK-41183][SQL] Add an extension API to do plan normalization for caching

2022-11-18 Thread GitBox
ryan-johnson-databricks commented on code in PR #38692: URL: https://github.com/apache/spark/pull/38692#discussion_r1026804602 ## sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala: ## @@ -192,6 +192,23 @@ class SparkSessionExtensionSuite extends

[GitHub] [spark] geofflangenderfer commented on pull request #4093: [SPARK-5307] SerializationDebugger to help debug NotSerializableException

2022-11-18 Thread GitBox
geofflangenderfer commented on PR #4093: URL: https://github.com/apache/spark/pull/4093#issuecomment-1320457789 Could someone give a simple example of how to read the graph? I'm not sure where to start.

[GitHub] [spark] MaxGekk commented on pull request #38696: [SPARK-41175][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1078

2022-11-18 Thread GitBox
MaxGekk commented on PR #38696: URL: https://github.com/apache/spark/pull/38696#issuecomment-1320493195 cc @srielau

[GitHub] [spark] AmplabJenkins commented on pull request #38703: [SPARK-41191] [SQL] Cache Table is not working while nested caches exist

2022-11-18 Thread GitBox
AmplabJenkins commented on PR #38703: URL: https://github.com/apache/spark/pull/38703#issuecomment-1320614138 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #38702: [SPARK-41187][Core] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen

2022-11-18 Thread GitBox
AmplabJenkins commented on PR #38702: URL: https://github.com/apache/spark/pull/38702#issuecomment-1320614168 Can one of the admins verify this patch?

[GitHub] [spark] amaliujia commented on pull request #38693: [SPARK-41196] [CONNECT] Homogenize the protobuf version across the Spark connect server to use the same major version.

2022-11-18 Thread GitBox
amaliujia commented on PR #38693: URL: https://github.com/apache/spark/pull/38693#issuecomment-1320392539 LGTM

[GitHub] [spark] ahshahid opened a new pull request, #38714: [WIP][SPARK-41141]. avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it

2022-11-18 Thread GitBox
ahshahid opened a new pull request, #38714: URL: https://github.com/apache/spark/pull/38714 ### What changes were proposed in this pull request? This is a PR for improvement. When a subquery references the outer query's aggregate functions, in some cases, it ends up introducing extra

[GitHub] [spark] viirya opened a new pull request, #38716: [SPARK-XXXXX][SS] Use latestCommittedBatchId as currentBatchId when resuming late batch

2022-11-18 Thread GitBox
viirya opened a new pull request, #38716: URL: https://github.com/apache/spark/pull/38716 ### What changes were proposed in this pull request? This patch changes `currentBatchId` when `MicroBatchExecution` tries to resume from late batch from offset log. Previously it

[GitHub] [spark] amaliujia opened a new pull request, #38718: [SPARK-41196][CONNECT][FOLLOW-UP] Fix out of sync generated files for Python

2022-11-18 Thread GitBox
amaliujia opened a new pull request, #38718: URL: https://github.com/apache/spark/pull/38718 ### What changes were proposed in this pull request? Fix out of sync generated files for Python. This happens on a rare case for protobuf version change. There were something

[GitHub] [spark] wangyum commented on a diff in pull request #38682: [SPARK-41167][SQL] Optimize LikeSimplification rule to improve multi like performance

2022-11-18 Thread GitBox
wangyum commented on code in PR #38682: URL: https://github.com/apache/spark/pull/38682#discussion_r1026987556 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala: ## @@ -117,4 +117,23 @@ object ExprUtils extends QueryErrorsBase {

[GitHub] [spark] wangyum commented on a diff in pull request #38682: [SPARK-41167][SQL] Optimize LikeSimplification rule to improve multi like performance

2022-11-18 Thread GitBox
wangyum commented on code in PR #38682: URL: https://github.com/apache/spark/pull/38682#discussion_r1027001153 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala: ## @@ -117,4 +117,29 @@ object ExprUtils extends QueryErrorsBase {

[GitHub] [spark] wangyum commented on pull request #38682: [SPARK-41167][SQL] Optimize LikeSimplification rule to improve multi like performance

2022-11-18 Thread GitBox
wangyum commented on PR #38682: URL: https://github.com/apache/spark/pull/38682#issuecomment-1320735591 @wankunde Please fix the PR title and description.

[GitHub] [spark] WangGuangxin opened a new pull request, #38722: [SPARK-41200][CORE] BytesToBytesMap's longArray size can be up to MAX_CAPACITY

2022-11-18 Thread GitBox
WangGuangxin opened a new pull request, #38722: URL: https://github.com/apache/spark/pull/38722 ### What changes were proposed in this pull request? In BytesToBytesMap, the longArray size can be up to `MAX_CAPACITY` instead of `MAX_CAPACITY/2` since `MAX_CAPACITY` already takes `two array

[GitHub] [spark] viirya commented on a diff in pull request #38719: [SPARK-41199][SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used

2022-11-18 Thread GitBox
viirya commented on code in PR #38719: URL: https://github.com/apache/spark/pull/38719#discussion_r1026999005 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala: ## @@ -345,7 +345,14 @@ trait ProgressReporter extends Logging { val

[GitHub] [spark] viirya closed pull request #38716: [SPARK-XXXXX][SS] Use latestCommittedBatchId as currentBatchId when resuming late batch

2022-11-18 Thread GitBox
viirya closed pull request #38716: [SPARK-XXXXX][SS] Use latestCommittedBatchId as currentBatchId when resuming late batch URL: https://github.com/apache/spark/pull/38716

[GitHub] [spark] AmplabJenkins commented on pull request #38696: [SPARK-41175][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1078

2022-11-18 Thread GitBox
AmplabJenkins commented on PR #38696: URL: https://github.com/apache/spark/pull/38696#issuecomment-1320808013 Can one of the admins verify this patch?

[GitHub] [spark] Yikun closed pull request #38698: [SPARK-41186][INFRA][PS][TESTS] Upgrade infra and replace `list_run_infos` with `search_runs` in mlflow doctest

2022-11-18 Thread GitBox
Yikun closed pull request #38698: [SPARK-41186][INFRA][PS][TESTS] Upgrade infra and replace `list_run_infos` with `search_runs` in mlflow doctest URL: https://github.com/apache/spark/pull/38698

[GitHub] [spark] Yikun commented on pull request #38698: [SPARK-41186][INFRA][PS][TESTS] Upgrade infra and replace `list_run_infos` with `search_runs` in mlflow doctest

2022-11-18 Thread GitBox
Yikun commented on PR #38698: URL: https://github.com/apache/spark/pull/38698#issuecomment-1320737666 Merge to master, @HyukjinKwon @harupy Thanks!

[GitHub] [spark] MaxGekk closed pull request #38696: [SPARK-41175][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1078

2022-11-18 Thread GitBox
MaxGekk closed pull request #38696: [SPARK-41175][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1078 URL: https://github.com/apache/spark/pull/38696

[GitHub] [spark] github-actions[bot] commented on pull request #37460: [WIP][SPARK-40031][SQL] Remove unnecessary TryEval in TryCast

2022-11-18 Thread GitBox
github-actions[bot] commented on PR #37460: URL: https://github.com/apache/spark/pull/37460#issuecomment-1320692056 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #37359: [SPARK-25342][CORE][SQL]Support rolling back a result stage and rerunning all result tasks when writing files

2022-11-18 Thread GitBox
github-actions[bot] closed pull request #37359: [SPARK-25342][CORE][SQL]Support rolling back a result stage and rerunning all result tasks when writing files URL: https://github.com/apache/spark/pull/37359

[GitHub] [spark] github-actions[bot] closed pull request #37129: [SPARK-39710][SQL] Support push local topK through outer join

2022-11-18 Thread GitBox
github-actions[bot] closed pull request #37129: [SPARK-39710][SQL] Support push local topK through outer join URL: https://github.com/apache/spark/pull/37129

[GitHub] [spark] github-actions[bot] commented on pull request #36767: [SPARK-39363][K8S] Deprecate k8s memory overhead and make it optional

2022-11-18 Thread GitBox
github-actions[bot] commented on PR #36767: URL: https://github.com/apache/spark/pull/36767#issuecomment-1320692121 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #36695: [SPARK-38474][CORE] Use error class in org.apache.spark.security

2022-11-18 Thread GitBox
github-actions[bot] closed pull request #36695: [SPARK-38474][CORE] Use error class in org.apache.spark.security URL: https://github.com/apache/spark/pull/36695
