[GitHub] [spark] wangyum commented on pull request #38676: [SPARK-41162][SQL] Do not push down anti-join predicates that become ambiguous

2022-11-17 Thread GitBox
wangyum commented on PR #38676: URL: https://github.com/apache/spark/pull/38676#issuecomment-1319657140 @EnricoMi @cloud-fan Could we fix the `DeduplicateRelations`? It did not generate different expression IDs for all conflicting attributes: ``` === Applying Rule

[GitHub] [spark] panbingkun opened a new pull request, #38710: [SPARK-41179][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1092

2022-11-17 Thread GitBox
panbingkun opened a new pull request, #38710: URL: https://github.com/apache/spark/pull/38710 ### What changes were proposed in this pull request? In the PR, I propose to assign a name to the error class _LEGACY_ERROR_TEMP_1092. ### Why are the changes needed? Proper names of

[GitHub] [spark] toujours33 opened a new pull request, #38709: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-17 Thread GitBox
toujours33 opened a new pull request, #38709: URL: https://github.com/apache/spark/pull/38709 ### What changes were proposed in this pull request? ExecutorAllocationManager only records a count of speculative tasks; `stageAttemptToNumSpeculativeTasks` is incremented when a speculative task is submitted,

[GitHub] [spark] Yaohua628 commented on pull request #38683: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent

2022-11-17 Thread GitBox
Yaohua628 commented on PR #38683: URL: https://github.com/apache/spark/pull/38683#issuecomment-1319636434 > If it has been persisted before (like a table), then it's totally fine to write non-nullable data to a nullable column. The optimizer may also optimize a column from nullable to

[GitHub] [spark] LuciferYang opened a new pull request, #38708: [SPARK-41194][PROTOBUF][TESTS] Add `log4j2.properties` for testing to `protobuf` module

2022-11-17 Thread GitBox
LuciferYang opened a new pull request, #38708: URL: https://github.com/apache/spark/pull/38708 ### What changes were proposed in this pull request? This PR adds a `log4j2.properties` file for testing to the `protobuf` module, as in other modules. ### Why are the changes needed? Should

[GitHub] [spark] LuciferYang commented on pull request #37725: [DO-NOT-MERGE] Exceptions without error classes in SQL golden files

2022-11-17 Thread GitBox
LuciferYang commented on PR #37725: URL: https://github.com/apache/spark/pull/37725#issuecomment-1319616814 - SPARK-41173: https://github.com/apache/spark/pull/38705 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] cloud-fan commented on pull request #38683: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent

2022-11-17 Thread GitBox
cloud-fan commented on PR #38683: URL: https://github.com/apache/spark/pull/38683#issuecomment-1319609369 If it has been persisted before (like a table), then it's totally fine to write non-nullable data to a nullable column. The optimizer may also optimize a column from nullable to non-nullable,

[GitHub] [spark] Yaohua628 commented on pull request #38683: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent

2022-11-17 Thread GitBox
Yaohua628 commented on PR #38683: URL: https://github.com/apache/spark/pull/38683#issuecomment-1319593235 > shall we change `FileSourceMetadataAttribute`? I initially thought we could relax this field for some future cases. But yeah, you are right, it seems like it is always not null

[GitHub] [spark] panbingkun opened a new pull request, #38707: [SPARK-41176][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1042

2022-11-17 Thread GitBox
panbingkun opened a new pull request, #38707: URL: https://github.com/apache/spark/pull/38707 ### What changes were proposed in this pull request? In the PR, I propose to assign a name to the error class _LEGACY_ERROR_TEMP_1042. ### Why are the changes needed? Proper names of

[GitHub] [spark] zhengruifeng opened a new pull request, #38706: [TEST ONLY] Come back to collect.foreach(send)

2022-11-17 Thread GitBox
zhengruifeng opened a new pull request, #38706: URL: https://github.com/apache/spark/pull/38706 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] LuciferYang opened a new pull request, #38705: [SPARK-41173][SQL] Move `require()` out from the constructors of string expressions

2022-11-17 Thread GitBox
LuciferYang opened a new pull request, #38705: URL: https://github.com/apache/spark/pull/38705 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark] cloud-fan commented on pull request #38683: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent

2022-11-17 Thread GitBox
cloud-fan commented on PR #38683: URL: https://github.com/apache/spark/pull/38683#issuecomment-1319573470 shall we change `FileSourceMetadataAttribute`? I think the metadata column (at least for file source) is always not nullable.

[GitHub] [spark] MaxGekk commented on a diff in pull request #38576: [SPARK-41062][SQL] Rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`

2022-11-17 Thread GitBox
MaxGekk commented on code in PR #38576: URL: https://github.com/apache/spark/pull/38576#discussion_r1026031578 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -1059,10 +1060,16 @@ trait CheckAnalysis extends PredicateHelper with

[GitHub] [spark] itholic commented on pull request #38702: SPARK-41187 [Core] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen

2022-11-17 Thread GitBox
itholic commented on PR #38702: URL: https://github.com/apache/spark/pull/38702#issuecomment-1319566712 Could we change the JIRA format in the title to something like "[SPARK-41187][CORE] ..."? Checking the [Spark contribution guide](https://spark.apache.org/contributing.html) would also be helpful!

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38700: [SPARK-41189][PYTHON] Add an environment to switch on and off namedtuple hack

2022-11-17 Thread GitBox
HyukjinKwon commented on code in PR #38700: URL: https://github.com/apache/spark/pull/38700#discussion_r1026007445 ## python/pyspark/serializers.py: ## @@ -357,7 +358,7 @@ def dumps(self, obj): return obj -if sys.version_info < (3, 8): +if sys.version_info < (3, 8)

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38700: [SPARK-41189][PYTHON] Add an environment to switch on and off namedtuple hack

2022-11-17 Thread GitBox
HyukjinKwon commented on code in PR #38700: URL: https://github.com/apache/spark/pull/38700#discussion_r1026007386 ## python/pyspark/serializers.py: ## @@ -54,6 +54,7 @@ """ import sys +import os Review Comment: eh, it's actually fine in Python import (per PEP 8)

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38064: [SPARK-40622][SQL][CORE]Remove the limitation that single task result must fit in 2GB

2022-11-17 Thread GitBox
HyukjinKwon commented on code in PR #38064: URL: https://github.com/apache/spark/pull/38064#discussion_r1026007189 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2228,6 +2231,31 @@ class DatasetSuite extends QueryTest } } +class

[GitHub] [spark] dongjoon-hyun commented on pull request #38262: [SPARK-40801][BUILD] Upgrade `Apache commons-text` to 1.10

2022-11-17 Thread GitBox
dongjoon-hyun commented on PR #38262: URL: https://github.com/apache/spark/pull/38262#issuecomment-1319552102 The feature release branches like branch-3.3 will, generally, be maintained with bug fix releases for a period of 18 months. We usually have 3 bug fix releases. Since 3.3.1 on

[GitHub] [spark] sadikovi commented on a diff in pull request #38700: [SPARK-41189][PYTHON] Add an environment to switch on and off namedtuple hack

2022-11-17 Thread GitBox
sadikovi commented on code in PR #38700: URL: https://github.com/apache/spark/pull/38700#discussion_r1025989704 ## python/pyspark/serializers.py: ## @@ -357,7 +358,7 @@ def dumps(self, obj): return obj -if sys.version_info < (3, 8): +if sys.version_info < (3, 8) or

[GitHub] [spark] dongjoon-hyun commented on pull request #38262: [SPARK-40801][BUILD] Upgrade `Apache commons-text` to 1.10

2022-11-17 Thread GitBox
dongjoon-hyun commented on PR #38262: URL: https://github.com/apache/spark/pull/38262#issuecomment-1319545728 Apache Spark has a pre-defined release cadence, @vitas and @bjornjorgensen. - https://spark.apache.org/versioning-policy.html

[GitHub] [spark] HyukjinKwon closed pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

2022-11-17 Thread GitBox
HyukjinKwon closed pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes URL: https://github.com/apache/spark/pull/38638

[GitHub] [spark] HyukjinKwon commented on pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

2022-11-17 Thread GitBox
HyukjinKwon commented on PR #38638: URL: https://github.com/apache/spark/pull/38638#issuecomment-1319539398 Merged to master.

[GitHub] [spark] LuciferYang commented on pull request #38675: [SPARK-41161][BUILD] Upgrade scala-parser-combinators to 2.1.1

2022-11-17 Thread GitBox
LuciferYang commented on PR #38675: URL: https://github.com/apache/spark/pull/38675#issuecomment-1319538214 pass on 2.12 and 2.13

[GitHub] [spark] LuciferYang commented on a diff in pull request #38064: [SPARK-40622][SQL][CORE]Remove the limitation that single task result must fit in 2GB

2022-11-17 Thread GitBox
LuciferYang commented on code in PR #38064: URL: https://github.com/apache/spark/pull/38064#discussion_r1025982465 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2228,6 +2231,31 @@ class DatasetSuite extends QueryTest } } +class

[GitHub] [spark] LuciferYang commented on pull request #38704: [SPARK-41193][SQL][TESTS] Ignore `collect data with single partition larger than 2GB bytes array limit` in `DatasetLargeResultCollectingS

2022-11-17 Thread GitBox
LuciferYang commented on PR #38704: URL: https://github.com/apache/spark/pull/38704#issuecomment-1319535893 cc @HyukjinKwon

[GitHub] [spark] LuciferYang opened a new pull request, #38704: [SPARK-41193][SQL][TESTS] Ignore `collect data with single partition larger than 2GB bytes array limit` in `DatasetLargeResultCollecting

2022-11-17 Thread GitBox
LuciferYang opened a new pull request, #38704: URL: https://github.com/apache/spark/pull/38704 ### What changes were proposed in this pull request? This PR ignores `collect data with single partition larger than 2GB bytes array limit` in `DatasetLargeResultCollectingSuite` by default due

[GitHub] [spark] Yaohua628 commented on pull request #38683: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent

2022-11-17 Thread GitBox
Yaohua628 commented on PR #38683: URL: https://github.com/apache/spark/pull/38683#issuecomment-1319529815 > Maybe simpler to apply KnownNullable / KnownNotNull against CreateStruct to enforce desired nullability? Please refer to the change in https://github.com/apache/spark/pull/35543.

[GitHub] [spark] mcdull-zhang opened a new pull request, #38703: [SPARK-41191] [SQL] Cache Table is not working while nested caches exist

2022-11-17 Thread GitBox
mcdull-zhang opened a new pull request, #38703: URL: https://github.com/apache/spark/pull/38703 ### What changes were proposed in this pull request? For example the following statement: ```sql cache table t1 as select a from testData3 group by a; cache table t2 as select a,b from

[GitHub] [spark] LuciferYang commented on a diff in pull request #38064: [SPARK-40622][SQL][CORE]Remove the limitation that single task result must fit in 2GB

2022-11-17 Thread GitBox
LuciferYang commented on code in PR #38064: URL: https://github.com/apache/spark/pull/38064#discussion_r1025960041 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2228,6 +2231,31 @@ class DatasetSuite extends QueryTest } } +class

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38064: [SPARK-40622][SQL][CORE]Remove the limitation that single task result must fit in 2GB

2022-11-17 Thread GitBox
HyukjinKwon commented on code in PR #38064: URL: https://github.com/apache/spark/pull/38064#discussion_r1025959335 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2228,6 +2231,31 @@ class DatasetSuite extends QueryTest } } +class

[GitHub] [spark] LuciferYang commented on a diff in pull request #38064: [SPARK-40622][SQL][CORE]Remove the limitation that single task result must fit in 2GB

2022-11-17 Thread GitBox
LuciferYang commented on code in PR #38064: URL: https://github.com/apache/spark/pull/38064#discussion_r1025953966 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2228,6 +2231,31 @@ class DatasetSuite extends QueryTest } } +class

[GitHub] [spark] LuciferYang commented on a diff in pull request #38064: [SPARK-40622][SQL][CORE]Remove the limitation that single task result must fit in 2GB

2022-11-17 Thread GitBox
LuciferYang commented on code in PR #38064: URL: https://github.com/apache/spark/pull/38064#discussion_r1025955547 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2228,6 +2231,31 @@ class DatasetSuite extends QueryTest } } +class

[GitHub] [spark] wineternity opened a new pull request, #38702: SPARK-41187 [Core] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen

2022-11-17 Thread GitBox
wineternity opened a new pull request, #38702: URL: https://github.com/apache/spark/pull/38702 ### What changes were proposed in this pull request? Ignore the SparkListenerTaskEnd with Reason "Resubmitted" to avoid memory leak ### Why are the changes needed? For a long

[GitHub] [spark] LuciferYang commented on a diff in pull request #38064: [SPARK-40622][SQL][CORE]Remove the limitation that single task result must fit in 2GB

2022-11-17 Thread GitBox
LuciferYang commented on code in PR #38064: URL: https://github.com/apache/spark/pull/38064#discussion_r1025954405 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2228,6 +2231,31 @@ class DatasetSuite extends QueryTest } } +class

[GitHub] [spark] zhengruifeng opened a new pull request, #38701: [TEST ONLY][DO NOT MERGE] Test collect after avoiding hang with arrow-collect

2022-11-17 Thread GitBox
zhengruifeng opened a new pull request, #38701: URL: https://github.com/apache/spark/pull/38701 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] HyukjinKwon opened a new pull request, #38700: [SPARK-41189][PYTHON] Add an environment to switch on and off namedtuple hack

2022-11-17 Thread GitBox
HyukjinKwon opened a new pull request, #38700: URL: https://github.com/apache/spark/pull/38700 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/34688 that adds a switch to turn on and off the namedtuple hack.

[GitHub] [spark] cloud-fan closed pull request #38691: [SPARK-41178][SQL] Fix parser rule precedence between JOIN and comma

2022-11-17 Thread GitBox
cloud-fan closed pull request #38691: [SPARK-41178][SQL] Fix parser rule precedence between JOIN and comma URL: https://github.com/apache/spark/pull/38691

[GitHub] [spark] cloud-fan commented on pull request #38691: [SPARK-41178][SQL] Fix parser rule precedence between JOIN and comma

2022-11-17 Thread GitBox
cloud-fan commented on PR #38691: URL: https://github.com/apache/spark/pull/38691#issuecomment-1319496802 thanks for review, merging to master!

[GitHub] [spark] WeichenXu123 opened a new pull request, #38699: [SPARK-41188][CORE][ML] Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM processes

2022-11-17 Thread GitBox
WeichenXu123 opened a new pull request, #38699: URL: https://github.com/apache/spark/pull/38699 Signed-off-by: Weichen Xu ### What changes were proposed in this pull request? Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM

[GitHub] [spark] HyukjinKwon commented on pull request #38698: [SPARK-41186][PS][TESTS] Replace `list_run_infos` with `search_runs` in mlflow doctest

2022-11-17 Thread GitBox
HyukjinKwon commented on PR #38698: URL: https://github.com/apache/spark/pull/38698#issuecomment-1319487583 cc @harupy @WeichenXu123 FYI

[GitHub] [spark] itholic commented on a diff in pull request #38644: [SPARK-41130][SQL] Rename `OUT_OF_DECIMAL_TYPE_RANGE` to `NUMERIC_OUT_OF_SUPPORTED_RANGE`

2022-11-17 Thread GitBox
itholic commented on code in PR #38644: URL: https://github.com/apache/spark/pull/38644#discussion_r1025941541 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOnSuite.scala: ## @@ -244,7 +244,7 @@ class CastWithAnsiOnSuite extends

[GitHub] [spark] Yikun commented on pull request #38698: [SPARK-41186][PS][TESTS] Replace `list_run_infos` with `search_runs` in mlflow doctest

2022-11-17 Thread GitBox
Yikun commented on PR #38698: URL: https://github.com/apache/spark/pull/38698#issuecomment-1319485275 cc @HyukjinKwon @xinrong-meng

[GitHub] [spark] itholic commented on pull request #38644: [SPARK-41130][SQL] Rename `OUT_OF_DECIMAL_TYPE_RANGE` to `NUMERIC_OUT_OF_SUPPORTED_RANGE`

2022-11-17 Thread GitBox
itholic commented on PR #38644: URL: https://github.com/apache/spark/pull/38644#issuecomment-1319485107 cc @MaxGekk @srielau

[GitHub] [spark] Yikun commented on pull request #38698: [SPARK-41186][PS][TESTS] Replace `list_run_infos` with `search_runs` in mlflow doctest

2022-11-17 Thread GitBox
Yikun commented on PR #38698: URL: https://github.com/apache/spark/pull/38698#issuecomment-1319483376 First, test with mlflow 1.30.0; if that passes, I will append the full infra refresh (mlflow 2.0.1) in the next commits.

[GitHub] [spark] Yikun opened a new pull request, #38698: [SPARK-41186][PS][TESTS] Replace `list_run_infos` with `search_runs` in mlflow doctest

2022-11-17 Thread GitBox
Yikun opened a new pull request, #38698: URL: https://github.com/apache/spark/pull/38698 ### What changes were proposed in this pull request? This patch replaces `list_run_infos` with `search_runs` in mlflow doctest. Since mlflow 1.29.0

[GitHub] [spark] panbingkun commented on pull request #37725: [DO-NOT-MERGE] Exceptions without error classes in SQL golden files

2022-11-17 Thread GitBox
panbingkun commented on PR #37725: URL: https://github.com/apache/spark/pull/37725#issuecomment-1319476144 OK

[GitHub] [spark] LuciferYang commented on pull request #38690: [SPARK-41177][PROTOBUF][TESTS] Fix maven test failed of `protobuf` module

2022-11-17 Thread GitBox
LuciferYang commented on PR #38690: URL: https://github.com/apache/spark/pull/38690#issuecomment-1319473917 Thanks @HyukjinKwon

[GitHub] [spark] HeartSaVioR commented on pull request #38683: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent

2022-11-17 Thread GitBox
HeartSaVioR commented on PR #38683: URL: https://github.com/apache/spark/pull/38683#issuecomment-1319473845 cc. @cloud-fan

[GitHub] [spark] HeartSaVioR commented on pull request #38683: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent

2022-11-17 Thread GitBox
HeartSaVioR commented on PR #38683: URL: https://github.com/apache/spark/pull/38683#issuecomment-1319473783 Maybe simpler to apply `KnownNullable` / `KnownNotNull` against `CreateStruct` to enforce desired nullability? Please refer to the change in #35543.

[GitHub] [spark] HyukjinKwon closed pull request #38690: [SPARK-41177][PROTOBUF][TESTS] Fix maven test failed of `protobuf` module

2022-11-17 Thread GitBox
HyukjinKwon closed pull request #38690: [SPARK-41177][PROTOBUF][TESTS] Fix maven test failed of `protobuf` module URL: https://github.com/apache/spark/pull/38690

[GitHub] [spark] HyukjinKwon commented on pull request #38690: [SPARK-41177][PROTOBUF][TESTS] Fix maven test failed of `protobuf` module

2022-11-17 Thread GitBox
HyukjinKwon commented on PR #38690: URL: https://github.com/apache/spark/pull/38690#issuecomment-1319471042 Merged to master.

[GitHub] [spark] LuciferYang commented on pull request #38690: [SPARK-41177][PROTOBUF][TESTS] Fix maven test failed of `protobuf` module

2022-11-17 Thread GitBox
LuciferYang commented on PR #38690: URL: https://github.com/apache/spark/pull/38690#issuecomment-1319467631 cc @HyukjinKwon

[GitHub] [spark] LuciferYang commented on pull request #38609: [SPARK-40593][BUILD][CONNECT] Support user configurable `protoc` and `protoc-gen-grpc-java` executables when building Spark Connect.

2022-11-17 Thread GitBox
LuciferYang commented on PR #38609: URL: https://github.com/apache/spark/pull/38609#issuecomment-1319466822 Thanks @HyukjinKwon @grundprinzip @amaliujia ~

[GitHub] [spark] bersprockets opened a new pull request, #38697: [SPARK-41118][SQL][3.3] `to_number`/`try_to_number` should return `null` when format is `null`

2022-11-17 Thread GitBox
bersprockets opened a new pull request, #38697: URL: https://github.com/apache/spark/pull/38697 Backport of #38635 ### What changes were proposed in this pull request? When a user specifies a null format in `to_number`/`try_to_number`, return `null`, with a data type of

[GitHub] [spark] Yikun commented on pull request #38611: [SPARK-41107][PYTHON][INFRA][TESTS] Install memory-profiler in the CI

2022-11-17 Thread GitBox
Yikun commented on PR #38611: URL: https://github.com/apache/spark/pull/38611#issuecomment-1319457396 Yes, the new mlflow version had some breaking changes; you could first install memory-profiler at the end of the Dockerfile (like connect). I will find some time today to fix the doctest for new

[GitHub] [spark] panbingkun opened a new pull request, #38696: [SPARK-41175][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1078

2022-11-17 Thread GitBox
panbingkun opened a new pull request, #38696: URL: https://github.com/apache/spark/pull/38696 ### What changes were proposed in this pull request? In the PR, I propose to assign a name to the error class _LEGACY_ERROR_TEMP_1078. ### Why are the changes needed? Proper names of

[GitHub] [spark] HyukjinKwon commented on pull request #38611: [SPARK-41107][PYTHON][INFRA][TESTS] Install memory-profiler in the CI

2022-11-17 Thread GitBox
HyukjinKwon commented on PR #38611: URL: https://github.com/apache/spark/pull/38611#issuecomment-1319432329 I think mlflow got upgraded together for some reasons.

[GitHub] [spark] HyukjinKwon commented on pull request #38611: [SPARK-41107][PYTHON][INFRA][TESTS] Install memory-profiler in the CI

2022-11-17 Thread GitBox
HyukjinKwon commented on PR #38611: URL: https://github.com/apache/spark/pull/38611#issuecomment-1319430302 The test failure this time seems different: ```

[GitHub] [spark] xinrong-meng commented on pull request #38611: [SPARK-41107][PYTHON][INFRA][TESTS] Install memory-profiler in the CI

2022-11-17 Thread GitBox
xinrong-meng commented on PR #38611: URL: https://github.com/apache/spark/pull/38611#issuecomment-1319423102 Do you happen to know if it's normal to fail so many times? @Yikun

[GitHub] [spark] HyukjinKwon commented on pull request #38609: [SPARK-40593][BUILD][CONNECT] Support user configurable `protoc` and `protoc-gen-grpc-java` executables when building Spark Connect.

2022-11-17 Thread GitBox
HyukjinKwon commented on PR #38609: URL: https://github.com/apache/spark/pull/38609#issuecomment-1319422162 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #38609: [SPARK-40593][BUILD][CONNECT] Support user configurable `protoc` and `protoc-gen-grpc-java` executables when building Spark Connect.

2022-11-17 Thread GitBox
HyukjinKwon closed pull request #38609: [SPARK-40593][BUILD][CONNECT] Support user configurable `protoc` and `protoc-gen-grpc-java` executables when building Spark Connect. URL: https://github.com/apache/spark/pull/38609

[GitHub] [spark] HyukjinKwon closed pull request #38681: [SPARK-41165][CONNECT] Avoid hangs in the arrow collect code path

2022-11-17 Thread GitBox
HyukjinKwon closed pull request #38681: [SPARK-41165][CONNECT] Avoid hangs in the arrow collect code path URL: https://github.com/apache/spark/pull/38681

[GitHub] [spark] HyukjinKwon commented on pull request #38681: [SPARK-41165][CONNECT] Avoid hangs in the arrow collect code path

2022-11-17 Thread GitBox
HyukjinKwon commented on PR #38681: URL: https://github.com/apache/spark/pull/38681#issuecomment-1319402787 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #38694: [SPARK-41184][CONNECT] Disable flakey Fill.NA tests

2022-11-17 Thread GitBox
HyukjinKwon closed pull request #38694: [SPARK-41184][CONNECT] Disable flakey Fill.NA tests URL: https://github.com/apache/spark/pull/38694

[GitHub] [spark] HyukjinKwon commented on pull request #38694: [SPARK-41184][CONNECT] Disable flakey Fill.NA tests

2022-11-17 Thread GitBox
HyukjinKwon commented on PR #38694: URL: https://github.com/apache/spark/pull/38694#issuecomment-1319402627 Merged to master.

[GitHub] [spark] HeartSaVioR closed pull request #38680: [SPARK-40657][PROTOBUF][FOLLOWUP][MINOR] Add clarifying comment in ProtobufUtils

2022-11-17 Thread GitBox
HeartSaVioR closed pull request #38680: [SPARK-40657][PROTOBUF][FOLLOWUP][MINOR] Add clarifying comment in ProtobufUtils URL: https://github.com/apache/spark/pull/38680

[GitHub] [spark] HeartSaVioR commented on pull request #38680: [SPARK-40657][PROTOBUF][FOLLOWUP][MINOR] Add clarifying comment in ProtobufUtils

2022-11-17 Thread GitBox
HeartSaVioR commented on PR #38680: URL: https://github.com/apache/spark/pull/38680#issuecomment-1319400995 https://github.com/rangadi/spark/actions/runs/3490416069/jobs/5847475694 Second trial of build is passing for most of jobs and pending k8s integration which I don't believe this PR

[GitHub] [spark] zhengruifeng opened a new pull request, #38695: [TEST ONLY][DO NOT MERGE]. Test the schema of `collect`

2022-11-17 Thread GitBox
zhengruifeng opened a new pull request, #38695: URL: https://github.com/apache/spark/pull/38695 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] github-actions[bot] commented on pull request #36695: [SPARK-38474][CORE] Use error class in org.apache.spark.security

2022-11-17 Thread GitBox
github-actions[bot] commented on PR #36695: URL: https://github.com/apache/spark/pull/36695#issuecomment-1319387650 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #37129: [SPARK-39710][SQL] Support push local topK through outer join

2022-11-17 Thread GitBox
github-actions[bot] commented on PR #37129: URL: https://github.com/apache/spark/pull/37129#issuecomment-1319387624 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #37359: [SPARK-25342][CORE][SQL]Support rolling back a result stage and rerunning all result tasks when writing files

2022-11-17 Thread GitBox
github-actions[bot] commented on PR #37359: URL: https://github.com/apache/spark/pull/37359#issuecomment-1319387601 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] amaliujia commented on pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

2022-11-17 Thread GitBox
amaliujia commented on PR #38638: URL: https://github.com/apache/spark/pull/38638#issuecomment-1319362671 @HyukjinKwon @cloud-fan @grundprinzip @zhengruifeng please take another look.

[GitHub] [spark] amaliujia commented on a diff in pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

2022-11-17 Thread GitBox
amaliujia commented on code in PR #38638: URL: https://github.com/apache/spark/pull/38638#discussion_r1025851202 ## connector/connect/src/main/protobuf/spark/connect/base.proto: ## @@ -38,16 +38,50 @@ message Plan { } } +// Explains the input plan based on a configurable

[GitHub] [spark] bjornjorgensen commented on pull request #38262: [SPARK-40801][BUILD] Upgrade `Apache commons-text` to 1.10

2022-11-17 Thread GitBox
bjornjorgensen commented on PR #38262: URL: https://github.com/apache/spark/pull/38262#issuecomment-1319181705 @vitas I think it is best to ask questions like that on the mailing list https://spark.apache.org/community.html under Mailing lists

[GitHub] [spark] vitas commented on pull request #38262: [SPARK-40801][BUILD] Upgrade `Apache commons-text` to 1.10

2022-11-17 Thread GitBox
vitas commented on PR #38262: URL: https://github.com/apache/spark/pull/38262#issuecomment-1319168813 When will 3.3.2 come out?

[GitHub] [spark] amaliujia commented on pull request #38693: Homogenize the python proto version

2022-11-17 Thread GitBox
amaliujia commented on PR #38693: URL: https://github.com/apache/spark/pull/38693#issuecomment-1319155192 Also need to make the Scala side version consistent https://github.com/apache/spark/blob/master/connector/connect/pom.xml#L35?

[GitHub] [spark] mridulm commented on pull request #38567: [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

2022-11-17 Thread GitBox
mridulm commented on PR #38567: URL: https://github.com/apache/spark/pull/38567#issuecomment-1319148903 Sorry for the delay, I will try to review this later this week ...

[GitHub] [spark] amaliujia commented on pull request #38693: Homogenize the python proto version

2022-11-17 Thread GitBox
amaliujia commented on PR #38693: URL: https://github.com/apache/spark/pull/38693#issuecomment-1319148756 @grundprinzip You need to regenerate the protobuf for the Python side.

[GitHub] [spark] amaliujia commented on pull request #38694: [SPARK-41184][CONNECT] Disable flakey Fill.NA tests

2022-11-17 Thread GitBox
amaliujia commented on PR #38694: URL: https://github.com/apache/spark/pull/38694#issuecomment-1319136680 LGTM cc @zhengruifeng

[GitHub] [spark] hvanhovell opened a new pull request, #38694: [SPARK-41184][CONNECT] Disable flakey Fill.NA tests

2022-11-17 Thread GitBox
hvanhovell opened a new pull request, #38694: URL: https://github.com/apache/spark/pull/38694 ### What changes were proposed in this pull request? Disable Connect's Python Fill.NA tests because they are flakey. ### Why are the changes needed? Connect's Python Fill.NA tests because

[GitHub] [spark] gengliangwang commented on a diff in pull request #38567: [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

2022-11-17 Thread GitBox
gengliangwang commented on code in PR #38567: URL: https://github.com/apache/spark/pull/38567#discussion_r1025583467 ## core/src/main/scala/org/apache/spark/status/KVUtils.scala: ## @@ -80,6 +89,44 @@ private[spark] object KVUtils extends Logging { db } + def

[GitHub] [spark] amaliujia commented on a diff in pull request #38678: [SPARK-41164][CONNECT] Update relations.proto to follow Connect proto development guide

2022-11-17 Thread GitBox
amaliujia commented on code in PR #38678: URL: https://github.com/apache/spark/pull/38678#discussion_r1025551926 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -106,24 +113,39 @@ message Project { // // For example, `SELECT ABS(-1)` is valid

[GitHub] [spark] MaxGekk closed pull request #38647: [SPARK-41133][SQL] Integrate `UNSCALED_VALUE_TOO_LARGE_FOR_PRECISION` into `NUMERIC_VALUE_OUT_OF_RANGE`

2022-11-17 Thread GitBox
MaxGekk closed pull request #38647: [SPARK-41133][SQL] Integrate `UNSCALED_VALUE_TOO_LARGE_FOR_PRECISION` into `NUMERIC_VALUE_OUT_OF_RANGE` URL: https://github.com/apache/spark/pull/38647

[GitHub] [spark] MaxGekk commented on pull request #38647: [SPARK-41133][SQL] Integrate `UNSCALED_VALUE_TOO_LARGE_FOR_PRECISION` into `NUMERIC_VALUE_OUT_OF_RANGE`

2022-11-17 Thread GitBox
MaxGekk commented on PR #38647: URL: https://github.com/apache/spark/pull/38647#issuecomment-1319038961 +1, LGTM. Merging to master. Thank you, @itholic and @srielau for review.

[GitHub] [spark] amaliujia commented on a diff in pull request #38678: [SPARK-41164][CONNECT] Update relations.proto to follow Connect proto development guide

2022-11-17 Thread GitBox
amaliujia commented on code in PR #38678: URL: https://github.com/apache/spark/pull/38678#discussion_r1025552911 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -106,24 +113,39 @@ message Project { // // For example, `SELECT ABS(-1)` is valid

[GitHub] [spark] amaliujia commented on pull request #38681: [SPARK-41165][CONNECT] Avoid hangs in the arrow collect code path

2022-11-17 Thread GitBox
amaliujia commented on PR #38681: URL: https://github.com/apache/spark/pull/38681#issuecomment-1319030069 LGTM

[GitHub] [spark] grundprinzip opened a new pull request, #38693: Homogenize the python proto version

2022-11-17 Thread GitBox
grundprinzip opened a new pull request, #38693: URL: https://github.com/apache/spark/pull/38693 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] cloud-fan commented on a diff in pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-17 Thread GitBox
cloud-fan commented on code in PR #38497: URL: https://github.com/apache/spark/pull/38497#discussion_r1025472753 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala: ## @@ -201,15 +204,17 @@ object RewritePredicateSubquery extends

[GitHub] [spark] cloud-fan commented on a diff in pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-17 Thread GitBox
cloud-fan commented on code in PR #38497: URL: https://github.com/apache/spark/pull/38497#discussion_r1025470097 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala: ## @@ -31,20 +34,35 @@ object EliminateResolvedHint extends

[GitHub] [spark] MaxGekk commented on pull request #37725: [DO-NOT-MERGE] Exceptions without error classes in SQL golden files

2022-11-17 Thread GitBox
MaxGekk commented on PR #37725: URL: https://github.com/apache/spark/pull/37725#issuecomment-1318946953 @panbingkun @LuciferYang @itholic @cloud-fan @srielau @anchovYu @entong I changed the mechanism of generating SQL golden files in this PR to detect the exceptions that haven't been

[GitHub] [spark] cloud-fan commented on pull request #38692: [SPARK-41183][SQL] Add an extension API to do plan normalization for caching

2022-11-17 Thread GitBox
cloud-fan commented on PR #38692: URL: https://github.com/apache/spark/pull/38692#issuecomment-1318943621 @viirya @wangyum @gengliangwang

[GitHub] [spark] cloud-fan opened a new pull request, #38692: [SPARK-41183][SQL] Add an extension API to do plan normalization for caching

2022-11-17 Thread GitBox
cloud-fan opened a new pull request, #38692: URL: https://github.com/apache/spark/pull/38692 ### What changes were proposed in this pull request? Today, Spark is very conservative and uses the analyzed plan instead of the optimized plan as the cache key. Many cache

[GitHub] [spark] MaxGekk commented on a diff in pull request #37725: [DO-NOT-MERGE] Exceptions without error classes in SQL golden files

2022-11-17 Thread GitBox
MaxGekk commented on code in PR #37725: URL: https://github.com/apache/spark/pull/37725#discussion_r1025206912 ## sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out: ## @@ -5,7 +5,12 @@ select concat_ws() struct<> -- !query output

[GitHub] [spark] LuciferYang commented on pull request #38610: [SPARK-41106][SQL] Reduce collection conversion when create AttributeMap

2022-11-17 Thread GitBox
LuciferYang commented on PR #38610: URL: https://github.com/apache/spark/pull/38610#issuecomment-1318927149 Thanks @srowen ~

[GitHub] [spark] srowen commented on pull request #38610: [SPARK-41106][SQL] Reduce collection conversion when create AttributeMap

2022-11-17 Thread GitBox
srowen commented on PR #38610: URL: https://github.com/apache/spark/pull/38610#issuecomment-1318924610 Merged to master

[GitHub] [spark] srowen closed pull request #38610: [SPARK-41106][SQL] Reduce collection conversion when create AttributeMap

2022-11-17 Thread GitBox
srowen closed pull request #38610: [SPARK-41106][SQL] Reduce collection conversion when create AttributeMap URL: https://github.com/apache/spark/pull/38610

[GitHub] [spark] LuciferYang commented on a diff in pull request #38690: [SPARK-41177][PROTOBUF][TESTS] Fix maven test failed of `protobuf` module

2022-11-17 Thread GitBox
LuciferYang commented on code in PR #38690: URL: https://github.com/apache/spark/pull/38690#discussion_r1025435300 ## connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala: ## @@ -34,9 +34,10 @@ import
