[GitHub] [spark] github-actions[bot] commented on pull request #36695: [SPARK-38474][CORE] Use error class in org.apache.spark.security

2022-11-17 Thread GitBox
github-actions[bot] commented on PR #36695: URL: https://github.com/apache/spark/pull/36695#issuecomment-1319387650 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] HeartSaVioR commented on pull request #38680: [SPARK-40657][PROTOBUF][FOLLOWUP][MINOR] Add clarifying comment in ProtobufUtils

2022-11-17 Thread GitBox
HeartSaVioR commented on PR #38680: URL: https://github.com/apache/spark/pull/38680#issuecomment-1319400995 https://github.com/rangadi/spark/actions/runs/3490416069/jobs/5847475694 The second build attempt is passing for most jobs, with k8s integration still pending, which I don't believe this PR

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38064: [SPARK-40622][SQL][CORE]Remove the limitation that single task result must fit in 2GB

2022-11-17 Thread GitBox
HyukjinKwon commented on code in PR #38064: URL: https://github.com/apache/spark/pull/38064#discussion_r1025959335 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2228,6 +2231,31 @@ class DatasetSuite extends QueryTest } } +class

[GitHub] [spark] LuciferYang commented on pull request #38704: [SPARK-41193][SQL][TESTS] Ignore `collect data with single partition larger than 2GB bytes array limit` in `DatasetLargeResultCollectingS

2022-11-17 Thread GitBox
LuciferYang commented on PR #38704: URL: https://github.com/apache/spark/pull/38704#issuecomment-1319535893 cc @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] LuciferYang opened a new pull request, #38704: [SPARK-41193][SQL][TESTS] Ignore `collect data with single partition larger than 2GB bytes array limit` in `DatasetLargeResultCollecting

2022-11-17 Thread GitBox
LuciferYang opened a new pull request, #38704: URL: https://github.com/apache/spark/pull/38704 ### What changes were proposed in this pull request? This PR ignores `collect data with single partition larger than 2GB bytes array limit` in `DatasetLargeResultCollectingSuite` by default due

[GitHub] [spark] cloud-fan commented on pull request #38683: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent

2022-11-17 Thread GitBox
cloud-fan commented on PR #38683: URL: https://github.com/apache/spark/pull/38683#issuecomment-1319573470 shall we change `FileSourceMetadataAttribute`? I think the metadata column (at least for file source) is always non-nullable.

[GitHub] [spark] LuciferYang opened a new pull request, #38705: [SPARK-41173][SQL] Move `require()` out from the constructors of string expressions

2022-11-17 Thread GitBox
LuciferYang opened a new pull request, #38705: URL: https://github.com/apache/spark/pull/38705 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark] Yaohua628 commented on pull request #38683: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent

2022-11-17 Thread GitBox
Yaohua628 commented on PR #38683: URL: https://github.com/apache/spark/pull/38683#issuecomment-1319593235 > shall we change `FileSourceMetadataAttribute`? I initially thought we could relax this field for some future cases. But yeah, you are right, it seems like it is always not null

[GitHub] [spark] github-actions[bot] commented on pull request #37129: [SPARK-39710][SQL] Support push local topK through outer join

2022-11-17 Thread GitBox
github-actions[bot] commented on PR #37129: URL: https://github.com/apache/spark/pull/37129#issuecomment-1319387624 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #37359: [SPARK-25342][CORE][SQL]Support rolling back a result stage and rerunning all result tasks when writing files

2022-11-17 Thread GitBox
github-actions[bot] commented on PR #37359: URL: https://github.com/apache/spark/pull/37359#issuecomment-1319387601 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] HyukjinKwon closed pull request #38690: [SPARK-41177][PROTOBUF][TESTS] Fix maven test failed of `protobuf` module

2022-11-17 Thread GitBox
HyukjinKwon closed pull request #38690: [SPARK-41177][PROTOBUF][TESTS] Fix maven test failed of `protobuf` module URL: https://github.com/apache/spark/pull/38690

[GitHub] [spark] HyukjinKwon commented on pull request #38690: [SPARK-41177][PROTOBUF][TESTS] Fix maven test failed of `protobuf` module

2022-11-17 Thread GitBox
HyukjinKwon commented on PR #38690: URL: https://github.com/apache/spark/pull/38690#issuecomment-1319471042 Merged to master.

[GitHub] [spark] panbingkun commented on pull request #37725: [DO-NOT-MERGE] Exceptions without error classes in SQL golden files

2022-11-17 Thread GitBox
panbingkun commented on PR #37725: URL: https://github.com/apache/spark/pull/37725#issuecomment-1319476144 OK

[GitHub] [spark] Yikun commented on pull request #38698: [SPARK-41186][PS][TESTS] Replace `list_run_infos` with `search_runs` in mlflow doctest

2022-11-17 Thread GitBox
Yikun commented on PR #38698: URL: https://github.com/apache/spark/pull/38698#issuecomment-1319483376 First I will test with mlflow 1.30.0; if that passes, in the next commits I will add the fully refreshed infra (mlflow 2.0.1).

[GitHub] [spark] WeichenXu123 opened a new pull request, #38699: [SPARK-41188][CORE][ML] Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM processes

2022-11-17 Thread GitBox
WeichenXu123 opened a new pull request, #38699: URL: https://github.com/apache/spark/pull/38699 Signed-off-by: Weichen Xu ### What changes were proposed in this pull request? Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM
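The default proposed in this PR can be pictured with a small sketch. It is not Spark's implementation: the helper `with_default_omp_threads` and the plain-dict configuration are hypothetical and only illustrate the idea. The key `spark.task.cpus` and the `spark.executorEnv.` prefix are taken from Spark's documented configuration.

```python
# Hypothetical helper, not Spark's code: default OMP_NUM_THREADS for executors
# to spark.task.cpus unless the user has already set it explicitly.
OMP_KEY = "spark.executorEnv.OMP_NUM_THREADS"


def with_default_omp_threads(conf):
    """Return a copy of conf with OMP_NUM_THREADS defaulted to spark.task.cpus."""
    result = dict(conf)
    if OMP_KEY not in result:
        # spark.task.cpus defaults to 1 when it is not set.
        result[OMP_KEY] = str(result.get("spark.task.cpus", "1"))
    return result
```

An explicit user setting wins over the default, so a job that already tunes OMP_NUM_THREADS would be left unchanged under this sketch.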

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38064: [SPARK-40622][SQL][CORE]Remove the limitation that single task result must fit in 2GB

2022-11-17 Thread GitBox
HyukjinKwon commented on code in PR #38064: URL: https://github.com/apache/spark/pull/38064#discussion_r1026007189 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2228,6 +2231,31 @@ class DatasetSuite extends QueryTest } } +class

[GitHub] [spark] zhengruifeng opened a new pull request, #38706: [TEST ONLY] Come back to collect.foreach(send)

2022-11-17 Thread GitBox
zhengruifeng opened a new pull request, #38706: URL: https://github.com/apache/spark/pull/38706 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] wangyum commented on pull request #38676: [SPARK-41162][SQL] Do not push down anti-join predicates that become ambiguous

2022-11-17 Thread GitBox
wangyum commented on PR #38676: URL: https://github.com/apache/spark/pull/38676#issuecomment-1319657140 @EnricoMi @cloud-fan Could we fix the `DeduplicateRelations`? It did not generate different expression IDs for all conflicting attributes: ``` === Applying Rule

[GitHub] [spark] HyukjinKwon commented on pull request #38611: [SPARK-41107][PYTHON][INFRA][TESTS] Install memory-profiler in the CI

2022-11-17 Thread GitBox
HyukjinKwon commented on PR #38611: URL: https://github.com/apache/spark/pull/38611#issuecomment-1319432329 I think mlflow got upgraded together for some reason.

[GitHub] [spark] HeartSaVioR commented on pull request #38683: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent

2022-11-17 Thread GitBox
HeartSaVioR commented on PR #38683: URL: https://github.com/apache/spark/pull/38683#issuecomment-1319473845 cc. @cloud-fan

[GitHub] [spark] LuciferYang commented on pull request #38690: [SPARK-41177][PROTOBUF][TESTS] Fix maven test failed of `protobuf` module

2022-11-17 Thread GitBox
LuciferYang commented on PR #38690: URL: https://github.com/apache/spark/pull/38690#issuecomment-1319473917 Thanks @HyukjinKwon

[GitHub] [spark] itholic commented on a diff in pull request #38644: [SPARK-41130][SQL] Rename `OUT_OF_DECIMAL_TYPE_RANGE` to `NUMERIC_OUT_OF_SUPPORTED_RANGE`

2022-11-17 Thread GitBox
itholic commented on code in PR #38644: URL: https://github.com/apache/spark/pull/38644#discussion_r1025941541 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOnSuite.scala: ## @@ -244,7 +244,7 @@ class CastWithAnsiOnSuite extends

[GitHub] [spark] Yikun commented on pull request #38698: [SPARK-41186][PS][TESTS] Replace `list_run_infos` with `search_runs` in mlflow doctest

2022-11-17 Thread GitBox
Yikun commented on PR #38698: URL: https://github.com/apache/spark/pull/38698#issuecomment-1319485275 cc @HyukjinKwon @xinrong-meng

[GitHub] [spark] itholic commented on pull request #38644: [SPARK-41130][SQL] Rename `OUT_OF_DECIMAL_TYPE_RANGE` to `NUMERIC_OUT_OF_SUPPORTED_RANGE`

2022-11-17 Thread GitBox
itholic commented on PR #38644: URL: https://github.com/apache/spark/pull/38644#issuecomment-1319485107 cc @MaxGekk @srielau

[GitHub] [spark] LuciferYang commented on pull request #38675: [SPARK-41161][BUILD] Upgrade scala-parser-combinators to 2.1.1

2022-11-17 Thread GitBox
LuciferYang commented on PR #38675: URL: https://github.com/apache/spark/pull/38675#issuecomment-1319538214 pass on 2.12 and 2.13

[GitHub] [spark] sadikovi commented on a diff in pull request #38700: [SPARK-41189][PYTHON] Add an environment to switch on and off namedtuple hack

2022-11-17 Thread GitBox
sadikovi commented on code in PR #38700: URL: https://github.com/apache/spark/pull/38700#discussion_r1025989704 ## python/pyspark/serializers.py: ## @@ -357,7 +358,7 @@ def dumps(self, obj): return obj -if sys.version_info < (3, 8): +if sys.version_info < (3, 8) or

[GitHub] [spark] toujours33 opened a new pull request, #38709: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-17 Thread GitBox
toujours33 opened a new pull request, #38709: URL: https://github.com/apache/spark/pull/38709 ### What changes were proposed in this pull request? ExecutorAllocationManager only records the count of speculative tasks; `stageAttemptToNumSpeculativeTasks` is incremented when a speculative task is submitted,

[GitHub] [spark] HyukjinKwon closed pull request #38681: [SPARK-41165][CONNECT] Avoid hangs in the arrow collect code path

2022-11-17 Thread GitBox
HyukjinKwon closed pull request #38681: [SPARK-41165][CONNECT] Avoid hangs in the arrow collect code path URL: https://github.com/apache/spark/pull/38681

[GitHub] [spark] HyukjinKwon closed pull request #38694: [SPARK-41184][CONNECT] Disable flakey Fill.NA tests

2022-11-17 Thread GitBox
HyukjinKwon closed pull request #38694: [SPARK-41184][CONNECT] Disable flakey Fill.NA tests URL: https://github.com/apache/spark/pull/38694

[GitHub] [spark] HyukjinKwon commented on pull request #38681: [SPARK-41165][CONNECT] Avoid hangs in the arrow collect code path

2022-11-17 Thread GitBox
HyukjinKwon commented on PR #38681: URL: https://github.com/apache/spark/pull/38681#issuecomment-1319402787 Merged to master.

[GitHub] [spark] HyukjinKwon commented on pull request #38694: [SPARK-41184][CONNECT] Disable flakey Fill.NA tests

2022-11-17 Thread GitBox
HyukjinKwon commented on PR #38694: URL: https://github.com/apache/spark/pull/38694#issuecomment-1319402627 Merged to master.

[GitHub] [spark] bersprockets opened a new pull request, #38697: [SPARK-41118][SQL][3.3] `to_number`/`try_to_number` should return `null` when format is `null`

2022-11-17 Thread GitBox
bersprockets opened a new pull request, #38697: URL: https://github.com/apache/spark/pull/38697 Backport of #38635 ### What changes were proposed in this pull request? When a user specifies a null format in `to_number`/`try_to_number`, return `null`, with a data type of

[GitHub] [spark] Yikun opened a new pull request, #38698: [SPARK-41186][PS][TESTS] Replace `list_run_infos` with `search_runs` in mlflow doctest

2022-11-17 Thread GitBox
Yikun opened a new pull request, #38698: URL: https://github.com/apache/spark/pull/38698 ### What changes were proposed in this pull request? This patch replaces `list_run_infos` with `search_runs` in mlflow doctest. Since mlflow 1.29.0

[GitHub] [spark] LuciferYang commented on a diff in pull request #38064: [SPARK-40622][SQL][CORE]Remove the limitation that single task result must fit in 2GB

2022-11-17 Thread GitBox
LuciferYang commented on code in PR #38064: URL: https://github.com/apache/spark/pull/38064#discussion_r1025982465 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2228,6 +2231,31 @@ class DatasetSuite extends QueryTest } } +class

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38700: [SPARK-41189][PYTHON] Add an environment to switch on and off namedtuple hack

2022-11-17 Thread GitBox
HyukjinKwon commented on code in PR #38700: URL: https://github.com/apache/spark/pull/38700#discussion_r1026007445 ## python/pyspark/serializers.py: ## @@ -357,7 +358,7 @@ def dumps(self, obj): return obj -if sys.version_info < (3, 8): +if sys.version_info < (3, 8)

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38700: [SPARK-41189][PYTHON] Add an environment to switch on and off namedtuple hack

2022-11-17 Thread GitBox
HyukjinKwon commented on code in PR #38700: URL: https://github.com/apache/spark/pull/38700#discussion_r1026007386 ## python/pyspark/serializers.py: ## @@ -54,6 +54,7 @@ """ import sys +import os Review Comment: eh, it's actually fine in Python import (per PEP 8)

[GitHub] [spark] HeartSaVioR closed pull request #38680: [SPARK-40657][PROTOBUF][FOLLOWUP][MINOR] Add clarifying comment in ProtobufUtils

2022-11-17 Thread GitBox
HeartSaVioR closed pull request #38680: [SPARK-40657][PROTOBUF][FOLLOWUP][MINOR] Add clarifying comment in ProtobufUtils URL: https://github.com/apache/spark/pull/38680

[GitHub] [spark] LuciferYang commented on pull request #38609: [SPARK-40593][BUILD][CONNECT] Support user configurable `protoc` and `protoc-gen-grpc-java` executables when building Spark Connect.

2022-11-17 Thread GitBox
LuciferYang commented on PR #38609: URL: https://github.com/apache/spark/pull/38609#issuecomment-1319466822 Thanks @HyukjinKwon @grundprinzip @amaliujia ~

[GitHub] [spark] HeartSaVioR commented on pull request #38683: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent

2022-11-17 Thread GitBox
HeartSaVioR commented on PR #38683: URL: https://github.com/apache/spark/pull/38683#issuecomment-1319473783 Maybe simpler to apply `KnownNullable` / `KnownNotNull` against `CreateStruct` to enforce desired nullability? Please refer to the change in #35543.

[GitHub] [spark] LuciferYang commented on a diff in pull request #38064: [SPARK-40622][SQL][CORE]Remove the limitation that single task result must fit in 2GB

2022-11-17 Thread GitBox
LuciferYang commented on code in PR #38064: URL: https://github.com/apache/spark/pull/38064#discussion_r1025954405 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2228,6 +2231,31 @@ class DatasetSuite extends QueryTest } } +class

[GitHub] [spark] panbingkun opened a new pull request, #38707: [SPARK-41176][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1042

2022-11-17 Thread GitBox
panbingkun opened a new pull request, #38707: URL: https://github.com/apache/spark/pull/38707 ### What changes were proposed in this pull request? In the PR, I propose to assign a name to the error class _LEGACY_ERROR_TEMP_1042. ### Why are the changes needed? Proper names of

[GitHub] [spark] LuciferYang commented on pull request #37725: [DO-NOT-MERGE] Exceptions without error classes in SQL golden files

2022-11-17 Thread GitBox
LuciferYang commented on PR #37725: URL: https://github.com/apache/spark/pull/37725#issuecomment-1319616814 - SPARK-41173: https://github.com/apache/spark/pull/38705

[GitHub] [spark] Yikun commented on pull request #38611: [SPARK-41107][PYTHON][INFRA][TESTS] Install memory-profiler in the CI

2022-11-17 Thread GitBox
Yikun commented on PR #38611: URL: https://github.com/apache/spark/pull/38611#issuecomment-1319457396 Yes, the new mlflow version had some breaking changes; you could first install memory-profiler at the end of the Dockerfile (like connect). I will find some time today to fix the doctest for new

[GitHub] [spark] cloud-fan commented on pull request #38691: [SPARK-41178][SQL] Fix parser rule precedence between JOIN and comma

2022-11-17 Thread GitBox
cloud-fan commented on PR #38691: URL: https://github.com/apache/spark/pull/38691#issuecomment-1319496802 thanks for the review, merging to master!

[GitHub] [spark] wineternity opened a new pull request, #38702: SPARK-41187 [Core] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen

2022-11-17 Thread GitBox
wineternity opened a new pull request, #38702: URL: https://github.com/apache/spark/pull/38702 ### What changes were proposed in this pull request? Ignore the SparkListenerTaskEnd with Reason "Resubmitted" to avoid memory leak ### Why are the changes needed? For a long

[GitHub] [spark] LuciferYang commented on a diff in pull request #38064: [SPARK-40622][SQL][CORE]Remove the limitation that single task result must fit in 2GB

2022-11-17 Thread GitBox
LuciferYang commented on code in PR #38064: URL: https://github.com/apache/spark/pull/38064#discussion_r1025960041 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2228,6 +2231,31 @@ class DatasetSuite extends QueryTest } } +class

[GitHub] [spark] itholic commented on pull request #38702: SPARK-41187 [Core] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen

2022-11-17 Thread GitBox
itholic commented on PR #38702: URL: https://github.com/apache/spark/pull/38702#issuecomment-1319566712 Can we change the JIRA format in the title to something like "[SPARK-41187][CORE] ..."? Checking the [Spark contribution guide](https://spark.apache.org/contributing.html) would also be helpful!
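The title format requested here can be checked mechanically. The following is a minimal sketch, assuming a title consists of a `[SPARK-<number>]` prefix, one or more bracketed tags with no spaces between them, then a space and the summary. The regex and the function name `is_conventional_title` are illustrative assumptions, not a check the Spark project is known to run.

```python
import re

# Assumed convention: "[SPARK-12345][TAG][TAG] Summary".
# Tags are upper-case letters, digits, dots, or hyphens (e.g. CORE, SQL, 3.3).
TITLE_RE = re.compile(r"^\[SPARK-\d+\](\[[A-Z0-9.\-]+\])+ \S")


def is_conventional_title(title):
    """Return True if the PR title matches the assumed bracketed format."""
    return TITLE_RE.match(title) is not None
```

Under this sketch the title as submitted, which starts with "SPARK-41187 [Core]", is rejected, while the suggested "[SPARK-41187][CORE] ..." form is accepted.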

[GitHub] [spark] LuciferYang opened a new pull request, #38708: [SPARK-41194][PROTOBUF][TESTS] Add `log4j2.properties` for testing to `protobuf` module

2022-11-17 Thread GitBox
LuciferYang opened a new pull request, #38708: URL: https://github.com/apache/spark/pull/38708 ### What changes were proposed in this pull request? This PR adds a `log4j2.properties` file for testing to the `protobuf` module, as in other modules. ### Why are the changes needed? Should

[GitHub] [spark] Yaohua628 commented on pull request #38683: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent

2022-11-17 Thread GitBox
Yaohua628 commented on PR #38683: URL: https://github.com/apache/spark/pull/38683#issuecomment-1319636434 > If it has been persisted before (like a table), then it's totally fine to write non-nullable data to a nullable column. The optimizer may also optimize a column from nullable to

[GitHub] [spark] zhengruifeng opened a new pull request, #38695: [TEST ONLY][DO NOT MERGE]. Test the schema of `collect`

2022-11-17 Thread GitBox
zhengruifeng opened a new pull request, #38695: URL: https://github.com/apache/spark/pull/38695 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] HyukjinKwon commented on pull request #38611: [SPARK-41107][PYTHON][INFRA][TESTS] Install memory-profiler in the CI

2022-11-17 Thread GitBox
HyukjinKwon commented on PR #38611: URL: https://github.com/apache/spark/pull/38611#issuecomment-1319430302 The test failure this time seems different:

[GitHub] [spark] HyukjinKwon commented on pull request #38698: [SPARK-41186][PS][TESTS] Replace `list_run_infos` with `search_runs` in mlflow doctest

2022-11-17 Thread GitBox
HyukjinKwon commented on PR #38698: URL: https://github.com/apache/spark/pull/38698#issuecomment-1319487583 cc @harupy @WeichenXu123 FYI

[GitHub] [spark] HyukjinKwon opened a new pull request, #38700: [SPARK-41189][PYTHON] Add an environment to switch on and off namedtuple hack

2022-11-17 Thread GitBox
HyukjinKwon opened a new pull request, #38700: URL: https://github.com/apache/spark/pull/38700 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/34688 that adds a switch to turn on and off the namedtuple hack.

[GitHub] [spark] LuciferYang commented on a diff in pull request #38064: [SPARK-40622][SQL][CORE]Remove the limitation that single task result must fit in 2GB

2022-11-17 Thread GitBox
LuciferYang commented on code in PR #38064: URL: https://github.com/apache/spark/pull/38064#discussion_r1025955547 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2228,6 +2231,31 @@ class DatasetSuite extends QueryTest } } +class

[GitHub] [spark] LuciferYang commented on a diff in pull request #38064: [SPARK-40622][SQL][CORE]Remove the limitation that single task result must fit in 2GB

2022-11-17 Thread GitBox
LuciferYang commented on code in PR #38064: URL: https://github.com/apache/spark/pull/38064#discussion_r1025953966 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2228,6 +2231,31 @@ class DatasetSuite extends QueryTest } } +class

[GitHub] [spark] dongjoon-hyun commented on pull request #38262: [SPARK-40801][BUILD] Upgrade `Apache commons-text` to 1.10

2022-11-17 Thread GitBox
dongjoon-hyun commented on PR #38262: URL: https://github.com/apache/spark/pull/38262#issuecomment-1319552102 The feature release branches like branch-3.3 will, generally, be maintained with bug fix releases for a period of 18 months. We usually have 3 bug fix releases. Since 3.3.1 on

[GitHub] [spark] HyukjinKwon commented on pull request #38609: [SPARK-40593][BUILD][CONNECT] Support user configurable `protoc` and `protoc-gen-grpc-java` executables when building Spark Connect.

2022-11-17 Thread GitBox
HyukjinKwon commented on PR #38609: URL: https://github.com/apache/spark/pull/38609#issuecomment-1319422162 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #38609: [SPARK-40593][BUILD][CONNECT] Support user configurable `protoc` and `protoc-gen-grpc-java` executables when building Spark Connect.

2022-11-17 Thread GitBox
HyukjinKwon closed pull request #38609: [SPARK-40593][BUILD][CONNECT] Support user configurable `protoc` and `protoc-gen-grpc-java` executables when building Spark Connect. URL: https://github.com/apache/spark/pull/38609

[GitHub] [spark] panbingkun opened a new pull request, #38696: [SPARK-41175][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1078

2022-11-17 Thread GitBox
panbingkun opened a new pull request, #38696: URL: https://github.com/apache/spark/pull/38696 ### What changes were proposed in this pull request? In the PR, I propose to assign a name to the error class _LEGACY_ERROR_TEMP_1078. ### Why are the changes needed? Proper names of

[GitHub] [spark] LuciferYang commented on pull request #38690: [SPARK-41177][PROTOBUF][TESTS] Fix maven test failed of `protobuf` module

2022-11-17 Thread GitBox
LuciferYang commented on PR #38690: URL: https://github.com/apache/spark/pull/38690#issuecomment-1319467631 cc @HyukjinKwon

[GitHub] [spark] zhengruifeng opened a new pull request, #38701: [TEST ONLY][DO NOT MERGE] Test collect after avoiding hang with arrow-collect

2022-11-17 Thread GitBox
zhengruifeng opened a new pull request, #38701: URL: https://github.com/apache/spark/pull/38701 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] mcdull-zhang opened a new pull request, #38703: [SPARK-41191] [SQL] Cache Table is not working while nested caches exist

2022-11-17 Thread GitBox
mcdull-zhang opened a new pull request, #38703: URL: https://github.com/apache/spark/pull/38703 ### What changes were proposed in this pull request? For example the following statement: ```sql cache table t1 as select a from testData3 group by a; cache table t2 as select a,b from

[GitHub] [spark] Yaohua628 commented on pull request #38683: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent

2022-11-17 Thread GitBox
Yaohua628 commented on PR #38683: URL: https://github.com/apache/spark/pull/38683#issuecomment-1319529815 > Maybe simpler to apply KnownNullable / KnownNotNull against CreateStruct to enforce desired nullability? Please refer the change in https://github.com/apache/spark/pull/35543.

[GitHub] [spark] HyukjinKwon closed pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

2022-11-17 Thread GitBox
HyukjinKwon closed pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes URL: https://github.com/apache/spark/pull/38638

[GitHub] [spark] HyukjinKwon commented on pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

2022-11-17 Thread GitBox
HyukjinKwon commented on PR #38638: URL: https://github.com/apache/spark/pull/38638#issuecomment-1319539398 Merged to master.

[GitHub] [spark] MaxGekk commented on a diff in pull request #38576: [SPARK-41062][SQL] Rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`

2022-11-17 Thread GitBox
MaxGekk commented on code in PR #38576: URL: https://github.com/apache/spark/pull/38576#discussion_r1026031578 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -1059,10 +1060,16 @@ trait CheckAnalysis extends PredicateHelper with

[GitHub] [spark] xinrong-meng commented on pull request #38611: [SPARK-41107][PYTHON][INFRA][TESTS] Install memory-profiler in the CI

2022-11-17 Thread GitBox
xinrong-meng commented on PR #38611: URL: https://github.com/apache/spark/pull/38611#issuecomment-1319423102 Do you happen to know if it's normal to fail so many times? @Yikun

[GitHub] [spark] cloud-fan closed pull request #38691: [SPARK-41178][SQL] Fix parser rule precedence between JOIN and comma

2022-11-17 Thread GitBox
cloud-fan closed pull request #38691: [SPARK-41178][SQL] Fix parser rule precedence between JOIN and comma URL: https://github.com/apache/spark/pull/38691

[GitHub] [spark] dongjoon-hyun commented on pull request #38262: [SPARK-40801][BUILD] Upgrade `Apache commons-text` to 1.10

2022-11-17 Thread GitBox
dongjoon-hyun commented on PR #38262: URL: https://github.com/apache/spark/pull/38262#issuecomment-1319545728 Apache Spark has a pre-defined release cadence, @vitas and @bjornjorgensen. - https://spark.apache.org/versioning-policy.html

[GitHub] [spark] cloud-fan commented on pull request #38683: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent

2022-11-17 Thread GitBox
cloud-fan commented on PR #38683: URL: https://github.com/apache/spark/pull/38683#issuecomment-1319609369 If it has been persisted before (like a table), then it's totally fine to write non-nullable data to a nullable column. The optimizer may also optimize a column from nullable to non-nullable,

[GitHub] [spark] panbingkun opened a new pull request, #38710: [SPARK-41179][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1092

2022-11-17 Thread GitBox
panbingkun opened a new pull request, #38710: URL: https://github.com/apache/spark/pull/38710 ### What changes were proposed in this pull request? In the PR, I propose to assign a name to the error class _LEGACY_ERROR_TEMP_1092. ### Why are the changes needed? Proper names of

[GitHub] [spark] dengziming commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-17 Thread GitBox
dengziming commented on code in PR #38659: URL: https://github.com/apache/spark/pull/38659#discussion_r1024865486 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -213,7 +213,7 @@ message Deduplicate { message LocalRelation { repeated

[GitHub] [spark] ulysses-you commented on pull request #38687: [SPARK-41154][SQL] Incorrect relation caching for queries with time travel spec

2022-11-17 Thread GitBox
ulysses-you commented on PR #38687: URL: https://github.com/apache/spark/pull/38687#issuecomment-1318482662 cc @cloud-fan @huaxingao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] itholic commented on a diff in pull request #38576: [SPARK-41062][SQL] Rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`

2022-11-17 Thread GitBox
itholic commented on code in PR #38576: URL: https://github.com/apache/spark/pull/38576#discussion_r1025069390 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -1059,8 +1059,8 @@ trait CheckAnalysis extends PredicateHelper with

[GitHub] [spark] EnricoMi commented on pull request #38676: [SPARK-41162][SQL] Do not push down anti-join predicates that become ambiguous

2022-11-17 Thread GitBox
EnricoMi commented on PR #38676: URL: https://github.com/apache/spark/pull/38676#issuecomment-1318255174 @wangyum @cloud-fan appreciate your suggestion on how to test this bug in `LeftSemiAntiJoinPushDownSuite` (see https://github.com/apache/spark/pull/38676#issuecomment-1317220559). --

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

2022-11-17 Thread GitBox
HyukjinKwon commented on code in PR #38638: URL: https://github.com/apache/spark/pull/38638#discussion_r1025039998 ## python/pyspark/sql/connect/dataframe.py: ## @@ -667,12 +668,70 @@ def schema(self) -> StructType: else: return self._schema -def

[GitHub] [spark] HyukjinKwon closed pull request #38678: [SPARK-41164][CONNECT] Update relations.proto to follow Connect proto development guide

2022-11-17 Thread GitBox
HyukjinKwon closed pull request #38678: [SPARK-41164][CONNECT] Update relations.proto to follow Connect proto development guide URL: https://github.com/apache/spark/pull/38678 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] HyukjinKwon commented on pull request #38678: [SPARK-41164][CONNECT] Update relations.proto to follow Connect proto development guide

2022-11-17 Thread GitBox
HyukjinKwon commented on PR #38678: URL: https://github.com/apache/spark/pull/38678#issuecomment-1318461338 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] LuciferYang commented on pull request #38609: [SPARK-40593][BUILD][CONNECT] Support user configurable `protoc` and `protoc-gen-grpc-java` executables when building Spark Connect.

2022-11-17 Thread GitBox
LuciferYang commented on PR #38609: URL: https://github.com/apache/spark/pull/38609#issuecomment-1318498330 Both pr title and description have been updated @grundprinzip Thanks ~ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] dengziming commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-17 Thread GitBox
dengziming commented on code in PR #38659: URL: https://github.com/apache/spark/pull/38659#discussion_r1024864116 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -213,7 +213,7 @@ message Deduplicate { message LocalRelation { repeated

[GitHub] [spark] grundprinzip commented on pull request #38609: [SPARK-40593][BUILD][CONNECT] Support user configurable `protoc` and `protoc-gen-grpc-java` executables when building Spark Connect.

2022-11-17 Thread GitBox
grundprinzip commented on PR #38609: URL: https://github.com/apache/spark/pull/38609#issuecomment-1318500167 Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] panbingkun opened a new pull request, #38688: [WIP][SPARK-41166][TESTS] Check errorSubClass of DataTypeMismatch in *ExpressionSuites

2022-11-17 Thread GitBox
panbingkun opened a new pull request, #38688: URL: https://github.com/apache/spark/pull/38688 ### What changes were proposed in this pull request? The pr aims to check errorSubClass of DataTypeMismatch in *ExpressionSuites ### Why are the changes needed? ### Does

[GitHub] [spark] grundprinzip commented on pull request #38609: [SPARK-40593][BUILD][CONNECT] Make user can build and test `connect` module by specifying the user-defined `protoc` and `protoc-gen-grpc

2022-11-17 Thread GitBox
grundprinzip commented on PR #38609: URL: https://github.com/apache/spark/pull/38609#issuecomment-1318486206 Since I cannot directly edit the PR description or title, I would kindly ask you to make the following changes: Title: [SPARK-40593][BUILD][CONNECT] Support user configurable

[GitHub] [spark] pan3793 commented on pull request #38651: [SPARK-41136][K8S] Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent blocking shutdown process

2022-11-17 Thread GitBox
pan3793 commented on PR #38651: URL: https://github.com/apache/spark/pull/38651#issuecomment-1318542102 @dongjoon-hyun @LuciferYang would you please take a look again? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] beliefer opened a new pull request, #38689: [WIP][SPARK-41171][SQL] Push down filter through window when partitionSpec is empty

2022-11-17 Thread GitBox
beliefer opened a new pull request, #38689: URL: https://github.com/apache/spark/pull/38689 ### What changes were proposed in this pull request? Sometimes a SQL query contains a filter whose condition compares a rank-like window function with a number. For example, `SELECT *, ROW_NUMBER()

[GitHub] [spark] zhengruifeng opened a new pull request, #38686: [SPARK-41169][CONNECT][PYTHON] Implement `DataFrame.drop`

2022-11-17 Thread GitBox
zhengruifeng opened a new pull request, #38686: URL: https://github.com/apache/spark/pull/38686 ### What changes were proposed in this pull request? Implement `DataFrame.drop` with a proto message ### Why are the changes needed? for api coverage ### Does this PR introduce

[GitHub] [spark] ulysses-you opened a new pull request, #38687: [SPARK-41154][SQL] Incorrect relation caching for queries with time travel spec

2022-11-17 Thread GitBox
ulysses-you opened a new pull request, #38687: URL: https://github.com/apache/spark/pull/38687 ### What changes were proposed in this pull request? Add TimeTravelSpec to the key of relation cache in AnalysisContext. ### Why are the changes needed? Correct the

[GitHub] [spark] mridulm commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-17 Thread GitBox
mridulm commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1318238822 I will try to get to this later this week, do let me know if you are still working on it though. -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] mridulm commented on pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2022-11-17 Thread GitBox
mridulm commented on PR #37922: URL: https://github.com/apache/spark/pull/37922#issuecomment-1318239964 I will try to get to this later this week, do let me know if you are still working on it/have pending comments to address. Thanks -- This is an automated message from the Apache Git

[GitHub] [spark] cloud-fan commented on a diff in pull request #38684: [SPARK-41017][SQL][FOLLOWUP] Respect the original Filter operator order

2022-11-17 Thread GitBox
cloud-fan commented on code in PR #38684: URL: https://github.com/apache/spark/pull/38684#discussion_r1024884862 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala: ## @@ -371,7 +371,8 @@ object V2ScanRelationPushDown extends

[GitHub] [spark] cloud-fan opened a new pull request, #38691: [SPARK-41178][SQL] Fix parser rule precedence between JOIN and comma

2022-11-17 Thread GitBox
cloud-fan opened a new pull request, #38691: URL: https://github.com/apache/spark/pull/38691 ### What changes were proposed in this pull request? This PR fixes a long-standing parser bug in Spark: `JOIN` should take precedence over comma when combining relations. For example,

[GitHub] [spark] LuciferYang opened a new pull request, #38690: [SPARK-41177][PROTOBUF][TESTS] Fix maven test failed of `protobuf` module

2022-11-17 Thread GitBox
LuciferYang opened a new pull request, #38690: URL: https://github.com/apache/spark/pull/38690 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this

[GitHub] [spark] LuciferYang commented on a diff in pull request #38312: [SPARK-40819][SQL] Timestamp nanos behaviour regression

2022-11-17 Thread GitBox
LuciferYang commented on code in PR #38312: URL: https://github.com/apache/spark/pull/38312#discussion_r1025353875 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala: ## @@ -1040,6 +1040,14 @@ class ParquetSchemaSuite extends

[GitHub] [spark] cloud-fan commented on pull request #38684: [SPARK-41017][SQL][FOLLOWUP] Respect the original Filter operator order

2022-11-17 Thread GitBox
cloud-fan commented on PR #38684: URL: https://github.com/apache/spark/pull/38684#issuecomment-1318837045 thanks for review, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] cloud-fan commented on a diff in pull request #38678: [SPARK-41164][CONNECT] Update relations.proto to follow Connect proto development guide

2022-11-17 Thread GitBox
cloud-fan commented on code in PR #38678: URL: https://github.com/apache/spark/pull/38678#discussion_r1025196133 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -106,24 +113,39 @@ message Project { // // For example, `SELECT ABS(-1)` is valid

[GitHub] [spark] peter-toth commented on pull request #37630: [SPARK-40193][SQL] Merge subquery plans with different filters

2022-11-17 Thread GitBox
peter-toth commented on PR #37630: URL: https://github.com/apache/spark/pull/37630#issuecomment-1318682439 > @peter-toth Could you fix the conflicts again? Sure, done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] cloud-fan commented on pull request #38691: [SPARK-41178][SQL] Fix parser rule precedence between JOIN and comma

2022-11-17 Thread GitBox
cloud-fan commented on PR #38691: URL: https://github.com/apache/spark/pull/38691#issuecomment-1318810977 cc @viirya @dongjoon-hyun @gengliangwang @MaxGekk -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] cloud-fan closed pull request #38684: [SPARK-41017][SQL][FOLLOWUP] Respect the original Filter operator order

2022-11-17 Thread GitBox
cloud-fan closed pull request #38684: [SPARK-41017][SQL][FOLLOWUP] Respect the original Filter operator order URL: https://github.com/apache/spark/pull/38684 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above
