[GitHub] [spark] zhengruifeng commented on a diff in pull request #38578: [SPARK-41064][CONNECT][PYTHON] Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`

2022-11-10 Thread GitBox
zhengruifeng commented on code in PR #38578: URL: https://github.com/apache/spark/pull/38578#discussion_r1018862065 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala: ## @@ -227,6 +227,21 @@ package object dsl { } } +implicit

[GitHub] [spark] grundprinzip commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

2022-11-10 Thread GitBox
grundprinzip commented on code in PR #38535: URL: https://github.com/apache/spark/pull/38535#discussion_r1018891531 ## python/pyspark/sql/connect/client.py: ## @@ -125,13 +126,30 @@ def metadata(self) -> typing.Iterable[typing.Tuple[str, str]]: @property def

[GitHub] [spark] bjornjorgensen commented on pull request #38589: [SPARK-41087][BUILD] Remove duplicate `-Xmx4g` from `dev/make-distribution.sh` and make `build/mvn` use the same JAVA_OPTS

2022-11-10 Thread GitBox
bjornjorgensen commented on PR #38589: URL: https://github.com/apache/spark/pull/38589#issuecomment-1310115370 Tests are failing because of this. https://github.com/apache/spark/commit/c2a8e48e70abfb6bd101c99c5a0f6017151fc85e -- This is an automated message from the Apache Git Service.

[GitHub] [spark] cloud-fan commented on a diff in pull request #38582: [SPARK-41095][SQL] Convert unresolved operators to internal errors

2022-11-10 Thread GitBox
cloud-fan commented on code in PR #38582: URL: https://github.com/apache/spark/pull/38582#discussion_r1019088145 ## core/src/main/scala/org/apache/spark/SparkException.scala: ## @@ -68,6 +68,17 @@ class SparkException( } object SparkException { + def internalError(msg:

[GitHub] [spark] cloud-fan commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
cloud-fan commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019092340 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +120,93 @@ class

[GitHub] [spark] hvanhovell commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
hvanhovell commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019117026 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] bjornjorgensen commented on pull request #38601: [SPARK-41100][INFRA] Upgrade Ubuntu to latest

2022-11-10 Thread GitBox
bjornjorgensen commented on PR #38601: URL: https://github.com/apache/spark/pull/38601#issuecomment-1310324785 @Yikun

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
HyukjinKwon commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018837877 ## connector/connect/src/main/protobuf/spark/connect/base.proto: ## @@ -83,7 +83,6 @@ message Response { int64 uncompressed_bytes = 2; Review Comment:

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

2022-11-10 Thread GitBox
HyukjinKwon commented on code in PR #38535: URL: https://github.com/apache/spark/pull/38535#discussion_r1018848047 ## python/pyspark/sql/connect/client.py: ## @@ -235,23 +260,30 @@ def fromProto(cls, pb: typing.Any) -> "AnalyzeResult": class RemoteSparkSession(object):

[GitHub] [spark] hvanhovell commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
hvanhovell commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018938395 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] hvanhovell commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
hvanhovell commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018995213 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] hvanhovell commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
hvanhovell commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019036963 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,10 +129,91 @@ class

[GitHub] [spark] pan3793 commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
pan3793 commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019048009 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,10 +129,91 @@ class

[GitHub] [spark] cloud-fan commented on a diff in pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-10 Thread GitBox
cloud-fan commented on code in PR #38497: URL: https://github.com/apache/spark/pull/38497#discussion_r1019077400 ## sql/core/src/test/scala/org/apache/spark/sql/SubqueryHintPropagationSuite.scala: ## @@ -0,0 +1,227 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] cloud-fan commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
cloud-fan commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019090800 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +120,93 @@ class

[GitHub] [spark] cloud-fan commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
cloud-fan commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019090389 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +120,93 @@ class

[GitHub] [spark] pan3793 commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
pan3793 commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019058105 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019118121 ## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala: ## @@ -128,6 +128,92 @@ private[sql] object ArrowConverters extends Logging

[GitHub] [spark] bjornjorgensen opened a new pull request, #38601: [WIP] Upgrade Ubuntu latest

2022-11-10 Thread GitBox
bjornjorgensen opened a new pull request, #38601: URL: https://github.com/apache/spark/pull/38601 ### What changes were proposed in this pull request? Upgrade the Ubuntu version on runners in GitHub Actions from 20.04 to latest ### Why are the changes needed? ###

[GitHub] [spark] cloud-fan commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
cloud-fan commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019152492 ## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala: ## @@ -128,6 +128,92 @@ private[sql] object ArrowConverters extends Logging {

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38166: [SPARK-40713][CONNECT] Improve SET operation support in the proto and the server

2022-11-10 Thread GitBox
HyukjinKwon commented on code in PR #38166: URL: https://github.com/apache/spark/pull/38166#discussion_r1018835131 ## connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -81,6 +81,31 @@ class SparkConnectProtoSuite extends

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

2022-11-10 Thread GitBox
HyukjinKwon commented on code in PR #38535: URL: https://github.com/apache/spark/pull/38535#discussion_r1018850304 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -195,7 +195,15 @@ def test_invalid_connection_strings(self): for i in invalid:

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018875460 ## connector/connect/src/main/protobuf/spark/connect/base.proto: ## @@ -83,7 +83,6 @@ message Response { int64 uncompressed_bytes = 2; Review Comment:

[GitHub] [spark] pan3793 commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
pan3793 commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018969469 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] pan3793 commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
pan3793 commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019007963 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] cloud-fan commented on a diff in pull request #38166: [SPARK-40713][CONNECT] Improve SET operation support in the proto and the server

2022-11-10 Thread GitBox
cloud-fan commented on code in PR #38166: URL: https://github.com/apache/spark/pull/38166#discussion_r1019060189 ## connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -81,6 +81,31 @@ class SparkConnectProtoSuite extends

[GitHub] [spark] LuciferYang commented on pull request #38599: [SPARK-41063][BUILD] Clean all except files in Git repository before running Mima

2022-11-10 Thread GitBox
LuciferYang commented on PR #38599: URL: https://github.com/apache/spark/pull/38599#issuecomment-1310380670 Some GA tasks were killed

[GitHub] [spark] zhengruifeng commented on pull request #38597: [SPARK-41034][CONNECT][PYTHON][FOLLOW-UP] Fix mypy annotations test

2022-11-10 Thread GitBox
zhengruifeng commented on PR #38597: URL: https://github.com/apache/spark/pull/38597#issuecomment-1309997726 @grundprinzip I don't know, the package versions do not change

[GitHub] [spark] hvanhovell commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
hvanhovell commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018944392 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] hvanhovell commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
hvanhovell commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018951362 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] pan3793 commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
pan3793 commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019048009 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,10 +129,91 @@ class

[GitHub] [spark] cloud-fan commented on a diff in pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-10 Thread GitBox
cloud-fan commented on code in PR #38497: URL: https://github.com/apache/spark/pull/38497#discussion_r1019079991 ## sql/core/src/test/scala/org/apache/spark/sql/SubqueryHintPropagationSuite.scala: ## @@ -0,0 +1,227 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] LuciferYang commented on pull request #38589: [SPARK-41087][BUILD] Remove duplicate `-Xmx4g` from `dev/make-distribution.sh` and make `build/mvn` use the same JAVA_OPTS

2022-11-10 Thread GitBox
LuciferYang commented on PR #38589: URL: https://github.com/apache/spark/pull/38589#issuecomment-1310243986 rebased, thanks @bjornjorgensen

[GitHub] [spark] zhengruifeng commented on pull request #38597: [SPARK-41034][CONNECT][PYTHON][FOLLOW-UP] Fix mypy annotations test

2022-11-10 Thread GitBox
zhengruifeng commented on PR #38597: URL: https://github.com/apache/spark/pull/38597#issuecomment-1309998801 The Python linter in this PR has passed

[GitHub] [spark] HyukjinKwon commented on pull request #38597: [SPARK-41034][CONNECT][PYTHON][FOLLOW-UP] Fix mypy annotations test

2022-11-10 Thread GitBox
HyukjinKwon commented on PR #38597: URL: https://github.com/apache/spark/pull/38597#issuecomment-130519 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #38597: [SPARK-41034][CONNECT][PYTHON][FOLLOW-UP] Fix mypy annotations test

2022-11-10 Thread GitBox
HyukjinKwon closed pull request #38597: [SPARK-41034][CONNECT][PYTHON][FOLLOW-UP] Fix mypy annotations test URL: https://github.com/apache/spark/pull/38597

[GitHub] [spark] HyukjinKwon opened a new pull request, #38599: [SPARK-41063][BUILD] Clean all except files in Git repository before running Mima

2022-11-10 Thread GitBox
HyukjinKwon opened a new pull request, #38599: URL: https://github.com/apache/spark/pull/38599 ### What changes were proposed in this pull request? This PR proposes to clean all (except the files in Git repository) before running Mima. ### Why are the changes needed?

[GitHub] [spark] grundprinzip commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

2022-11-10 Thread GitBox
grundprinzip commented on code in PR #38535: URL: https://github.com/apache/spark/pull/38535#discussion_r1018869517 ## python/pyspark/sql/connect/client.py: ## @@ -125,13 +126,30 @@ def metadata(self) -> typing.Iterable[typing.Tuple[str, str]]: @property def

[GitHub] [spark] zhengruifeng closed pull request #38578: [SPARK-41064][CONNECT][PYTHON] Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`

2022-11-10 Thread GitBox
zhengruifeng closed pull request #38578: [SPARK-41064][CONNECT][PYTHON] Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab` URL: https://github.com/apache/spark/pull/38578

[GitHub] [spark] zhengruifeng commented on pull request #38578: [SPARK-41064][CONNECT][PYTHON] Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`

2022-11-10 Thread GitBox
zhengruifeng commented on PR #38578: URL: https://github.com/apache/spark/pull/38578#issuecomment-1310060863 Merged into master. Thank you all for the reviews.

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018928333 ## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala: ## @@ -128,6 +128,97 @@ private[sql] object ArrowConverters extends Logging

[GitHub] [spark] AmplabJenkins commented on pull request #38574: [SPARK-41060][K8S] Fix generating driver and executor Config Maps

2022-11-10 Thread GitBox
AmplabJenkins commented on PR #38574: URL: https://github.com/apache/spark/pull/38574#issuecomment-1310144467 Can one of the admins verify this patch?

[GitHub] [spark] pan3793 commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
pan3793 commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019007963 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] itholic opened a new pull request, #38600: [SPARK-41098][SQL] Rename `GROUP_BY_POS_REFERS_AGG_EXPR` to `GROUP_BY_POS_AGGREGATE`

2022-11-10 Thread GitBox
itholic opened a new pull request, #38600: URL: https://github.com/apache/spark/pull/38600 ### What changes were proposed in this pull request? This PR proposes to rename `GROUP_BY_POS_REFERS_AGG_EXPR` to `GROUP_BY_POS_AGGREGATE` ### Why are the changes needed?

[GitHub] [spark] cloud-fan commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
cloud-fan commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019093136 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +120,93 @@ class

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019093054 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +120,93 @@ class

[GitHub] [spark] cloud-fan commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
cloud-fan commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019094064 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +120,93 @@ class

[GitHub] [spark] cloud-fan commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
cloud-fan commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019100196 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019100602 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +120,93 @@ class

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019099531 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] LuciferYang commented on a diff in pull request #38569: [SPARK-41055][SQL] Rename `_LEGACY_ERROR_TEMP_2424` to `GROUP_BY_AGGREGATE`

2022-11-10 Thread GitBox
LuciferYang commented on code in PR #38569: URL: https://github.com/apache/spark/pull/38569#discussion_r1018831510 ## core/src/main/resources/error/error-classes.json: ## @@ -469,6 +469,11 @@ "Grouping sets size cannot be greater than " ] }, +

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38166: [SPARK-40713][CONNECT] Improve SET operation support in the proto and the server

2022-11-10 Thread GitBox
HyukjinKwon commented on code in PR #38166: URL: https://github.com/apache/spark/pull/38166#discussion_r1018835131 ## connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -81,6 +81,31 @@ class SparkConnectProtoSuite extends

[GitHub] [spark] fred-db commented on a diff in pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-10 Thread GitBox
fred-db commented on code in PR #38497: URL: https://github.com/apache/spark/pull/38497#discussion_r1018854115 ## sql/core/src/test/scala/org/apache/spark/sql/SubqueryHintPropagationSuite.scala: ## @@ -0,0 +1,183 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018939295 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] yabola commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-10 Thread GitBox
yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1310162375 > I am wondering whether the driver needs to pass the merged reduceId to the external shuffle service (but now the driver cannot fully record merged info), or the shuffle service records

[GitHub] [spark] hvanhovell commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
hvanhovell commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019032161 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] pan3793 commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
pan3793 commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019058105 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] cloud-fan commented on a diff in pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-10 Thread GitBox
cloud-fan commented on code in PR #38497: URL: https://github.com/apache/spark/pull/38497#discussion_r1019074653 ## sql/core/src/test/scala/org/apache/spark/sql/SubqueryHintPropagationSuite.scala: ## @@ -0,0 +1,227 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] cloud-fan commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
cloud-fan commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019154009 ## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala: ## @@ -128,6 +128,97 @@ private[sql] object ArrowConverters extends Logging {

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019170138 ## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala: ## @@ -128,6 +128,97 @@ private[sql] object ArrowConverters extends Logging

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

2022-11-10 Thread GitBox
HyukjinKwon commented on code in PR #38535: URL: https://github.com/apache/spark/pull/38535#discussion_r1018846038 ## python/pyspark/sql/connect/client.py: ## @@ -125,13 +126,30 @@ def metadata(self) -> typing.Iterable[typing.Tuple[str, str]]: @property def

[GitHub] [spark] grundprinzip commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

2022-11-10 Thread GitBox
grundprinzip commented on code in PR #38535: URL: https://github.com/apache/spark/pull/38535#discussion_r1018868868 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -195,7 +195,15 @@ def test_invalid_connection_strings(self): for i in invalid:

[GitHub] [spark] pan3793 commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
pan3793 commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1018997361 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] pan3793 commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
pan3793 commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019007963 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] cloud-fan commented on a diff in pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-10 Thread GitBox
cloud-fan commented on code in PR #38497: URL: https://github.com/apache/spark/pull/38497#discussion_r1019068317 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala: ## @@ -148,7 +150,7 @@ object RewritePredicateSubquery extends

[GitHub] [spark] grundprinzip commented on pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
grundprinzip commented on PR #38468: URL: https://github.com/apache/spark/pull/38468#issuecomment-1310307256 I would like to close the discussion on the ordered vs un-ordered result. 1) For simple clients ordered results are what they expect and it follows the precedent of what users

[GitHub] [spark] bozhang2820 opened a new pull request, #38602: [SPARK-41099][CORE] Do not wrap exceptions thrown in SparkHadoopWriter.write

2022-11-10 Thread GitBox
bozhang2820 opened a new pull request, #38602: URL: https://github.com/apache/spark/pull/38602 ### What changes were proposed in this pull request? Exceptions thrown in `SparkHadoopWriter.write` are wrapped with `SparkException("Job aborted.")`, which provides little extra

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

2022-11-10 Thread GitBox
HyukjinKwon commented on code in PR #38535: URL: https://github.com/apache/spark/pull/38535#discussion_r1018848047 ## python/pyspark/sql/connect/client.py: ## @@ -235,23 +260,30 @@ def fromProto(cls, pb: typing.Any) -> "AnalyzeResult": class RemoteSparkSession(object):

[GitHub] [spark] cloud-fan commented on a diff in pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-10 Thread GitBox
cloud-fan commented on code in PR #38497: URL: https://github.com/apache/spark/pull/38497#discussion_r1019064915 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala: ## @@ -52,10 +52,12 @@ object RewritePredicateSubquery extends

[GitHub] [spark] cloud-fan commented on a diff in pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-10 Thread GitBox
cloud-fan commented on code in PR #38497: URL: https://github.com/apache/spark/pull/38497#discussion_r1019082930 ## sql/core/src/test/scala/org/apache/spark/sql/SubqueryHintPropagationSuite.scala: ## @@ -0,0 +1,227 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] cloud-fan commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
cloud-fan commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019104867 ## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala: ## @@ -128,6 +128,92 @@ private[sql] object ArrowConverters extends Logging {

[GitHub] [spark] cloud-fan commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-10 Thread GitBox
cloud-fan commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1019151257 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -114,10 +123,97 @@ class

[GitHub] [spark] Yikun commented on pull request #38601: [SPARK-41100][INFRA] Upgrade Ubuntu to latest

2022-11-10 Thread GitBox
Yikun commented on PR #38601: URL: https://github.com/apache/spark/pull/38601#issuecomment-1310345629 I think we should upgrade to `22.04` when the GitHub Actions `ubuntu-latest` label points to 22.04, rather than using `ubuntu-latest` directly, to reduce the potential impact of OS upgrade breaking

[GitHub] [spark] MaxGekk commented on pull request #38569: [SPARK-41055][SQL] Rename `_LEGACY_ERROR_TEMP_2424` to `GROUP_BY_AGGREGATE`

2022-11-10 Thread GitBox
MaxGekk commented on PR #38569: URL: https://github.com/apache/spark/pull/38569#issuecomment-1310561698 +1, LGTM. Merging to master. Thank you, @itholic and @srielau @LuciferYang for review. -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] pan3793 commented on pull request #38596: [SPARK-41093][BUILD] Remove netty-tcnative-classes from Spark dependencyList

2022-11-10 Thread GitBox
pan3793 commented on PR #38596: URL: https://github.com/apache/spark/pull/38596#issuecomment-1310574000 cc @srowen -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] cloud-fan commented on pull request #38511: [SPARK-41017][SQL] Support column pruning with multiple nondeterministic Filters

2022-11-10 Thread GitBox
cloud-fan commented on PR #38511: URL: https://github.com/apache/spark/pull/38511#issuecomment-1310574974 also cc @wangyum @ulysses-you -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] amaliujia commented on pull request #38578: [SPARK-41064][CONNECT][PYTHON] Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`

2022-11-10 Thread GitBox
amaliujia commented on PR #38578: URL: https://github.com/apache/spark/pull/38578#issuecomment-1310695917 Thanks. Late LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] amaliujia commented on pull request #38586: [SPARK-41077][CONNECT][PYTHON][REFACTORING] Rename `ColumnRef` to `Column` in Python client implementation

2022-11-10 Thread GitBox
amaliujia commented on PR #38586: URL: https://github.com/apache/spark/pull/38586#issuecomment-1310739431 @HyukjinKwon yes, the goal will be matching the API shape of the `Column` in Python/Scala (likely Python first if there is an API difference between Python and Scala). This PR is

[GitHub] [spark] mridulm commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-10 Thread GitBox
mridulm commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1310753366 This is closely related to https://github.com/apache/spark/pull/37922 by @wankunde. That PR seems to be having build issues, and so has not made progress. -- This is an automated

[GitHub] [spark] srowen closed pull request #38593: [SPARK-41089][YARN][SHUFFLE] Relocate Netty native arm64 libs

2022-11-10 Thread GitBox
srowen closed pull request #38593: [SPARK-41089][YARN][SHUFFLE] Relocate Netty native arm64 libs URL: https://github.com/apache/spark/pull/38593 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] MaxGekk closed pull request #38569: [SPARK-41055][SQL] Rename `_LEGACY_ERROR_TEMP_2424` to `GROUP_BY_AGGREGATE`

2022-11-10 Thread GitBox
MaxGekk closed pull request #38569: [SPARK-41055][SQL] Rename `_LEGACY_ERROR_TEMP_2424` to `GROUP_BY_AGGREGATE` URL: https://github.com/apache/spark/pull/38569 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] amaliujia commented on a diff in pull request #38546: [SPARK-41036][CONNECT][PYTHON] `columns` API should use `schema` API to avoid data fetching

2022-11-10 Thread GitBox
amaliujia commented on code in PR #38546: URL: https://github.com/apache/spark/pull/38546#discussion_r1019478368 ## python/pyspark/sql/connect/dataframe.py: ## @@ -139,11 +139,9 @@ def columns(self) -> List[str]: if self._plan is None: return []

[GitHub] [spark] mridulm commented on pull request #38602: [SPARK-41099][CORE] Do not wrap exceptions thrown in SparkHadoopWriter.write

2022-11-10 Thread GitBox
mridulm commented on PR #38602: URL: https://github.com/apache/spark/pull/38602#issuecomment-1310755154 It is wrapped in `SparkException` specifically since we handle `SparkException` in various codepaths for task failure handling. -- This is an automated message from the Apache Git

[GitHub] [spark] LuciferYang commented on pull request #38575: [WIP][SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

2022-11-10 Thread GitBox
LuciferYang commented on PR #38575: URL: https://github.com/apache/spark/pull/38575#issuecomment-1310402593 Is the SparkR UT failure related to this one? https://github.com/itholic/spark/actions/runs/3425639144/jobs/5708796073 ``` ══ Failed

[GitHub] [spark] amaliujia commented on pull request #38594: [SPARK-40852][CONNECT][PYTHON][FOLLOWUP] Make `Summary` a separate proto plan

2022-11-10 Thread GitBox
amaliujia commented on PR #38594: URL: https://github.com/apache/spark/pull/38594#issuecomment-1310693233 LGTM! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] LuciferYang commented on pull request #38602: [SPARK-41099][CORE] Do not wrap exceptions thrown in SparkHadoopWriter.write

2022-11-10 Thread GitBox
LuciferYang commented on PR #38602: URL: https://github.com/apache/spark/pull/38602#issuecomment-1310395116 Can you give a comparison of the exception stack before and after this PR in the PR description? -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [spark] LuciferYang commented on pull request #38575: [WIP][SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

2022-11-10 Thread GitBox
LuciferYang commented on PR #38575: URL: https://github.com/apache/spark/pull/38575#issuecomment-1310407470 Looks like we need to refactor this case https://github.com/apache/spark/blob/c5d27603f29437f1686cac70727594c19410a273/R/pkg/tests/fulltests/test_sparkSQL.R#L3986-L3998 --

[GitHub] [spark] MaxGekk commented on a diff in pull request #38582: [SPARK-41095][SQL] Convert unresolved operators to internal errors

2022-11-10 Thread GitBox
MaxGekk commented on code in PR #38582: URL: https://github.com/apache/spark/pull/38582#discussion_r1019383970 ## core/src/main/scala/org/apache/spark/SparkException.scala: ## @@ -68,6 +68,17 @@ class SparkException( } object SparkException { + def internalError(msg:

[GitHub] [spark] mridulm commented on pull request #38091: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Fix flaky test case

2022-11-10 Thread GitBox
mridulm commented on PR #38091: URL: https://github.com/apache/spark/pull/38091#issuecomment-1310742436 The fix for this was merged recently. Did you update to the latest, @LuciferYang? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] LuciferYang commented on pull request #38589: [SPARK-41087][BUILD] Remove duplicate `-Xmx4g` from `dev/make-distribution.sh` and make `build/mvn` use the same JAVA_OPTS

2022-11-10 Thread GitBox
LuciferYang commented on PR #38589: URL: https://github.com/apache/spark/pull/38589#issuecomment-1310450309 All Maven tests passed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] maryannxue commented on a diff in pull request #38558: [SPARK-41048][SQL] Improve output partitioning and ordering with AQE cache

2022-11-10 Thread GitBox
maryannxue commented on code in PR #38558: URL: https://github.com/apache/spark/pull/38558#discussion_r1019286861 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -205,6 +205,8 @@ case class AdaptiveSparkPlanExec( def

[GitHub] [spark] srowen commented on pull request #38596: [SPARK-41093][BUILD] Remove netty-tcnative-classes from Spark dependencyList

2022-11-10 Thread GitBox
srowen commented on PR #38596: URL: https://github.com/apache/spark/pull/38596#issuecomment-1310617398 Sounds OK. How far back should this be backported? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] AmplabJenkins commented on pull request #38568: [SPARK-41051][CORE] Optimize ProcfsMetrics file acquisition

2022-11-10 Thread GitBox
AmplabJenkins commented on PR #38568: URL: https://github.com/apache/spark/pull/38568#issuecomment-1310627176 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] srowen commented on pull request #38593: [SPARK-41089][YARN][SHUFFLE] Relocate Netty native arm64 libs

2022-11-10 Thread GitBox
srowen commented on PR #38593: URL: https://github.com/apache/spark/pull/38593#issuecomment-1310419914 Merged to master/3.3 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] fred-db commented on a diff in pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-10 Thread GitBox
fred-db commented on code in PR #38497: URL: https://github.com/apache/spark/pull/38497#discussion_r1019250270 ## sql/core/src/test/scala/org/apache/spark/sql/SubqueryHintPropagationSuite.scala: ## @@ -0,0 +1,227 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [spark] srielau commented on a diff in pull request #38576: [SPARK-41062][SQL] Rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`

2022-11-10 Thread GitBox
srielau commented on code in PR #38576: URL: https://github.com/apache/spark/pull/38576#discussion_r1019261584 ## core/src/main/resources/error/error-classes.json: ## @@ -1277,6 +1277,11 @@ "A correlated outer name reference within a subquery expression body was not
