[GitHub] [spark] attilapiros commented on a diff in pull request #39728: [SPARK-42173][CORE] RpcAddress equality can fail

2023-01-25 Thread via GitHub
attilapiros commented on code in PR #39728: URL: https://github.com/apache/spark/pull/39728#discussion_r1087515031 ## core/src/main/scala/org/apache/spark/util/Utils.scala: ## @@ -1118,7 +1118,7 @@ private[spark] object Utils extends Logging { // This means some invalid

[GitHub] [spark] holdenk commented on pull request #39728: [SPARK-42173][CORE] RpcAddress equality can fail

2023-01-25 Thread via GitHub
holdenk commented on PR #39728: URL: https://github.com/apache/spark/pull/39728#issuecomment-1404649043 > Could you reply on @attilapiros 's comment, @holdenk ? Done :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] holdenk commented on a diff in pull request #39728: [SPARK-42173][CORE] RpcAddress equality can fail

2023-01-25 Thread via GitHub
holdenk commented on code in PR #39728: URL: https://github.com/apache/spark/pull/39728#discussion_r1087508879 ## core/src/main/scala/org/apache/spark/util/Utils.scala: ## @@ -1118,7 +1118,7 @@ private[spark] object Utils extends Logging { // This means some invalid

[GitHub] [spark] dongjoon-hyun commented on pull request #39728: [SPARK-42173][CORE] RpcAddress equality can fail

2023-01-25 Thread via GitHub
dongjoon-hyun commented on PR #39728: URL: https://github.com/apache/spark/pull/39728#issuecomment-1404644646 Could you reply on @attilapiros 's comment, @holdenk ?

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39751: [SPARK-42197][CONNECT] Reuses JVM initialization, and separate configuration groups to set in remote local mode

2023-01-25 Thread via GitHub
dongjoon-hyun commented on code in PR #39751: URL: https://github.com/apache/spark/pull/39751#discussion_r1087504155 ## python/pyspark/sql/connect/session.py: ## @@ -456,7 +455,7 @@ def register_udf(self, function: Any, return_type: Union[str, DataType]) -> str:

[GitHub] [spark] xinrong-meng opened a new pull request, #39753: [SPARK-42125][CONNECT][PYTHON] Pandas UDF in Spark Connect

2023-01-25 Thread via GitHub
xinrong-meng opened a new pull request, #39753: URL: https://github.com/apache/spark/pull/39753 ### What changes were proposed in this pull request? Support Pandas UDF in Spark Connect. ### Why are the changes needed? To reach parity with the vanilla PySpark. ###

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39749: [SPARK-42195][INFRA] Add Github action test job for branch-3.4

2023-01-25 Thread via GitHub
dongjoon-hyun commented on code in PR #39749: URL: https://github.com/apache/spark/pull/39749#discussion_r1087502422 ## .github/workflows/build_branch34.yml: ## @@ -0,0 +1,49 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

[GitHub] [spark] EnricoMi commented on pull request #39752: [SPARK-42168][SQL][PYTHON][FOLLOW-UP] Test FlatMapCoGroupsInPandas with Window function

2023-01-25 Thread via GitHub
EnricoMi commented on PR #39752: URL: https://github.com/apache/spark/pull/39752#issuecomment-1404638163 @sunchao @HyukjinKwon can we port the tests from #39717 to master, and back port them to branch-3.4 and branch-3.3?

[GitHub] [spark] EnricoMi opened a new pull request, #39752: [SPARK-42168][SQL][PYTHON][FOLLOW-UP] Test FlatMapCoGroupsInPandas with Window function

2023-01-25 Thread via GitHub
EnricoMi opened a new pull request, #39752: URL: https://github.com/apache/spark/pull/39752 ### What changes were proposed in this pull request? This ports tests from #39717 in branch-3.2 to master. ### Why are the changes needed? To make sure this use case is tested. ###

[GitHub] [spark] otterc commented on a diff in pull request #39725: [SPARK-33573][FOLLOW-UP] Increment ignoredBlockBytes when shuffle push blocks are late or colliding

2023-01-25 Thread via GitHub
otterc commented on code in PR #39725: URL: https://github.com/apache/spark/pull/39725#discussion_r1087470017 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -1356,6 +1362,15 @@ private boolean isTooLate(

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39749: [SPARK-42195][INFRA] Add Github action test job for branch-3.4

2023-01-25 Thread via GitHub
HyukjinKwon commented on code in PR #39749: URL: https://github.com/apache/spark/pull/39749#discussion_r1087468057 ## .github/workflows/build_branch34.yml: ## @@ -0,0 +1,49 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

[GitHub] [spark] HyukjinKwon commented on pull request #39749: [SPARK-42195][INFRA] Add Github action test job for branch-3.4

2023-01-25 Thread via GitHub
HyukjinKwon commented on PR #39749: URL: https://github.com/apache/spark/pull/39749#issuecomment-1404600171 cc @Yikun FYI

[GitHub] [spark] HyukjinKwon commented on pull request #39751: [SPARK-42197][CONNECT] Reuses JVM initialization, and separate configuration groups to set in remote local mode

2023-01-25 Thread via GitHub
HyukjinKwon commented on PR #39751: URL: https://github.com/apache/spark/pull/39751#issuecomment-1404598721 cc @zhengruifeng

[GitHub] [spark] HyukjinKwon opened a new pull request, #39751: [SPARK-42197][CONNECT] Reuses JVM initialization, and separate configuration groups to set in remote local mode

2023-01-25 Thread via GitHub
HyukjinKwon opened a new pull request, #39751: URL: https://github.com/apache/spark/pull/39751 ### What changes were proposed in this pull request? This PR proposes to refactor `_start_connect_server` by: 1. Reusing `SparkContext._ensure_initialized` 2. Separating the

[GitHub] [spark] dongjoon-hyun commented on pull request #39707: [WIP][SPARK-42161][BUILD] Upgrade Apache Arrow to 11.0.0

2023-01-25 Thread via GitHub
dongjoon-hyun commented on PR #39707: URL: https://github.com/apache/spark/pull/39707#issuecomment-1404591811 Thank you for tracking this and sharing it.

[GitHub] [spark] ganeshchand opened a new pull request, #39750: [SPARK-42196][SS] Fix typo

2023-01-25 Thread via GitHub
ganeshchand opened a new pull request, #39750: URL: https://github.com/apache/spark/pull/39750 ### What changes were proposed in this pull request? Fixed the typo in code in the API documentation ### Does this PR introduce _any_ user-facing change? No ### How was this

[GitHub] [spark] LuciferYang opened a new pull request, #39749: [SPARK-42195][INFRA] Add Github action test job for branch-3.4

2023-01-25 Thread via GitHub
LuciferYang opened a new pull request, #39749: URL: https://github.com/apache/spark/pull/39749 ### What changes were proposed in this pull request? Add Github action test job for branch-3.4 ### Why are the changes needed? Daily test for branch-3.4 ### Does this PR

[GitHub] [spark] grundprinzip commented on a diff in pull request #39695: [SPARK-42156] SparkConnectClient supports RetryPolicies now

2023-01-25 Thread via GitHub
grundprinzip commented on code in PR #39695: URL: https://github.com/apache/spark/pull/39695#discussion_r1087450255 ## python/pyspark/sql/connect/client.py: ## @@ -636,6 +677,139 @@ def _handle_error(self, rpc_error: grpc.RpcError) -> NoReturn: raise

[GitHub] [spark] LuciferYang commented on pull request #39707: [WIP][SPARK-42161][BUILD] Upgrade Apache Arrow to 11.0.0

2023-01-25 Thread via GitHub
LuciferYang commented on PR #39707: URL: https://github.com/apache/spark/pull/39707#issuecomment-1404564702 No, it's still in the ASF staging repository and has not been published to the central repository.

[GitHub] [spark] dongjoon-hyun commented on pull request #39707: [WIP][SPARK-42161][BUILD] Upgrade Apache Arrow to 11.0.0

2023-01-25 Thread via GitHub
dongjoon-hyun commented on PR #39707: URL: https://github.com/apache/spark/pull/39707#issuecomment-1404559716 Is there any update, @LuciferYang ?

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39747: [SPARK-42191][SQL] Support udf 'luhn_check'

2023-01-25 Thread via GitHub
dongjoon-hyun commented on code in PR #39747: URL: https://github.com/apache/spark/pull/39747#discussion_r1087436365 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -3039,3 +3039,113 @@ case class SplitPart (

[GitHub] [spark] HyukjinKwon closed pull request #39743: [SPARK-42187][CONNECT][TESTS] Avoid using RemoteSparkSession.builder.getOrCreate in tests

2023-01-25 Thread via GitHub
HyukjinKwon closed pull request #39743: [SPARK-42187][CONNECT][TESTS] Avoid using RemoteSparkSession.builder.getOrCreate in tests URL: https://github.com/apache/spark/pull/39743

[GitHub] [spark] HyukjinKwon commented on pull request #39743: [SPARK-42187][CONNECT][TESTS] Avoid using RemoteSparkSession.builder.getOrCreate in tests

2023-01-25 Thread via GitHub
HyukjinKwon commented on PR #39743: URL: https://github.com/apache/spark/pull/39743#issuecomment-1404554561 All passed. Merged to master and branch-3.4.

[GitHub] [spark] HyukjinKwon commented on pull request #39743: [SPARK-42187][CONNECT][TESTS] Avoid using RemoteSparkSession.builder.getOrCreate in tests

2023-01-25 Thread via GitHub
HyukjinKwon commented on PR #39743: URL: https://github.com/apache/spark/pull/39743#issuecomment-1404551270 Merged to master and branch-3.4.

[GitHub] [spark] dongjoon-hyun commented on pull request #39748: [SPARK-42190][K8S] Support `local` mode in `spark.kubernetes.driver.master`

2023-01-25 Thread via GitHub
dongjoon-hyun commented on PR #39748: URL: https://github.com/apache/spark/pull/39748#issuecomment-1404550696 Thank you all. Merged to master/3.4.

[GitHub] [spark] dongjoon-hyun closed pull request #39748: [SPARK-42190][K8S] Support `local` mode in `spark.kubernetes.driver.master`

2023-01-25 Thread via GitHub
dongjoon-hyun closed pull request #39748: [SPARK-42190][K8S] Support `local` mode in `spark.kubernetes.driver.master` URL: https://github.com/apache/spark/pull/39748

[GitHub] [spark] HyukjinKwon closed pull request #39738: [SPARK-42182][CONNECT][TESTS] Make `ReusedConnectTestCase` to take Spark configurations

2023-01-25 Thread via GitHub
HyukjinKwon closed pull request #39738: [SPARK-42182][CONNECT][TESTS] Make `ReusedConnectTestCase` to take Spark configurations URL: https://github.com/apache/spark/pull/39738

[GitHub] [spark] HyukjinKwon commented on pull request #39738: [SPARK-42182][CONNECT][TESTS] Make `ReusedConnectTestCase` to take Spark configurations

2023-01-25 Thread via GitHub
HyukjinKwon commented on PR #39738: URL: https://github.com/apache/spark/pull/39738#issuecomment-1404549541 Merged to master and branch-3.4.

[GitHub] [spark] HyukjinKwon commented on pull request #39748: [SPARK-42190][K8S] Support `local` mode in `spark.kubernetes.driver.master`

2023-01-25 Thread via GitHub
HyukjinKwon commented on PR #39748: URL: https://github.com/apache/spark/pull/39748#issuecomment-1404522121 Yeah, I am fine too :-)

[GitHub] [spark] dongjoon-hyun commented on pull request #39748: [SPARK-42190][K8S] Support `local` mode in `spark.kubernetes.driver.master`

2023-01-25 Thread via GitHub
dongjoon-hyun commented on PR #39748: URL: https://github.com/apache/spark/pull/39748#issuecomment-1404520840 Thank you so much!

[GitHub] [spark] viirya commented on pull request #39748: [SPARK-42190][K8S] Support `local` mode in `spark.kubernetes.driver.master`

2023-01-25 Thread via GitHub
viirya commented on PR #39748: URL: https://github.com/apache/spark/pull/39748#issuecomment-1404519049 I'm fine with this in 3.4.

[GitHub] [spark] dongjoon-hyun commented on pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output

2023-01-25 Thread via GitHub
dongjoon-hyun commented on PR #39726: URL: https://github.com/apache/spark/pull/39726#issuecomment-1404512768 +1 for backporting decision. Thank you all!

[GitHub] [spark] dongjoon-hyun commented on pull request #39748: [SPARK-42190][K8S] Support `local` mode in `spark.kubernetes.driver.master`

2023-01-25 Thread via GitHub
dongjoon-hyun commented on PR #39748: URL: https://github.com/apache/spark/pull/39748#issuecomment-1404500415 If you don't mind, I want to deliver this to Apache Spark 3.4 as a counter-part of [SPARK-41550 Dynamic Allocation on K8S GA](https://issues.apache.org/jira/browse/SPARK-41550),

[GitHub] [spark] dongjoon-hyun commented on pull request #39748: [SPARK-42190][K8S] Support `local` mode in `spark.kubernetes.driver.master`

2023-01-25 Thread via GitHub
dongjoon-hyun commented on PR #39748: URL: https://github.com/apache/spark/pull/39748#issuecomment-1404497928 Thank you, @viirya and @HyukjinKwon . I addressed the comments too.

[GitHub] [spark] huaxingao commented on pull request #39745: [SPARK-42188][BUILD][3.2] Force SBT protobuf version to match Maven

2023-01-25 Thread via GitHub
huaxingao commented on PR #39745: URL: https://github.com/apache/spark/pull/39745#issuecomment-1404488350 Merged to 3.2. Thanks @snmvaughan et al.

[GitHub] [spark] huaxingao closed pull request #39745: [SPARK-42188][BUILD][3.2] Force SBT protobuf version to match Maven

2023-01-25 Thread via GitHub
huaxingao closed pull request #39745: [SPARK-42188][BUILD][3.2] Force SBT protobuf version to match Maven URL: https://github.com/apache/spark/pull/39745

[GitHub] [spark] huaxingao commented on pull request #39746: [SPARK-42188][BUILD][3.3] Force SBT protobuf version to match Maven

2023-01-25 Thread via GitHub
huaxingao commented on PR #39746: URL: https://github.com/apache/spark/pull/39746#issuecomment-1404486303 Merged to 3.3. Thanks @snmvaughan et al.

[GitHub] [spark] huaxingao closed pull request #39746: [SPARK-42188][BUILD][3.3] Force SBT protobuf version to match Maven

2023-01-25 Thread via GitHub
huaxingao closed pull request #39746: [SPARK-42188][BUILD][3.3] Force SBT protobuf version to match Maven URL: https://github.com/apache/spark/pull/39746

[GitHub] [spark] viirya commented on pull request #39748: [SPARK-42190][K8S] Support `local` mode in `spark.kubernetes.driver.master`

2023-01-25 Thread via GitHub
viirya commented on PR #39748: URL: https://github.com/apache/spark/pull/39748#issuecomment-1404481913 Yeah, it could be follow-up work.

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39748: [SPARK-42190][K8S] Support `local` mode in `spark.kubernetes.driver.master`

2023-01-25 Thread via GitHub
dongjoon-hyun commented on code in PR #39748: URL: https://github.com/apache/spark/pull/39748#discussion_r1087373254 ## resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala: ## @@ -465,10 +465,12 @@

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39748: [SPARK-42190][K8S] Support `local` mode in `spark.kubernetes.driver.master`

2023-01-25 Thread via GitHub
dongjoon-hyun commented on code in PR #39748: URL: https://github.com/apache/spark/pull/39748#discussion_r1087372890 ## docs/running-on-kubernetes.md: ## @@ -590,7 +590,8 @@ See the [configuration page](configuration.html) for information on Spark config

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39748: [SPARK-42190][K8S] Support `local` mode in `spark.kubernetes.driver.master`

2023-01-25 Thread via GitHub
HyukjinKwon commented on code in PR #39748: URL: https://github.com/apache/spark/pull/39748#discussion_r1087368805 ## docs/running-on-kubernetes.md: ## @@ -590,7 +590,8 @@ See the [configuration page](configuration.html) for information on Spark config

[GitHub] [spark] vinodkc commented on a diff in pull request #39747: [SPARK-42191][SQL] Support udf 'luhn_check'

2023-01-25 Thread via GitHub
vinodkc commented on code in PR #39747: URL: https://github.com/apache/spark/pull/39747#discussion_r1087366037 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -3039,3 +3039,113 @@ case class SplitPart ( partNum =

[GitHub] [spark] vinodkc commented on a diff in pull request #39747: [SPARK-42191][SQL] Support udf 'luhn_check'

2023-01-25 Thread via GitHub
vinodkc commented on code in PR #39747: URL: https://github.com/apache/spark/pull/39747#discussion_r1087365898 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -3039,3 +3039,113 @@ case class SplitPart ( partNum =

[GitHub] [spark] viirya commented on a diff in pull request #39748: [SPARK-42190][K8S] Support `local` mode in `spark.kubernetes.driver.master`

2023-01-25 Thread via GitHub
viirya commented on code in PR #39748: URL: https://github.com/apache/spark/pull/39748#discussion_r1087364002 ## resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala: ## @@ -465,10 +465,12 @@ class

[GitHub] [spark] entong commented on a diff in pull request #39711: [SPARK-41931][SQL] Better error message for incomplete complex type definition

2023-01-25 Thread via GitHub
entong commented on code in PR #39711: URL: https://github.com/apache/spark/pull/39711#discussion_r1087363860 ## core/src/main/resources/error/error-classes.json: ## @@ -592,6 +592,29 @@ "Detected an incompatible DataSourceRegister. Please remove the incompatible

[GitHub] [spark] viirya commented on a diff in pull request #39748: [SPARK-42190][K8S] Support `local` mode in `spark.kubernetes.driver.master`

2023-01-25 Thread via GitHub
viirya commented on code in PR #39748: URL: https://github.com/apache/spark/pull/39748#discussion_r1087363084 ## docs/running-on-kubernetes.md: ## @@ -590,7 +590,8 @@ See the [configuration page](configuration.html) for information on Spark config

[GitHub] [spark] HyukjinKwon closed pull request #39717: [SPARK-42168][3.2][SQL][PYTHON] Fix required child distribution of FlatMapCoGroupsInPandas (as in CoGroup)

2023-01-25 Thread via GitHub
HyukjinKwon closed pull request #39717: [SPARK-42168][3.2][SQL][PYTHON] Fix required child distribution of FlatMapCoGroupsInPandas (as in CoGroup) URL: https://github.com/apache/spark/pull/39717

[GitHub] [spark] HyukjinKwon commented on pull request #39717: [SPARK-42168][3.2][SQL][PYTHON] Fix required child distribution of FlatMapCoGroupsInPandas (as in CoGroup)

2023-01-25 Thread via GitHub
HyukjinKwon commented on PR #39717: URL: https://github.com/apache/spark/pull/39717#issuecomment-1404459200 Merged to branch-3.2.

[GitHub] [spark] dongjoon-hyun commented on pull request #39748: [SPARK-42190][K8S] Support `local` mode in spark.kubernetes.driver.master

2023-01-25 Thread via GitHub
dongjoon-hyun commented on PR #39748: URL: https://github.com/apache/spark/pull/39748#issuecomment-1404455301 Could you review this PR, please, @viirya ?

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39738: [SPARK-42182][CONNECT][TESTS] Make `ReusedConnectTestCase` to take Spark configurations

2023-01-25 Thread via GitHub
HyukjinKwon commented on code in PR #39738: URL: https://github.com/apache/spark/pull/39738#discussion_r1087347594 ## python/pyspark/sql/connect/session.py: ## @@ -493,6 +493,10 @@ def _start_connect_server(master: str) -> None: session =

[GitHub] [spark] vinodkc commented on a diff in pull request #39449: [SPARK-40688][SQL] Support data masking built-in function 'mask_first_n'

2023-01-25 Thread via GitHub
vinodkc commented on code in PR #39449: URL: https://github.com/apache/spark/pull/39449#discussion_r1087345403 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -236,40 +265,271 @@ case class Mask( } /** - * Returns

[GitHub] [spark] vinodkc commented on a diff in pull request #39449: [SPARK-40688][SQL] Support data masking built-in function 'mask_first_n'

2023-01-25 Thread via GitHub
vinodkc commented on code in PR #39449: URL: https://github.com/apache/spark/pull/39449#discussion_r1087344417 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -283,7 +543,32 @@ object Mask { transformChar(_,

[GitHub] [spark] vinodkc commented on a diff in pull request #39449: [SPARK-40688][SQL] Support data masking built-in function 'mask_first_n'

2023-01-25 Thread via GitHub
vinodkc commented on code in PR #39449: URL: https://github.com/apache/spark/pull/39449#discussion_r1087343461 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -283,7 +543,32 @@ object Mask { transformChar(_,

[GitHub] [spark] vinodkc commented on a diff in pull request #39449: [SPARK-40688][SQL] Support data masking built-in function 'mask_first_n'

2023-01-25 Thread via GitHub
vinodkc commented on code in PR #39449: URL: https://github.com/apache/spark/pull/39449#discussion_r1087343234 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -77,13 +150,12 @@ import org.apache.spark.unsafe.types.UTF8String

[GitHub] [spark] vinodkc commented on a diff in pull request #39449: [SPARK-40688][SQL] Support data masking built-in function 'mask_first_n'

2023-01-25 Thread via GitHub
vinodkc commented on code in PR #39449: URL: https://github.com/apache/spark/pull/39449#discussion_r1087342979 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -23,9 +23,82 @@ import

[GitHub] [spark] vinodkc commented on a diff in pull request #39449: [SPARK-40688][SQL] Support data masking built-in function 'mask_first_n'

2023-01-25 Thread via GitHub
vinodkc commented on code in PR #39449: URL: https://github.com/apache/spark/pull/39449#discussion_r1087342780 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -23,9 +23,82 @@ import

[GitHub] [spark] allisonport-db commented on pull request #38302: [SPARK-40834][SQL] Use SparkListenerSQLExecutionEnd to track final SQL status in UI

2023-01-25 Thread via GitHub
allisonport-db commented on PR #38302: URL: https://github.com/apache/spark/pull/38302#issuecomment-1404399696 hi @ulysses-you any chance you can take a look at https://issues.apache.org/jira/browse/SPARK-41735?

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39738: [SPARK-42182][CONNECT][TESTS] Make `ReusedConnectTestCase` to take Spark configurations

2023-01-25 Thread via GitHub
HyukjinKwon commented on code in PR #39738: URL: https://github.com/apache/spark/pull/39738#discussion_r1087299315 ## python/pyspark/sql/connect/session.py: ## @@ -493,6 +493,9 @@ def _start_connect_server(master: str) -> None: session =

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39743: [SPARK-42187][CONNECT][TESTS] Avoid using RemoteSparkSession.builder.getOrCreate in tests

2023-01-25 Thread via GitHub
HyukjinKwon commented on code in PR #39743: URL: https://github.com/apache/spark/pull/39743#discussion_r1087283667 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -15,44 +15,31 @@ # limitations under the License. # import unittest -import tempfile from

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39743: [SPARK-42187][CONNECT][TESTS] Avoid using RemoteSparkSession.builder.getOrCreate in tests

2023-01-25 Thread via GitHub
HyukjinKwon commented on code in PR #39743: URL: https://github.com/apache/spark/pull/39743#discussion_r1087283334 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -57,22 +59,21 @@ from pyspark.sql.connect import functions as CF -@unittest.skipIf(not

[GitHub] [spark] rmcyang commented on pull request #39725: [SPARK-33573][FOLLOW-UP] Increment ignoredBlockBytes when shuffle push blocks are late or colliding

2023-01-25 Thread via GitHub
rmcyang commented on PR #39725: URL: https://github.com/apache/spark/pull/39725#issuecomment-1404317258 Thanks @otterc and @mridulm ! Have updated the PR trying to cover below proposal: - For `onComplete` and `onFailure`, add the data not being used from the push to ignored bytes -

[GitHub] [spark] gengliangwang closed pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output

2023-01-25 Thread via GitHub
gengliangwang closed pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output URL: https://github.com/apache/spark/pull/39726

[GitHub] [spark] gengliangwang commented on pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output

2023-01-25 Thread via GitHub
gengliangwang commented on PR #39726: URL: https://github.com/apache/spark/pull/39726#issuecomment-1404315869 @xinrong-meng I think this one is useful and the jira https://issues.apache.org/jira/browse/SPARK-42123 was created before the 3.4 branch cut. I didn't merge it due to issues

[GitHub] [spark] gengliangwang commented on pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output

2023-01-25 Thread via GitHub
gengliangwang commented on PR #39726: URL: https://github.com/apache/spark/pull/39726#issuecomment-1404313383 cc @Yikun do you know why @dtenedor's github action jobs failed? I haven't seen such an error before. https://github.com/dtenedor/spark/runs/10887372029

[GitHub] [spark] attilapiros commented on a diff in pull request #39728: [SPARK-42173][CORE] RpcAddress equality can fail

2023-01-25 Thread via GitHub
attilapiros commented on code in PR #39728: URL: https://github.com/apache/spark/pull/39728#discussion_r1087250704 ## core/src/main/scala/org/apache/spark/util/Utils.scala: ## @@ -1118,7 +1118,7 @@ private[spark] object Utils extends Logging { // This means some invalid

[GitHub] [spark] dongjoon-hyun commented on pull request #39710: [SPARK-42090][3.2] Introduce sasl retry count in RetryingBlockTransferor

2023-01-25 Thread via GitHub
dongjoon-hyun commented on PR #39710: URL: https://github.com/apache/spark/pull/39710#issuecomment-1404306995 cc @kazuyukitanimura since this lands at branch-3.2

[GitHub] [spark] dongjoon-hyun commented on pull request #39704: [MINOR][K8S][DOCS] Add all resource managers in `Scheduling Within an Application` section

2023-01-25 Thread via GitHub
dongjoon-hyun commented on PR #39704: URL: https://github.com/apache/spark/pull/39704#issuecomment-1404306797 cc @kazuyukitanimura since this lands at branch-3.2

[GitHub] [spark] dongjoon-hyun commented on pull request #39703: [SPARK-42157][CORE] `spark.scheduler.mode=FAIR` should provide FAIR scheduler

2023-01-25 Thread via GitHub
dongjoon-hyun commented on PR #39703: URL: https://github.com/apache/spark/pull/39703#issuecomment-1404306525 cc @kazuyukitanimura since this lands at branch-3.2

[GitHub] [spark] holdenk commented on a diff in pull request #39728: [SPARK-42173][CORE] RpcAddress equality can fail

2023-01-25 Thread via GitHub
holdenk commented on code in PR #39728: URL: https://github.com/apache/spark/pull/39728#discussion_r1087233459 ## core/src/main/scala/org/apache/spark/util/Utils.scala: ## @@ -1109,6 +1109,24 @@ private[spark] object Utils extends Logging { } } + /** + * Normalize

[GitHub] [spark] dongjoon-hyun commented on pull request #39745: [SPARK-42188] Force SBT protobuf version to match Maven on branch 3.2

2023-01-25 Thread via GitHub
dongjoon-hyun commented on PR #39745: URL: https://github.com/apache/spark/pull/39745#issuecomment-1404250420 cc @kazuyukitanimura

[GitHub] [spark] dongjoon-hyun commented on pull request #39746: [SPARK-42188][BUILD][3.3] Force SBT protobuf version to match Maven

2023-01-25 Thread via GitHub
dongjoon-hyun commented on PR #39746: URL: https://github.com/apache/spark/pull/39746#issuecomment-1404249605 FYI, cc @kazuyukitanimura

[GitHub] [spark] attilapiros commented on a diff in pull request #39728: [SPARK-42173][CORE] RpcAddress equality can fail

2023-01-25 Thread via GitHub
attilapiros commented on code in PR #39728: URL: https://github.com/apache/spark/pull/39728#discussion_r1087190921 ## core/src/main/scala/org/apache/spark/util/Utils.scala: ## @@ -1109,6 +1109,24 @@ private[spark] object Utils extends Logging { } } + /** + *

[GitHub] [spark] zhenlineo commented on a diff in pull request #39712: [SPARK-42172][CONNECT] Scala Client Mima Compatibility Tests

2023-01-25 Thread via GitHub
zhenlineo commented on code in PR #39712: URL: https://github.com/apache/spark/pull/39712#discussion_r1087121287 ## connector/connect/client/jvm/pom.xml: ## @@ -75,6 +76,13 @@ mockito-core test + Review Comment: You can check out the MiMa SBT impl I

[GitHub] [spark] dtenedor commented on pull request #39747: [SPARK-42191][SQL] Support udf 'luhn_check'

2023-01-25 Thread via GitHub
dtenedor commented on PR #39747: URL: https://github.com/apache/spark/pull/39747#issuecomment-1404167805 The general algorithm and test coverage look correct -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] dtenedor commented on a diff in pull request #39747: [SPARK-42191][SQL] Support udf 'luhn_check'

2023-01-25 Thread via GitHub
dtenedor commented on code in PR #39747: URL: https://github.com/apache/spark/pull/39747#discussion_r1087116548 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -3039,3 +3039,113 @@ case class SplitPart ( partNum =

[GitHub] [spark] dtenedor commented on a diff in pull request #39449: [SPARK-40688][SQL] Support data masking built-in function 'mask_first_n'

2023-01-25 Thread via GitHub
dtenedor commented on code in PR #39449: URL: https://github.com/apache/spark/pull/39449#discussion_r1087097719 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -23,9 +23,82 @@ import

[GitHub] [spark] vinodkc commented on pull request #39747: [SPARK-42191][SQL] Support udf 'luhn_check'

2023-01-25 Thread via GitHub
vinodkc commented on PR #39747: URL: https://github.com/apache/spark/pull/39747#issuecomment-1404150949 Hi @dtenedor, @srielau, @HyukjinKwon, @cloud-fan, @gengliangwang Could you please review this PR? -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [spark] dongjoon-hyun opened a new pull request, #39748: [SPARK-42190][K8S] Support `local` mode in spark.kubernetes.driver.master

2023-01-25 Thread via GitHub
dongjoon-hyun opened a new pull request, #39748: URL: https://github.com/apache/spark/pull/39748 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] vinodkc commented on pull request #39449: [SPARK-40688][SQL] Support data masking built-in function 'mask_first_n'

2023-01-25 Thread via GitHub
vinodkc commented on PR #39449: URL: https://github.com/apache/spark/pull/39449#issuecomment-1404137983 @dtenedor, thank you for the code review comments. I applied the suggested code changes. Can you please do another review? -- This is an automated message from the Apache Git Service.

[GitHub] [spark] vinodkc opened a new pull request, #39747: [SPARK-40686][SQL] Support udf 'luhn_check'

2023-01-25 Thread via GitHub
vinodkc opened a new pull request, #39747: URL: https://github.com/apache/spark/pull/39747 ### What changes were proposed in this pull request? Support UDF to check if a given number string is a valid Luhn number. It shall return true if the number string is a valid Luhn number,
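For readers unfamiliar with the check named in the PR description above, the following is a minimal sketch of the standard Luhn checksum in plain Python. It is not the code from the pull request: the function name `luhn_check` simply mirrors the PR title, and the handling of empty or non-digit input (treated here as invalid) is an assumption, since the excerpt is cut off before it describes those cases.

```python
def luhn_check(number: str) -> bool:
    """Return True if `number` is a digit string that passes the Luhn checksum.

    Illustrative sketch only; not the Spark implementation.
    """
    # Assumption: empty strings and anything other than ASCII digits are invalid.
    if not number or not (number.isascii() and number.isdigit()):
        return False

    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            # A doubled digit above 9 contributes the sum of its two digits,
            # which equals subtracting 9.
            if digit > 9:
                digit -= 9
        total += digit

    # The string is valid when the checksum is a multiple of 10.
    return total % 10 == 0
```

Calling `luhn_check("79927398713")` exercises the common textbook example of a valid number; changing its last digit makes the check fail.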

[GitHub] [spark] linhongliu-db commented on pull request #39711: [SPARK-41931][SQL] Better error message for incomplete complex type definition

2023-01-25 Thread via GitHub
linhongliu-db commented on PR #39711: URL: https://github.com/apache/spark/pull/39711#issuecomment-1404130850 cc @srielau to review the error message -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] sunchao commented on a diff in pull request #39746: [SPARK-42188] Force SBT protobuf version to match Maven on branch 3.3

2023-01-25 Thread via GitHub
sunchao commented on code in PR #39746: URL: https://github.com/apache/spark/pull/39746#discussion_r1087082457 ## project/SparkBuild.scala: ## @@ -703,6 +706,8 @@ object KubernetesIntegrationTests { * Overrides to work around sbt's dependency resolution being different from

[GitHub] [spark] EnricoMi commented on pull request #39744: [SPARK-38591][SQL][FOLLOW-UP] Fix ambiguous references for sorted cogroups

2023-01-25 Thread via GitHub
EnricoMi commented on PR #39744: URL: https://github.com/apache/spark/pull/39744#issuecomment-1404096769 @cloud-fan this is a follow up to #39640 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] gengliangwang closed pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests

2023-01-25 Thread via GitHub
gengliangwang closed pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests URL: https://github.com/apache/spark/pull/39732 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] gengliangwang commented on pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests

2023-01-25 Thread via GitHub
gengliangwang commented on PR #39732: URL: https://github.com/apache/spark/pull/39732#issuecomment-1404082537 @dongjoon-hyun @LuciferYang Thanks for the review. Merging to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] huaxingao commented on pull request #39746: [SPARK-42188] Force SBT protobuf version to match Maven on branch 3.3

2023-01-25 Thread via GitHub
huaxingao commented on PR #39746: URL: https://github.com/apache/spark/pull/39746#issuecomment-1404070414 The changes look good to me. cc @sunchao @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] mridulm commented on pull request #39725: [SPARK-33573][FOLLOW-UP] Increment ignoredBlockBytes when shuffle push blocks are late or colliding

2023-01-25 Thread via GitHub
mridulm commented on PR #39725: URL: https://github.com/apache/spark/pull/39725#issuecomment-1404062208 Deferred block writes make it trickier, yes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] otterc commented on pull request #39725: [SPARK-33573][FOLLOW-UP] Increment ignoredBlockBytes when shuffle push blocks are late or colliding

2023-01-25 Thread via GitHub
otterc commented on PR #39725: URL: https://github.com/apache/spark/pull/39725#issuecomment-1404061436 If we don't want to include failed writes, then I think your suggestion requires much more refactoring of the current code. I thought it was slightly riskier to make those changes just

[GitHub] [spark] vinodkc commented on a diff in pull request #39449: [SPARK-40688][SQL] Support data masking built-in function 'mask_first_n'

2023-01-25 Thread via GitHub
vinodkc commented on code in PR #39449: URL: https://github.com/apache/spark/pull/39449#discussion_r1087035696 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -257,19 +271,272 @@ case class Mask( otherChar =

[GitHub] [spark] vinodkc commented on a diff in pull request #39449: [SPARK-40688][SQL] Support data masking built-in function 'mask_first_n'

2023-01-25 Thread via GitHub
vinodkc commented on code in PR #39449: URL: https://github.com/apache/spark/pull/39449#discussion_r1087034705 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -257,19 +271,272 @@ case class Mask( otherChar =

[GitHub] [spark] vinodkc commented on a diff in pull request #39449: [SPARK-40688][SQL] Support data masking built-in function 'mask_first_n'

2023-01-25 Thread via GitHub
vinodkc commented on code in PR #39449: URL: https://github.com/apache/spark/pull/39449#discussion_r1087034137 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -257,19 +271,272 @@ case class Mask( otherChar =

[GitHub] [spark] vinodkc commented on a diff in pull request #39449: [SPARK-40688][SQL] Support data masking built-in function 'mask_first_n'

2023-01-25 Thread via GitHub
vinodkc commented on code in PR #39449: URL: https://github.com/apache/spark/pull/39449#discussion_r1087032903 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -23,9 +23,54 @@ import

[GitHub] [spark] dtenedor commented on pull request #39449: [SPARK-40688][SQL] Support data masking built-in function 'mask_first_n'

2023-01-25 Thread via GitHub
dtenedor commented on PR #39449: URL: https://github.com/apache/spark/pull/39449#issuecomment-1404047107 @vinodkc please ping again when you're ready for another review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] mridulm commented on pull request #39725: [SPARK-33573][FOLLOW-UP] Increment ignoredBlockBytes when shuffle push blocks are late or colliding

2023-01-25 Thread via GitHub
mridulm commented on PR #39725: URL: https://github.com/apache/spark/pull/39725#issuecomment-1404045687 Good callout - I do not want to include failed writes, just those which were actually ignored. We can try/finally the onData, and ignore update in case of IOException -- This is an

[GitHub] [spark] otterc commented on pull request #39725: [SPARK-33573][FOLLOW-UP] Increment ignoredBlockBytes when shuffle push blocks are late or colliding

2023-01-25 Thread via GitHub
otterc commented on PR #39725: URL: https://github.com/apache/spark/pull/39725#issuecomment-1404040435 > Sum up the total number of bytes received in `onData` - does not matter if it was written, deferred, etc. > In onComplete/onFailure, add it to ignored bytes if we are deciding not to

[GitHub] [spark] vinodkc commented on a diff in pull request #39449: [SPARK-40688][SQL] Support data masking built-in function 'mask_first_n'

2023-01-25 Thread via GitHub
vinodkc commented on code in PR #39449: URL: https://github.com/apache/spark/pull/39449#discussion_r1087007438 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -23,9 +23,54 @@ import

[GitHub] [spark] srielau commented on a diff in pull request #39723: [SPARK-41302][SQL] Assign name to _LEGACY_ERROR_TEMP_1185

2023-01-25 Thread via GitHub
srielau commented on code in PR #39723: URL: https://github.com/apache/spark/pull/39723#discussion_r1086993853 ## core/src/main/resources/error/error-classes.json: ## @@ -797,6 +797,11 @@ ], "sqlState" : "42602" }, + "INVALID_IDENTIFIER_HAS_MORE_THAN_2_NAME_PARTS"

[GitHub] [spark] srielau commented on a diff in pull request #39723: [SPARK-41302][SQL] Assign name to _LEGACY_ERROR_TEMP_1185

2023-01-25 Thread via GitHub
srielau commented on code in PR #39723: URL: https://github.com/apache/spark/pull/39723#discussion_r1086989926 ## core/src/main/resources/error/error-classes.json: ## @@ -797,6 +797,11 @@ ], "sqlState" : "42602" }, + "INVALID_IDENTIFIER_HAS_MORE_THAN_2_NAME_PARTS"
