Re: [PR] [SPARK-48030][SQL]SPJ: cache rowOrdering and structType for InternalRowComparableWrapper [spark]

2024-04-29 Thread via GitHub
advancedxy commented on PR #46265: URL: https://github.com/apache/spark/pull/46265#issuecomment-2082705978 > It appears that this PR can enhance performance when there are a large number of partitions. Could you please share the test results from a real DatasourceV2 table, such as Iceberg?

Re: [PR] [SPARK-48030][SQL]SPJ: cache rowOrdering and structType for InternalRowComparableWrapper [spark]

2024-04-29 Thread via GitHub
yabola commented on PR #46265: URL: https://github.com/apache/spark/pull/46265#issuecomment-2082672856 @advancedxy It appears that this PR can enhance performance when there are a large number of partitions. Could you please share the test results from a real DatasourceV2 table?

Re: [PR] [SPARK-48030][SQL]SPJ: cache rowOrdering and structType for InternalRowComparableWrapper [spark]

2024-04-29 Thread via GitHub
advancedxy commented on PR #46265: URL: https://github.com/apache/spark/pull/46265#issuecomment-2082648145 > In addition, is there a better expireAfterAccess configuration for NonFateSharingCache? hmmm, maybe. However, the memory usage of cache should be relatively low. Let's wait

Re: [PR] [SPARK-47952][CORE][CONNECT] Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn [spark]

2024-04-29 Thread via GitHub
grundprinzip commented on PR #46182: URL: https://github.com/apache/spark/pull/46182#issuecomment-2082639091 One thing I'm wondering if it might work out of the box is the ability to specify an ephemeral port for the spark connect service and pick this up during startup. This might
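
The ephemeral-port idea raised in this comment can be sketched outside Spark. This is a minimal illustration, assuming a plain TCP listener rather than the actual Spark Connect gRPC server; the function name `start_on_ephemeral_port` is hypothetical and is not part of the PR.

```python
import socket
from typing import Tuple


def start_on_ephemeral_port(host: str = "127.0.0.1") -> Tuple[socket.socket, int]:
    """Bind a listening socket to port 0 and return it with the port the OS chose.

    Hypothetical helper for illustration only; not Spark code.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 asks the OS for any free port
    server.listen()
    actual_port = server.getsockname()[1]  # read back the port that was assigned
    return server, actual_port


if __name__ == "__main__":
    server, port = start_on_ephemeral_port()
    try:
        print(f"listening on port {port}")
    finally:
        server.close()
```

The point of the sketch is only that the real port is known after binding, so it has to be read back at startup instead of being taken from configuration.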

Re: [PR] [WIP][SPARK-24815] [CORE] Trigger Interval based DRA for Structured Streaming [spark]

2024-04-29 Thread via GitHub
tgravescs commented on code in PR #42352: URL: https://github.com/apache/spark/pull/42352#discussion_r1583032683 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -340,6 +385,45 @@ private[spark] class ExecutorAllocationManager( } } + /**

Re: [PR] [SPARK-47952][CORE][CONNECT] Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn [spark]

2024-04-29 Thread via GitHub
grundprinzip commented on code in PR #46182: URL: https://github.com/apache/spark/pull/46182#discussion_r1583029459 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -344,35 +357,79 @@ object SparkConnectService

Re: [PR] [SPARK-48041][SQL] Avoid call conf.resolver repeated in TableOutputResolver [spark]

2024-04-29 Thread via GitHub
xuzifu666 commented on PR #46279: URL: https://github.com/apache/spark/pull/46279#issuecomment-2082514838 > Is there any other outstanding issue similar to SPARK-48010 and this one? yes, similar scenes @yaooqinn -- This is an automated message from the Apache Git Service. To respond

Re: [PR] [SPARK-48041][SQL] Avoid call conf.resolver repeated in TableOutputResolver [spark]

2024-04-29 Thread via GitHub
yaooqinn commented on PR #46279: URL: https://github.com/apache/spark/pull/46279#issuecomment-2082510201 Is there any other outstanding issue similar to SPARK-48010 and this one?

Re: [PR] [SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer [spark]

2024-04-29 Thread via GitHub
yaooqinn closed pull request #46231: [SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer URL: https://github.com/apache/spark/pull/46231

Re: [PR] [SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer [spark]

2024-04-29 Thread via GitHub
yaooqinn commented on PR #46231: URL: https://github.com/apache/spark/pull/46231#issuecomment-208257 Thank you @stefanbuk-db Merged to master

[PR] [SPARK-48044][PYTHON][CONNECT] Cache `DataFrame.isStreaming` [spark]

2024-04-29 Thread via GitHub
zhengruifeng opened a new pull request, #46281: URL: https://github.com/apache/spark/pull/46281 ### What changes were proposed in this pull request? Cache `DataFrame.isStreaming` ### Why are the changes needed? In PS, `DataFrame.isStreaming` is used in the construction of
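
A rough sketch of what caching a property like `isStreaming` can look like, using the standard library's `functools.cached_property`. The classes `RemotePlan` and `CachedFrame` are hypothetical stand-ins, and the assumption that each uncached lookup is costly is made for illustration; this is not the code in the PR.

```python
from functools import cached_property


class RemotePlan:
    """Hypothetical stand-in for a backend that answers analysis requests."""

    def __init__(self, streaming: bool) -> None:
        self._streaming = streaming
        self.calls = 0  # counts how many times the backend is asked

    def analyze_is_streaming(self) -> bool:
        self.calls += 1
        return self._streaming


class CachedFrame:
    """Hypothetical frame whose is_streaming value is computed once per instance."""

    def __init__(self, plan: RemotePlan) -> None:
        self._plan = plan

    @cached_property
    def is_streaming(self) -> bool:
        # The first access asks the backend; later accesses reuse the stored value.
        return self._plan.analyze_is_streaming()


if __name__ == "__main__":
    plan = RemotePlan(streaming=True)
    frame = CachedFrame(plan)
    frame.is_streaming
    frame.is_streaming
    print(plan.calls)
```

Caching per instance is only safe when the value cannot change for that object, which is the assumption made here.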

Re: [PR] [SPARK-48039][PYTHON][CONNECT] Update the error class for `group.apply` [spark]

2024-04-29 Thread via GitHub
zhengruifeng commented on PR #46277: URL: https://github.com/apache/spark/pull/46277#issuecomment-2082446455 thanks @HyukjinKwon

Re: [PR] [SPARK-48039][PYTHON][CONNECT] Update the error class for `group.apply` [spark]

2024-04-29 Thread via GitHub
HyukjinKwon closed pull request #46277: [SPARK-48039][PYTHON][CONNECT] Update the error class for `group.apply` URL: https://github.com/apache/spark/pull/46277

Re: [PR] [SPARK-48039][PYTHON][CONNECT] Update the error class for `group.apply` [spark]

2024-04-29 Thread via GitHub
HyukjinKwon commented on PR #46277: URL: https://github.com/apache/spark/pull/46277#issuecomment-2082435935 Merged to master.

Re: [PR] [SPARK-48040][CONNECT]Spark connect supports scheduler pool [spark]

2024-04-29 Thread via GitHub
HyukjinKwon commented on code in PR #46278: URL: https://github.com/apache/spark/pull/46278#discussion_r1582873705 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteThreadRunner.scala: ## @@ -177,6 +177,10 @@ private[connect] class

Re: [PR] [SPARK-47952][CORE][CONNECT] Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn [spark]

2024-04-29 Thread via GitHub
grundprinzip commented on code in PR #46182: URL: https://github.com/apache/spark/pull/46182#discussion_r1582855870 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -344,35 +357,79 @@ object SparkConnectService

Re: [PR] [SPARK-47952][CORE][CONNECT] Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn [spark]

2024-04-29 Thread via GitHub
grundprinzip commented on code in PR #46182: URL: https://github.com/apache/spark/pull/46182#discussion_r1582854628 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -344,35 +357,79 @@ object SparkConnectService

Re: [PR] [SPARK-47952][CORE][CONNECT] Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn [spark]

2024-04-29 Thread via GitHub
grundprinzip commented on code in PR #46182: URL: https://github.com/apache/spark/pull/46182#discussion_r1582852496 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectServer.scala: ## @@ -36,21 +32,21 @@ object SparkConnectServer extends

Re: [PR] [SPARK-47952][CORE][CONNECT] Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn [spark]

2024-04-29 Thread via GitHub
grundprinzip commented on code in PR #46182: URL: https://github.com/apache/spark/pull/46182#discussion_r1582848826 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -2302,6 +2302,11 @@ class SparkContext(config: SparkConf) extends Logging { }

Re: [PR] [SPARK-47952][CORE][CONNECT] Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn [spark]

2024-04-29 Thread via GitHub
grundprinzip commented on code in PR #46182: URL: https://github.com/apache/spark/pull/46182#discussion_r1582846934 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectServer.scala: ## @@ -36,21 +32,21 @@ object SparkConnectServer extends

Re: [PR] [SPARK-48040][CONNECT]Spark connect supports scheduler pool [spark]

2024-04-29 Thread via GitHub
xieshuaihu commented on code in PR #46278: URL: https://github.com/apache/spark/pull/46278#discussion_r1582847360 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteThreadRunner.scala: ## @@ -177,6 +177,10 @@ private[connect] class

Re: [PR] [SPARK-48040][CONNECT]Spark connect supports scheduler pool [spark]

2024-04-29 Thread via GitHub
HyukjinKwon commented on code in PR #46278: URL: https://github.com/apache/spark/pull/46278#discussion_r1582843351 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteThreadRunner.scala: ## @@ -177,6 +177,10 @@ private[connect] class

Re: [PR] [SPARK-48040][CONNECT]Spark connect supports scheduler pool [spark]

2024-04-29 Thread via GitHub
xieshuaihu commented on code in PR #46278: URL: https://github.com/apache/spark/pull/46278#discussion_r1582838182 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteThreadRunner.scala: ## @@ -177,6 +177,10 @@ private[connect] class

Re: [PR] [DRAFT] Store collation information in metadata and not in type for SER/DE [spark]

2024-04-29 Thread via GitHub
olaky commented on code in PR #46280: URL: https://github.com/apache/spark/pull/46280#discussion_r1582837435 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -251,13 +262,25 @@ object DataType { messageParameters = Map("invalidType" ->

Re: [PR] [SPARK-48041][SQL] Avoid repeated calls to conf.resolver in TableOutputResolver [spark]

2024-04-29 Thread via GitHub
xuzifu666 commented on PR #46279: URL: https://github.com/apache/spark/pull/46279#issuecomment-2082376106 > Can you fill the PR description please? Done @HyukjinKwon

Re: [PR] [SPARK-48040][CONNECT]Spark connect supports scheduler pool [spark]

2024-04-29 Thread via GitHub
HyukjinKwon commented on code in PR #46278: URL: https://github.com/apache/spark/pull/46278#discussion_r1582831117 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteThreadRunner.scala: ## @@ -177,6 +177,10 @@ private[connect] class

Re: [PR] [SPARK-48037][CORE] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data [spark]

2024-04-29 Thread via GitHub
cxzl25 commented on code in PR #46273: URL: https://github.com/apache/spark/pull/46273#discussion_r1582827672 ## sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala: ## @@ -85,8 +86,10 @@ class AdaptiveQueryExecSuite

[PR] [DRAFT] New delta schema [spark]

2024-04-29 Thread via GitHub
stefankandic opened a new pull request, #46280: URL: https://github.com/apache/spark/pull/46280 ### What changes were proposed in this pull request? Changing serialization and deserialization of collated strings so that the collation information is put in the metadata of each

Re: [PR] [SPARK-48040][CONNECT]Spark connect supports scheduler pool [spark]

2024-04-29 Thread via GitHub
xieshuaihu commented on code in PR #46278: URL: https://github.com/apache/spark/pull/46278#discussion_r1582815755 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteThreadRunner.scala: ## @@ -177,6 +177,10 @@ private[connect] class

Re: [PR] [SPARK-48041][SQL] Avoid repeated calls to conf.resolver in Analyzer [spark]

2024-04-29 Thread via GitHub
HyukjinKwon commented on PR #46279: URL: https://github.com/apache/spark/pull/46279#issuecomment-2082340036 Can you fill the PR description please?

Re: [PR] [SPARK-48041][SQL] Avoid repeated calls to conf.resolver in Analyzer [spark]

2024-04-29 Thread via GitHub
HyukjinKwon commented on code in PR #46279: URL: https://github.com/apache/spark/pull/46279#discussion_r1582814873 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -3816,7 +3816,7 @@ class Analyzer(override val catalogManager:

Re: [PR] [SPARK-48041][SQL] Avoid repeated calls to conf.resolver in Analyzer [spark]

2024-04-29 Thread via GitHub
xuzifu666 commented on PR #46279: URL: https://github.com/apache/spark/pull/46279#issuecomment-2082334931 @dongjoon-hyun PTAL

Re: [PR] [SPARK-48040][CONNECT]Spark connect supports scheduler pool [spark]

2024-04-29 Thread via GitHub
HyukjinKwon commented on code in PR #46278: URL: https://github.com/apache/spark/pull/46278#discussion_r1582812044 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteThreadRunner.scala: ## @@ -177,6 +177,10 @@ private[connect] class

[PR] [SPARK-48041][SQL] Avoid repeated calls to conf.resolver in Analyzer [spark]

2024-04-29 Thread via GitHub
xuzifu666 opened a new pull request, #46279: URL: https://github.com/apache/spark/pull/46279 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47954][K8S] Support creating ingress entry for external UI access [spark]

2024-04-29 Thread via GitHub
pan3793 commented on code in PR #46184: URL: https://github.com/apache/spark/pull/46184#discussion_r1582805042 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesDriverBuilder.scala: ## @@ -73,7 +73,8 @@ private[spark] class

[PR] Spark connect supports scheduler pool [spark]

2024-04-29 Thread via GitHub
xieshuaihu opened a new pull request, #46278: URL: https://github.com/apache/spark/pull/46278 ### What changes were proposed in this pull request? This patch makes Spark Connect support setting the scheduler pool name, like vanilla Spark. A new field `scheduler_pool` in

Re: [PR] [SPARK-47954][K8S] Support creating ingress entry for external UI access [spark]

2024-04-29 Thread via GitHub
pan3793 commented on code in PR #46184: URL: https://github.com/apache/spark/pull/46184#discussion_r1582767098 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverServiceFeatureStep.scala: ## @@ -87,16 +77,19 @@ private[spark] class

Re: [PR] [SPARK-47567][SQL] Support LOCATE function to work with collated strings [spark]

2024-04-29 Thread via GitHub
cloud-fan closed pull request #45791: [SPARK-47567][SQL] Support LOCATE function to work with collated strings URL: https://github.com/apache/spark/pull/45791

Re: [PR] [SPARK-47567][SQL] Support LOCATE function to work with collated strings [spark]

2024-04-29 Thread via GitHub
miland-db commented on PR #45791: URL: https://github.com/apache/spark/pull/45791#issuecomment-2082250288 jenkins merge

Re: [PR] [SPARK-47567][SQL] Support LOCATE function to work with collated strings [spark]

2024-04-29 Thread via GitHub
cloud-fan commented on PR #45791: URL: https://github.com/apache/spark/pull/45791#issuecomment-2082251045 thanks, merging to master!

[PR] [SPARK-48039][PYTHON][CONNECT] Update the error class for `group.apply` [spark]

2024-04-29 Thread via GitHub
zhengruifeng opened a new pull request, #46277: URL: https://github.com/apache/spark/pull/46277 ### What changes were proposed in this pull request? Update the error class for `group.apply` ### Why are the changes needed?

Re: [PR] [SPARK-48038][K8S] Promote driverServiceName to KubernetesDriverConf [spark]

2024-04-29 Thread via GitHub
pan3793 commented on PR #46276: URL: https://github.com/apache/spark/pull/46276#issuecomment-2082230596 cc @dongjoon-hyun @yaooqinn @LuciferYang

Re: [PR] [SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer [spark]

2024-04-29 Thread via GitHub
yaooqinn commented on PR #46231: URL: https://github.com/apache/spark/pull/46231#issuecomment-2082208303 LGTM

Re: [PR] [SPARK-47939][SQL] Implement a new Analyzer rule to move ParameterizedQuery inside ExplainCommand and DescribeQueryCommand [spark]

2024-04-29 Thread via GitHub
cloud-fan closed pull request #46209: [SPARK-47939][SQL] Implement a new Analyzer rule to move ParameterizedQuery inside ExplainCommand and DescribeQueryCommand URL: https://github.com/apache/spark/pull/46209

Re: [PR] [SPARK-47939][SQL] Implement a new Analyzer rule to move ParameterizedQuery inside ExplainCommand and DescribeQueryCommand [spark]

2024-04-29 Thread via GitHub
cloud-fan commented on PR #46209: URL: https://github.com/apache/spark/pull/46209#issuecomment-2082201660 thanks, merging to master!

Re: [PR] [SPARK-47567][SQL] Support LOCATE function to work with collated strings [spark]

2024-04-29 Thread via GitHub
stevomitric commented on PR #45791: URL: https://github.com/apache/spark/pull/45791#issuecomment-2082204032 +1 LGTM, photon checks will be handled in this pr https://github.com/databricks/runtime/pull/91246

Re: [PR] [SPARK-47954][K8S] Support creating ingress entry for external UI access [spark]

2024-04-29 Thread via GitHub
pan3793 commented on code in PR #46184: URL: https://github.com/apache/spark/pull/46184#discussion_r1582734681 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala: ## @@ -749,13 +749,52 @@ private[spark] object Config extends Logging {

Re: [PR] [SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer [spark]

2024-04-29 Thread via GitHub
yaooqinn commented on code in PR #46231: URL: https://github.com/apache/spark/pull/46231#discussion_r1582720271 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala: ## @@ -92,6 +92,7 @@ private case class MsSqlServerDialect() extends JdbcDialect {

Re: [PR] [SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer [spark]

2024-04-29 Thread via GitHub
yaooqinn commented on code in PR #46231: URL: https://github.com/apache/spark/pull/46231#discussion_r1582719516 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala: ## @@ -92,6 +92,7 @@ private case class MsSqlServerDialect() extends JdbcDialect {

Re: [PR] [SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer [spark]

2024-04-29 Thread via GitHub
stefanbuk-db commented on code in PR #46231: URL: https://github.com/apache/spark/pull/46231#discussion_r1582716492 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala: ## @@ -92,6 +92,7 @@ private case class MsSqlServerDialect() extends JdbcDialect {

Re: [PR] [SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer [spark]

2024-04-29 Thread via GitHub
yaooqinn commented on code in PR #46231: URL: https://github.com/apache/spark/pull/46231#discussion_r1582712347 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala: ## @@ -92,6 +92,7 @@ private case class MsSqlServerDialect() extends JdbcDialect {

Re: [PR] [SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer [spark]

2024-04-29 Thread via GitHub
stefanbuk-db commented on PR #46231: URL: https://github.com/apache/spark/pull/46231#issuecomment-2082141133 cc @yaooqinn

Re: [PR] [SPARK-47954][K8S] Support creating ingress entry for external UI access [spark]

2024-04-29 Thread via GitHub
pan3793 commented on code in PR #46184: URL: https://github.com/apache/spark/pull/46184#discussion_r1582682967 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala: ## @@ -94,6 +96,20 @@ private[spark] class KubernetesDriverConf(

Re: [PR] [SPARK-47954][K8S] Support creating ingress entry for external UI access [spark]

2024-04-29 Thread via GitHub
pan3793 commented on code in PR #46184: URL: https://github.com/apache/spark/pull/46184#discussion_r1582682648 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala: ## @@ -94,6 +96,20 @@ private[spark] class KubernetesDriverConf(

[PR] [SPARK-48038][K8S] Promote driverServiceName to KubernetesDriverConf [spark]

2024-04-29 Thread via GitHub
pan3793 opened a new pull request, #46276: URL: https://github.com/apache/spark/pull/46276 ### What changes were proposed in this pull request? Promote `driverServiceName` from `DriverServiceFeatureStep` to `KubernetesDriverConf`. ### Why are the changes needed?

Re: [PR] [WIP][SPARK-48028][TESTS] Regenerate benchmark results after turning ANSI on [spark]

2024-04-29 Thread via GitHub
yaooqinn commented on PR #46266: URL: https://github.com/apache/spark/pull/46266#issuecomment-2082073767 Witnessed an extremely slow benchmark https://github.com/apache/spark/pull/45453#issuecomment-2082071977

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-29 Thread via GitHub
HyukjinKwon commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1582652464 ## python/pyspark/sql/datasource_internal.py: ## @@ -0,0 +1,146 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

Re: [PR] [SPARK-46840][SQL][TESTS] Add `CollationBenchmark` [spark]

2024-04-29 Thread via GitHub
yaooqinn commented on PR #45453: URL: https://github.com/apache/spark/pull/45453#issuecomment-2082071977 Hey guys, I am currently regenerating the complete benchmark result with 20 jobs running simultaneously. Each job usually takes around 10 to 30 minutes to complete. However, the

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-29 Thread via GitHub
HeartSaVioR commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1582649481 ## python/pyspark/sql/datasource_internal.py: ## @@ -0,0 +1,146 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

Re: [PR] [SPARK-48002][PYTHON][SS][TESTS] Adds sleep before event testing after query termination [spark]

2024-04-29 Thread via GitHub
HyukjinKwon closed pull request #46275: [SPARK-48002][PYTHON][SS][TESTS] Adds sleep before event testing after query termination URL: https://github.com/apache/spark/pull/46275

Re: [PR] [SPARK-48002][PYTHON][SS][TESTS] Adds sleep before event testing after query termination [spark]

2024-04-29 Thread via GitHub
HyukjinKwon commented on PR #46275: URL: https://github.com/apache/spark/pull/46275#issuecomment-2082058695 Merged to master. I will revert this if that persists.

Re: [PR] [SPARK-48027][SQL] InjectRuntimeFilter for multi-level join should check child join type [spark]

2024-04-29 Thread via GitHub
AngersZh commented on code in PR #46263: URL: https://github.com/apache/spark/pull/46263#discussion_r1582618679 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala: ## @@ -120,33 +132,61 @@ object InjectRuntimeFilter extends

Re: [PR] [SPARK-48002][PYTHON][SS][TESTS] Adds sleep before event testing after query termination [spark]

2024-04-29 Thread via GitHub
HyukjinKwon commented on PR #46275: URL: https://github.com/apache/spark/pull/46275#issuecomment-208168 cc @WweiL

[PR] [SPARK-48002][PYTHON][SS][TESTS] Adds sleep before event testing after query termination [spark]

2024-04-29 Thread via GitHub
HyukjinKwon opened a new pull request, #46275: URL: https://github.com/apache/spark/pull/46275 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/46237 that waits 5 seconds after the query termination to make sure

Re: [PR] [SPARK-48027][SQL] InjectRuntimeFilter for multi-level join should check child join type [spark]

2024-04-29 Thread via GitHub
cloud-fan commented on code in PR #46263: URL: https://github.com/apache/spark/pull/46263#discussion_r1582597230 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala: ## @@ -120,33 +132,61 @@ object InjectRuntimeFilter extends

Re: [PR] [SPARK-47954][K8S] Support creating ingress entry for external UI access [spark]

2024-04-29 Thread via GitHub
pan3793 commented on PR #46184: URL: https://github.com/apache/spark/pull/46184#issuecomment-2081981982 @dongjoon-hyun sorry for the late reply; I'm a little busy these days and will address the comments soon

Re: [PR] [SPARK-48027][SQL] InjectRuntimeFilter for multi-level join should check child join type [spark]

2024-04-29 Thread via GitHub
AngersZh commented on code in PR #46263: URL: https://github.com/apache/spark/pull/46263#discussion_r1582589876 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala: ## @@ -120,34 +132,49 @@ object InjectRuntimeFilter extends

Re: [PR] [SPARK-48027][SQL] InjectRuntimeFilter for multi-level join should check child join type [spark]

2024-04-29 Thread via GitHub
AngersZh commented on code in PR #46263: URL: https://github.com/apache/spark/pull/46263#discussion_r1582582142 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala: ## @@ -120,34 +132,49 @@ object InjectRuntimeFilter extends

Re: [PR] [MINOR][DOCS] Remove space in the middle of configuration name in Arrow-optimized Python UDF page [spark]

2024-04-28 Thread via GitHub
dongjoon-hyun closed pull request #46274: [MINOR][DOCS] Remove space in the middle of configuration name in Arrow-optimized Python UDF page URL: https://github.com/apache/spark/pull/46274

Re: [PR] [MINOR][DOCS] Remove space in the middle of configuration name in Arrow-optimized Python UDF page [spark]

2024-04-28 Thread via GitHub
dongjoon-hyun commented on PR #46274: URL: https://github.com/apache/spark/pull/46274#issuecomment-2081920450 Merged to master~

Re: [PR] [SPARK-48036][DOCS] Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md` [spark]

2024-04-28 Thread via GitHub
dongjoon-hyun closed pull request #46271: [SPARK-48036][DOCS] Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md` URL: https://github.com/apache/spark/pull/46271

Re: [PR] [SPARK-48036][DOCS] Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md` [spark]

2024-04-28 Thread via GitHub
dongjoon-hyun commented on PR #46271: URL: https://github.com/apache/spark/pull/46271#issuecomment-2081891190 I attached the screenshots too. Merged to master~

[PR] [MINOR][DOCS] Remove space in the middle of configuration name in Arrow-optimized Python UDF page [spark]

2024-04-28 Thread via GitHub
HyukjinKwon opened a new pull request, #46274: URL: https://github.com/apache/spark/pull/46274 ### What changes were proposed in this pull request? This PR removes a space in the middle of a configuration name in the Arrow-optimized Python UDF page.

Re: [PR] [SPARK-48036][DOCS] Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md` [spark]

2024-04-28 Thread via GitHub
dongjoon-hyun commented on PR #46271: URL: https://github.com/apache/spark/pull/46271#issuecomment-2081887536 Thank you for helping me revise this doc, @yaooqinn !

Re: [PR] [SPARK-48036][DOCS] Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md` [spark]

2024-04-28 Thread via GitHub
dongjoon-hyun commented on code in PR #46271: URL: https://github.com/apache/spark/pull/46271#discussion_r1582534717 ## docs/sql-ref-ansi-compliance.md: ## @@ -67,10 +67,8 @@ The following subsections present behaviour changes in arithmetic operations, ty ### Arithmetic

Re: [PR] [SPARK-48029][INFRA] Update the packages name removed in building the spark docker image [spark]

2024-04-28 Thread via GitHub
dongjoon-hyun closed pull request #46258: [SPARK-48029][INFRA] Update the packages name removed in building the spark docker image URL: https://github.com/apache/spark/pull/46258

Re: [PR] [SPARK-48029][INFRA] Update the packages name removed in building the spark docker image [spark]

2024-04-28 Thread via GitHub
dongjoon-hyun commented on PR #46258: URL: https://github.com/apache/spark/pull/46258#issuecomment-2081880744 Merged to master.

[PR] [SPARK-48037][CORE] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data [spark]

2024-04-28 Thread via GitHub
cxzl25 opened a new pull request, #46273: URL: https://github.com/apache/spark/pull/46273 ### What changes were proposed in this pull request? This PR aims to fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data. ### Why are the

Re: [PR] [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark [spark]

2024-04-28 Thread via GitHub
yaooqinn commented on PR #46270: URL: https://github.com/apache/spark/pull/46270#issuecomment-2081840828 Merged to master(4.0.0), 3.5.2 and 3.4.4.

Re: [PR] [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark [spark]

2024-04-28 Thread via GitHub
yaooqinn closed pull request #46270: [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark URL: https://github.com/apache/spark/pull/46270

Re: [PR] [SPARK-48036][DOCS] Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md` [spark]

2024-04-28 Thread via GitHub
dongjoon-hyun commented on PR #46271: URL: https://github.com/apache/spark/pull/46271#issuecomment-2081838017 Thank you so much, @yaooqinn ! I updated the PR according to your comments.

Re: [PR] [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark [spark]

2024-04-28 Thread via GitHub
yaooqinn commented on PR #46270: URL: https://github.com/apache/spark/pull/46270#issuecomment-2081825069 Thank you very much @dongjoon-hyun

Re: [PR] [SPARK-48036][DOCS] Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md` [spark]

2024-04-28 Thread via GitHub
yaooqinn commented on PR #46271: URL: https://github.com/apache/spark/pull/46271#issuecomment-2081821772 LGTM, only [content here](https://github.com/apache/spark/pull/46271/files#diff-54eee79bd27cf5ca1288b078a7b0b1b5ae8ae9d8b4ee7fb75f0b0c7cdaef0da8L70-L73) might need further revision

Re: [PR] [SPARK-47585][SQL] SQL core: Migrate logInfo with variables to structured logging framework [spark]

2024-04-28 Thread via GitHub
panbingkun commented on PR #46264: URL: https://github.com/apache/spark/pull/46264#issuecomment-2081820429 cc @gengliangwang

Re: [PR] [SPARK-47585][SQL] SQL core: Migrate logInfo with variables to structured logging framework [spark]

2024-04-28 Thread via GitHub
panbingkun commented on code in PR #46264: URL: https://github.com/apache/spark/pull/46264#discussion_r1582501170 ## common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala: ## @@ -124,8 +130,8 @@ object LogKeys { case object DIFF_DELTA extends LogKey case

Re: [PR] [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark [spark]

2024-04-28 Thread via GitHub
dongjoon-hyun commented on PR #46270: URL: https://github.com/apache/spark/pull/46270#issuecomment-2081817478 Feel free to merge and backport wherever you need this, @yaooqinn .

Re: [PR] [SPARK-48036][DOCS] Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md` [spark]

2024-04-28 Thread via GitHub
dongjoon-hyun commented on PR #46271: URL: https://github.com/apache/spark/pull/46271#issuecomment-2081817738 Could you review this documentation PR, @yaooqinn ?

Re: [PR] [SPARK-47585][SQL] SQL core: Migrate logInfo with variables to structured logging framework [spark]

2024-04-28 Thread via GitHub
panbingkun commented on code in PR #46264: URL: https://github.com/apache/spark/pull/46264#discussion_r1582501464 ## common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala: ## @@ -191,20 +203,23 @@ object LogKeys { case object HIVE_OPERATION_TYPE extends LogKey

Re: [PR] [SPARK-47585][SQL] SQL core: Migrate logInfo with variables to structured logging framework [spark]

2024-04-28 Thread via GitHub
panbingkun commented on code in PR #46264: URL: https://github.com/apache/spark/pull/46264#discussion_r1582501097 ## common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala: ## @@ -111,7 +117,7 @@ object LogKeys { case object DATA_FILE_NUM extends LogKey case

Re: [PR] [SPARK-47585][SQL] SQL core: Migrate logInfo with variables to structured logging framework [spark]

2024-04-28 Thread via GitHub
panbingkun commented on code in PR #46264: URL: https://github.com/apache/spark/pull/46264#discussion_r1582500958 ## common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala: ## @@ -191,20 +203,23 @@ object LogKeys { case object HIVE_OPERATION_TYPE extends LogKey

Re: [PR] [SPARK-47585][SQL] SQL core: Migrate logInfo with variables to structured logging framework [spark]

2024-04-28 Thread via GitHub
panbingkun commented on code in PR #46264: URL: https://github.com/apache/spark/pull/46264#discussion_r1582500836 ## common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala: ## @@ -276,31 +299,40 @@ object LogKeys { case object NUM_FILES_REUSED extends LogKey

Re: [PR] [SPARK-47585][SQL] SQL core: Migrate logInfo with variables to structured logging framework [spark]

2024-04-28 Thread via GitHub
panbingkun commented on code in PR #46264: URL: https://github.com/apache/spark/pull/46264#discussion_r1582500556 ## common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala: ## @@ -349,11 +387,11 @@ object LogKeys { case object RETRY_COUNT extends LogKey case

Re: [PR] [WIP][SPARK-48028][TESTS] Regenerate benchmark results after turning ANSI on [spark]

2024-04-28 Thread via GitHub
yaooqinn commented on PR #46266: URL: https://github.com/apache/spark/pull/46266#issuecomment-2081812844 Pending CI results https://github.com/yaooqinn/spark/actions/runs/8872679083 https://github.com/yaooqinn/spark/actions/runs/8872179637

[PR] [SPARK-48036][DOCS] Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md` [spark]

2024-04-28 Thread via GitHub
dongjoon-hyun opened a new pull request, #46271: URL: https://github.com/apache/spark/pull/46271 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-48034][TESTS] NullPointerException in MapStatusesSerDeserBenchmark [spark]

2024-04-28 Thread via GitHub
yaooqinn commented on code in PR #46270: URL: https://github.com/apache/spark/pull/46270#discussion_r1582497889 ## core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala: ## @@ -123,7 +123,6 @@ object MapStatusesSerDeserBenchmark extends BenchmarkBase { }

Re: [PR] [SPARK-47585][SQL] SQL core: Migrate logInfo with variables to structured logging framework [spark]

2024-04-28 Thread via GitHub
panbingkun commented on code in PR #46264: URL: https://github.com/apache/spark/pull/46264#discussion_r1582496007 ## sql/core/src/main/scala/org/apache/spark/sql/execution/r/ArrowRRunner.scala: ## @@ -161,17 +166,14 @@ class ArrowRRunner( val input =
