Re: [PR] [SPARK-47858][PYTHON][FOLLOWUP] Excluding Python magic methods from error context target [spark]

2024-04-26 Thread via GitHub
HyukjinKwon closed pull request #46215: [SPARK-47858][PYTHON][FOLLOWUP] Excluding Python magic methods from error context target URL: https://github.com/apache/spark/pull/46215 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-26 Thread via GitHub
HeartSaVioR commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1580640494 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonStreamingSourceRunner.scala: ## @@ -164,7 +178,20 @@ class PythonStreamingSourceRunner(

Re: [PR] Fix mathExpressions that use StringType [spark]

2024-04-26 Thread via GitHub
uros-db commented on PR #46227: URL: https://github.com/apache/spark/pull/46227#issuecomment-2078899048 I think @mihailom-db created it already https://issues.apache.org/jira/browse/SPARK-47408

[PR] [SPARK-48007][BUILD] MsSQLServer: upgrade mssql.jdbc.version to 12.6.1.jre11 [spark]

2024-04-26 Thread via GitHub
yaooqinn opened a new pull request, #46244: URL: https://github.com/apache/spark/pull/46244 ### What changes were proposed in this pull request? This PR upgrades mssql.jdbc.version to 12.6.1.jre11, https://mvnrepository.com/artifact/com.microsoft.sqlserver/mssql-jdbc.

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-26 Thread via GitHub
HeartSaVioR commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1580604894 ## python/pyspark/sql/datasource_internal.py: ## @@ -0,0 +1,146 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

Re: [PR] [SPARK-47355][SQL] Use wildcard imports in CollationTypeCasts [spark]

2024-04-26 Thread via GitHub
HyukjinKwon commented on PR #46230: URL: https://github.com/apache/spark/pull/46230#issuecomment-2078891802 Merged to master.

Re: [PR] [SPARK-47994][SQL] Fix bug with CASE WHEN column filter push down in SQLServer [spark]

2024-04-26 Thread via GitHub
HyukjinKwon commented on PR #46231: URL: https://github.com/apache/spark/pull/46231#issuecomment-2078890912 cc @beliefer FYI

Re: [PR] Fix mathExpressions that use StringType [spark]

2024-04-26 Thread via GitHub
HyukjinKwon commented on PR #46227: URL: https://github.com/apache/spark/pull/46227#issuecomment-2078892507 Can you create a JIRA, and link it to the PR title please?

Re: [PR] [SPARK-47355][SQL] Use wildcard imports in CollationTypeCasts [spark]

2024-04-26 Thread via GitHub
HyukjinKwon closed pull request #46230: [SPARK-47355][SQL] Use wildcard imports in CollationTypeCasts URL: https://github.com/apache/spark/pull/46230

[PR] [WIP][SPARK-48003][SQL] Add collation support for hll sketch aggregate [spark]

2024-04-26 Thread via GitHub
uros-db opened a new pull request, #46241: URL: https://github.com/apache/spark/pull/46241 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-48004][SQL] Add WriteFilesExecBase trait for v1 write [spark]

2024-04-26 Thread via GitHub
yaooqinn commented on PR #46240: URL: https://github.com/apache/spark/pull/46240#issuecomment-2079123777 Thanks, merged to master

Re: [PR] [SPARK-48004][SQL] Add WriteFilesExecBase trait for v1 write [spark]

2024-04-26 Thread via GitHub
yaooqinn closed pull request #46240: [SPARK-48004][SQL] Add WriteFilesExecBase trait for v1 write URL: https://github.com/apache/spark/pull/46240

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-26 Thread via GitHub
HeartSaVioR commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1580659454 ## python/pyspark/sql/datasource_internal.py: ## @@ -0,0 +1,146 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

Re: [PR] [SPARK-47993][PYTHON] Drop Python 3.8 [spark]

2024-04-26 Thread via GitHub
HyukjinKwon commented on PR #46228: URL: https://github.com/apache/spark/pull/46228#issuecomment-2078883440 Merged to master.

Re: [PR] [SPARK-47993][PYTHON] Drop Python 3.8 [spark]

2024-04-26 Thread via GitHub
HyukjinKwon closed pull request #46228: [SPARK-47993][PYTHON] Drop Python 3.8 URL: https://github.com/apache/spark/pull/46228

Re: [PR] [SPARK-45225][SQL][FOLLOW-UP] XML: Fix nested XSD file path resolution [spark]

2024-04-26 Thread via GitHub
HyukjinKwon commented on PR #46235: URL: https://github.com/apache/spark/pull/46235#issuecomment-2078885747 Merged to master.

Re: [PR] [SPARK-47858][PYTHON][FOLLOWUP] Excluding Python magic methods from error context target [spark]

2024-04-26 Thread via GitHub
HyukjinKwon commented on PR #46215: URL: https://github.com/apache/spark/pull/46215#issuecomment-2078831120 Merged to master.

Re: [PR] [SPARK-47927][SQL]: Fix nullability attribute in UDF decoder [spark]

2024-04-26 Thread via GitHub
eejbyfeldt commented on PR #46156: URL: https://github.com/apache/spark/pull/46156#issuecomment-2078895726 @cloud-fan Since you reviewed the original PR, maybe you could have a look?

[PR] [SPARK-48005][PS][CONNECT][TESTS] Enable `DefaultIndexParityTests.test_index_distributed_sequence_cleanup` [spark]

2024-04-26 Thread via GitHub
zhengruifeng opened a new pull request, #46242: URL: https://github.com/apache/spark/pull/46242 ### What changes were proposed in this pull request? Enable `DefaultIndexParityTests.test_index_distributed_sequence_cleanup` ### Why are the changes needed? this test requires

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-26 Thread via GitHub
HeartSaVioR commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1580636552 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,188 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): +def

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-26 Thread via GitHub
HyukjinKwon commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1580671684 ## python/pyspark/sql/datasource_internal.py: ## @@ -0,0 +1,146 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

Re: [PR] [SPARK-47939][SQL] Implement a new Analyzer rule to move ParameterizedQuery inside ExplainCommand and DescribeQueryCommand [spark]

2024-04-26 Thread via GitHub
cloud-fan commented on code in PR #46209: URL: https://github.com/apache/spark/pull/46209#discussion_r1580908359 ## sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala: ## @@ -4715,6 +4715,145 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with

[PR] [SPARK-48004][SQL] Add WriteFilesExecBase trait for v1 write [spark]

2024-04-26 Thread via GitHub
ulysses-you opened a new pull request, #46240: URL: https://github.com/apache/spark/pull/46240 ### What changes were proposed in this pull request? This PR adds a new trait `WriteFilesExecBase` for v1 writes, so that downstream projects can inherit `WriteFilesExecBase`

Re: [PR] [SPARK-45225][SQL][FOLLOW-UP] XML: Fix nested XSD file path resolution [spark]

2024-04-26 Thread via GitHub
HyukjinKwon closed pull request #46235: [SPARK-45225][SQL][FOLLOW-UP] XML: Fix nested XSD file path resolution URL: https://github.com/apache/spark/pull/46235

Re: [PR] [SPARK-47440][SQL][FOLLOWUP] Reenable predicate pushdown for syntax with boolean comparison in MsSqlServer [spark]

2024-04-26 Thread via GitHub
yaooqinn commented on PR #46236: URL: https://github.com/apache/spark/pull/46236#issuecomment-2078901765 I guess we didn't find a proper way to both fix the syntax issue and retain the ability to push down predicates at that time.

[PR] [SPARK-48006][SQL]add SortOrder for window function which has no orde… [spark]

2024-04-26 Thread via GitHub
guixiaowen opened a new pull request, #46243: URL: https://github.com/apache/spark/pull/46243 ### What changes were proposed in this pull request? I am migrating from Hive SQL to Spark SQL. In Hive SQL hive> explain select *,row_number() over

Re: [PR] [SPARK-47408][SQL] Fix mathExpressions that use StringType [spark]

2024-04-26 Thread via GitHub
cloud-fan commented on code in PR #46227: URL: https://github.com/apache/spark/pull/46227#discussion_r1580903400 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-47409][SQL] Add support for collation for StringTrim type of functions/expressions [spark]

2024-04-26 Thread via GitHub
davidm-db commented on code in PR #46206: URL: https://github.com/apache/spark/pull/46206#discussion_r1580662930 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -306,6 +308,258 @@ public static int execICU(final UTF8String string,

Re: [PR] [SPARK-47440][SQL][FOLLOWUP] Reenable predicate pushdown for syntax with boolean comparison in MsSqlServer [spark]

2024-04-26 Thread via GitHub
HyukjinKwon commented on PR #46236: URL: https://github.com/apache/spark/pull/46236#issuecomment-2078889259 qq why did we disable it before?

Re: [PR] [SPARK-48004][SQL] Add WriteFilesExecBase trait for v1 write [spark]

2024-04-26 Thread via GitHub
ulysses-you commented on PR #46240: URL: https://github.com/apache/spark/pull/46240#issuecomment-2078886099 cc @cloud-fan @yaooqinn thank you

Re: [PR] [SPARK-47989][SQL] MsSQLServer: Fix the scope of spark.sql.legacy.mssqlserver.numericMapping.enabled [spark]

2024-04-26 Thread via GitHub
yaooqinn commented on PR #46223: URL: https://github.com/apache/spark/pull/46223#issuecomment-2078918918 Thank you @dongjoon-hyun

Re: [PR] [SPARK-47914][SQL] Do not display the splits parameter in Range [spark]

2024-04-26 Thread via GitHub
guixiaowen commented on PR #46136: URL: https://github.com/apache/spark/pull/46136#issuecomment-2079061228 > > Mind checking the test failures? > > ok. I will check it. @HyukjinKwon hi, do you have any questions about this part?

[PR] [SPARK-48008][WIP] Support UDAFs in Spark Connect [spark]

2024-04-26 Thread via GitHub
xupefei opened a new pull request, #46245: URL: https://github.com/apache/spark/pull/46245 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[PR] [SPARK-47968][SQL] MsSQLServer: Map datetimeoffset to TimestampType [spark]

2024-04-26 Thread via GitHub
yaooqinn opened a new pull request, #46239: URL: https://github.com/apache/spark/pull/46239 ### What changes were proposed in this pull request? This PR changes the `datetimeoffset -> StringType` mapping to `datetimeoffset -> TimestampType` mapping as we use `mssql-jdbc`

[PR] [SPARK-47440][SQL][FOLLOWUP] Reenable predicate pushdown for syntax with boolean comparison in MsSqlServer [spark]

2024-04-26 Thread via GitHub
yaooqinn opened a new pull request, #46236: URL: https://github.com/apache/spark/pull/46236 ### What changes were proposed in this pull request? In https://github.com/apache/spark/pull/45564, predicate pushdown with boolean comparison syntax in MsSqlServer was disabled as

[PR] [SPARK-48001][CORE] Remove unused `private implicit def arrayToArrayWritable` from `SparkContext` [spark]

2024-04-26 Thread via GitHub
LuciferYang opened a new pull request, #46238: URL: https://github.com/apache/spark/pull/46238 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-47986][CONNECT][PYTHON] Unable to create a new session when the default session is closed by the server [spark]

2024-04-26 Thread via GitHub
zhengruifeng closed pull request #46221: [SPARK-47986][CONNECT][PYTHON] Unable to create a new session when the default session is closed by the server URL: https://github.com/apache/spark/pull/46221

Re: [PR] [SPARK-47986][CONNECT][PYTHON] Unable to create a new session when the default session is closed by the server [spark]

2024-04-26 Thread via GitHub
zhengruifeng commented on PR #46221: URL: https://github.com/apache/spark/pull/46221#issuecomment-2078787886 thanks, merged to master

Re: [PR] [SPARK-47409][SQL] Add support for collation for StringTrim type of functions/expressions [spark]

2024-04-26 Thread via GitHub
mihailom-db commented on code in PR #46206: URL: https://github.com/apache/spark/pull/46206#discussion_r1580518571 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -306,6 +308,258 @@ public static int execICU(final UTF8String

Re: [PR] [SPARK-47409][SQL] Add support for collation for StringTrim type of functions/expressions [spark]

2024-04-26 Thread via GitHub
mihailom-db commented on code in PR #46206: URL: https://github.com/apache/spark/pull/46206#discussion_r1580524717 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -528,6 +528,235 @@ public void testFindInSet() throws SparkException

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-26 Thread via GitHub
HeartSaVioR commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1580561802 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +494,103 @@ def stop(self) -> None: ... +class SimpleDataSourceStreamReader(ABC): +""" +A

Re: [PR] [SPARK-47999] Improve logging around snapshot creation and adding/removing entries from state cache map [spark]

2024-04-26 Thread via GitHub
HeartSaVioR commented on PR #46233: URL: https://github.com/apache/spark/pull/46233#issuecomment-2078732698 Thanks! Merging to master.

Re: [PR] [SPARK-47955][SQL] Improve `DeduplicateRelations` performance [spark]

2024-04-26 Thread via GitHub
beliefer commented on code in PR #46183: URL: https://github.com/apache/spark/pull/46183#discussion_r1580583432 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DeduplicateRelations.scala: ## @@ -38,28 +38,29 @@ case class RelationWrapper(cls: Class[_],

Re: [PR] [SPARK-47922][SQL] Implement the try_parse_json expression [spark]

2024-04-26 Thread via GitHub
cloud-fan closed pull request #46141: [SPARK-47922][SQL] Implement the try_parse_json expression URL: https://github.com/apache/spark/pull/46141

Re: [PR] [SPARK-47922][SQL] Implement the try_parse_json expression [spark]

2024-04-26 Thread via GitHub
cloud-fan commented on PR #46141: URL: https://github.com/apache/spark/pull/46141#issuecomment-2078722106 thanks, merging to master!

Re: [PR] [SPARK-47999][SS] Improve logging around snapshot creation and adding/removing entries from state cache map in HDFS backed state store provider [spark]

2024-04-26 Thread via GitHub
HeartSaVioR closed pull request #46233: [SPARK-47999][SS] Improve logging around snapshot creation and adding/removing entries from state cache map in HDFS backed state store provider URL: https://github.com/apache/spark/pull/46233

Re: [PR] [SPARK-48001][CORE] Remove unused `private implicit def arrayToArrayWritable` from `SparkContext` [spark]

2024-04-26 Thread via GitHub
yaooqinn closed pull request #46238: [SPARK-48001][CORE] Remove unused `private implicit def arrayToArrayWritable` from `SparkContext` URL: https://github.com/apache/spark/pull/46238

Re: [PR] [SPARK-47999] Improve logging around snapshot creation and adding/removing entries from state cache map [spark]

2024-04-26 Thread via GitHub
anishshri-db commented on code in PR #46233: URL: https://github.com/apache/spark/pull/46233#discussion_r1580520020 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala: ## @@ -437,6 +437,23 @@ private[sql] class

Re: [PR] [SPARK-48001][CORE] Remove unused `private implicit def arrayToArrayWritable` from `SparkContext` [spark]

2024-04-26 Thread via GitHub
LuciferYang commented on PR #46238: URL: https://github.com/apache/spark/pull/46238#issuecomment-2078804299 Thanks @yaooqinn

Re: [PR] [SPARK-48001][CORE] Remove unused `private implicit def arrayToArrayWritable` from `SparkContext` [spark]

2024-04-26 Thread via GitHub
yaooqinn commented on PR #46238: URL: https://github.com/apache/spark/pull/46238#issuecomment-2078803704 Merged to master. Thank you @LuciferYang

Re: [PR] [SPARK-47999] Improve logging around snapshot creation and adding/removing entries from state cache map [spark]

2024-04-26 Thread via GitHub
HeartSaVioR commented on code in PR #46233: URL: https://github.com/apache/spark/pull/46233#discussion_r1580548906 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala: ## @@ -437,6 +437,23 @@ private[sql] class

[PR] [SPARK-48002][PYTHON][SS] Add test for observed metrics in PySpark StreamingQueryListener [spark]

2024-04-26 Thread via GitHub
WweiL opened a new pull request, #46237: URL: https://github.com/apache/spark/pull/46237 ### What changes were proposed in this pull request? Following the doc test revisit in PR https://github.com/apache/spark/pull/46189, for extra safety, add a unit test that verifies observed

Re: [PR] [SPARK-47955][SQL] Improve `DeduplicateRelations` performance [spark]

2024-04-26 Thread via GitHub
beliefer commented on PR #46183: URL: https://github.com/apache/spark/pull/46183#issuecomment-2078791435 Late LGTM.

Re: [PR] [SPARK-47409][SQL] Add support for collation for StringTrim type of functions/expressions [spark]

2024-04-26 Thread via GitHub
mihailom-db commented on code in PR #46206: URL: https://github.com/apache/spark/pull/46206#discussion_r1580521078 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -608,6 +610,181 @@ class CollationStringExpressionsSuite

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-26 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1581254368 ## spark-operator-api/src/main/java/org/apache/spark/k8s/operator/status/ApplicationStateSummary.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-26 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1581252300 ## spark-operator-api/src/main/java/org/apache/spark/k8s/operator/status/ApplicationStateSummary.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-26 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1581304652 ## python/pyspark/sql/datasource_internal.py: ## @@ -0,0 +1,146 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

Re: [PR] [SPARK-47050][SQL] Collect and publish partition level metrics for V1 [spark]

2024-04-26 Thread via GitHub
dbtsai commented on PR #46188: URL: https://github.com/apache/spark/pull/46188#issuecomment-2079779162 Gently pinging @cloud-fan

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-26 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1581319890 ## python/pyspark/sql/datasource_internal.py: ## @@ -0,0 +1,146 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-26 Thread via GitHub
gengliangwang closed pull request #46193: [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms URL: https://github.com/apache/spark/pull/46193

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-26 Thread via GitHub
jiangzho commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1581331728 ## spark-operator-api/src/test/java/org/apache/spark/k8s/operator/spec/RestartPolicyTest.java: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-26 Thread via GitHub
jiangzho commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1581331444 ## spark-operator-api/src/main/java/org/apache/spark/k8s/operator/status/ApplicationStateSummary.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the

Re: [PR] [SPARK-46350][SS] Fix state removal for stream-stream join with one watermark and one time-interval condition [spark]

2024-04-26 Thread via GitHub
rangadi commented on code in PR #44323: URL: https://github.com/apache/spark/pull/44323#discussion_r1581366231 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinHelper.scala: ## @@ -219,10 +222,35 @@ object

Re: [PR] [WIP] Ensure that ForeachBatch can use libraries imported externally [spark]

2024-04-26 Thread via GitHub
ericm-db closed pull request #46191: [WIP] Ensure that ForeachBatch can use libraries imported externally URL: https://github.com/apache/spark/pull/46191

Re: [PR] [SPARK-48010][SQL] Avoid repeated calls to conf.resolver in resolveExpression [spark]

2024-04-26 Thread via GitHub
dongjoon-hyun commented on PR #46248: URL: https://github.com/apache/spark/pull/46248#issuecomment-2079911328 Merged to master for Apache Spark 4.0.0.

Re: [PR] [SPARK-47986][CONNECT][PYTHON] Unable to create a new session when the default session is closed by the server [spark]

2024-04-26 Thread via GitHub
juliuszsompolski commented on PR #46221: URL: https://github.com/apache/spark/pull/46221#issuecomment-2079934132 > yes we most likely need the same for Scala. @nemanja-boric-databricks @nija-at would one of you have time to follow up and check and fix for Scala if needed?

Re: [PR] [SPARK-47818][CONNECT][FOLLOW-UP] Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests [spark]

2024-04-26 Thread via GitHub
hvanhovell closed pull request #46098: [SPARK-47818][CONNECT][FOLLOW-UP] Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests URL: https://github.com/apache/spark/pull/46098

Re: [PR] [SPARK-47818][CONNECT][FOLLOW-UP] Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests [spark]

2024-04-26 Thread via GitHub
hvanhovell commented on PR #46098: URL: https://github.com/apache/spark/pull/46098#issuecomment-2079778912 Merging.

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-26 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1581258317 ## spark-operator-api/src/test/java/org/apache/spark/k8s/operator/spec/RestartPolicyTest.java: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-26 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1581314644 ## python/pyspark/sql/datasource_internal.py: ## @@ -0,0 +1,146 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

Re: [PR] [SPARK-47986][CONNECT][PYTHON] Unable to create a new session when the default session is closed by the server [spark]

2024-04-26 Thread via GitHub
nemanja-boric-databricks commented on PR #46221: URL: https://github.com/apache/spark/pull/46221#issuecomment-2079973373 Yeah, I'll follow up with the fix PR for the Scala client

[PR] [Draft] - Testing of Streaming and Collations [spark]

2024-04-26 Thread via GitHub
dbatomic opened a new pull request, #46247: URL: https://github.com/apache/spark/pull/46247 ### What changes were proposed in this pull request? Draft PR for Collation tests in Streaming. ### Why are the changes needed? ### Does this PR introduce

Re: [PR] [SPARK-48007][BUILD] MsSQLServer: upgrade mssql.jdbc.version to 12.6.1.jre11 [spark]

2024-04-26 Thread via GitHub
dongjoon-hyun commented on code in PR #46244: URL: https://github.com/apache/spark/pull/46244#discussion_r1581168210 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSQLServerDatabaseOnDocker.scala: ## @@ -28,5 +28,6 @@ class

Re: [PR] [SPARK-47440][SQL][FOLLOWUP] Reenable predicate pushdown for syntax with boolean comparison in MsSqlServer [spark]

2024-04-26 Thread via GitHub
yaooqinn closed pull request #46236: [SPARK-47440][SQL][FOLLOWUP] Reenable predicate pushdown for syntax with boolean comparison in MsSqlServer URL: https://github.com/apache/spark/pull/46236

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-26 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1581246394 ## build.gradle: ## @@ -72,6 +72,8 @@ subprojects { '', 'org.apache.spark', ) + toggleOffOn() + targetExclude

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-26 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1581256983 ## spark-operator-api/src/main/java/org/apache/spark/k8s/operator/status/ApplicationStateSummary.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the

Re: [PR] [SPARK-48005][PS][CONNECT][TESTS] Enable `DefaultIndexParityTests.test_index_distributed_sequence_cleanup` [spark]

2024-04-26 Thread via GitHub
dongjoon-hyun closed pull request #46242: [SPARK-48005][PS][CONNECT][TESTS] Enable `DefaultIndexParityTests.test_index_distributed_sequence_cleanup` URL: https://github.com/apache/spark/pull/46242

Re: [PR] [SPARK-46122][SQL] Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default [spark]

2024-04-26 Thread via GitHub
dongjoon-hyun commented on PR #46207: URL: https://github.com/apache/spark/pull/46207#issuecomment-2079759481 I started a vote for this PR too. - https://lists.apache.org/thread/x09gynt90v3hh5sql1gt9dlcn6m6699p

Re: [PR] [SPARK-47050][SQL] Collect and publish partition level metrics for V1 [spark]

2024-04-26 Thread via GitHub
dbtsai commented on code in PR #46188: URL: https://github.com/apache/spark/pull/46188#discussion_r1581313246 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala: ## @@ -213,6 +260,14 @@ class BasicWriteJobStatsTracker(

Re: [PR] [SPARK-47963][CORE] Make the external Spark ecosystem can use structured logging mechanisms [spark]

2024-04-26 Thread via GitHub
gengliangwang commented on PR #46193: URL: https://github.com/apache/spark/pull/46193#issuecomment-2079790129 Thanks, merging to master

[PR] [SPARK-48011][Core] Store LogKey name as a value to avoid generating new string instances [spark]

2024-04-26 Thread via GitHub
gengliangwang opened a new pull request, #46249: URL: https://github.com/apache/spark/pull/46249 ### What changes were proposed in this pull request? Store LogKey name as a value to avoid generating new string instances ### Why are the changes needed? To save

Re: [PR] [SPARK-48011][Core] Store LogKey name as a value to avoid generating new string instances [spark]

2024-04-26 Thread via GitHub
gengliangwang commented on code in PR #46249: URL: https://github.com/apache/spark/pull/46249#discussion_r1581359491 ## common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala: ## @@ -16,10 +16,14 @@ */ package org.apache.spark.internal +import java.util.Locale

Re: [PR] [SPARK-48011][Core] Store LogKey name as a value to avoid generating new string instances [spark]

2024-04-26 Thread via GitHub
gengliangwang commented on PR #46249: URL: https://github.com/apache/spark/pull/46249#issuecomment-2079860587 cc @panbingkun

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-26 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1581250016 ## spark-operator-api/src/main/java/org/apache/spark/k8s/operator/spec/InstanceConfig.java: ## @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-26 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1581250381 ## spark-operator-api/src/main/java/org/apache/spark/k8s/operator/spec/InstanceConfig.java: ## @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache

[PR] [SPARK-48010][SQL] Avoid repeated calls to conf.resolver in resolveExpression [spark]

2024-04-26 Thread via GitHub
nikhilsheoran-db opened a new pull request, #46248: URL: https://github.com/apache/spark/pull/46248 ### What changes were proposed in this pull request? - This PR instead of calling `conf.resolver` for each call in `resolveExpression`, reuses the `resolver` obtained once. ### Why

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-26 Thread via GitHub
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1581301878 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,188 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): +def

Re: [PR] [SPARK-48010][SQL] Avoid repeated calls to conf.resolver in resolveExpression [spark]

2024-04-26 Thread via GitHub
dongjoon-hyun closed pull request #46248: [SPARK-48010][SQL] Avoid repeated calls to conf.resolver in resolveExpression URL: https://github.com/apache/spark/pull/46248

Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]

2024-04-26 Thread via GitHub
cloud-fan closed pull request #46165: [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions URL: https://github.com/apache/spark/pull/46165

Re: [PR] [SPARK-47440][SQL][FOLLOWUP] Reenable predicate pushdown for syntax with boolean comparison in MsSqlServer [spark]

2024-04-26 Thread via GitHub
dongjoon-hyun commented on PR #46236: URL: https://github.com/apache/spark/pull/46236#issuecomment-2079574626 cc @stefanbuk-db , too.

Re: [PR] [SPARK-47476][SQL] Support REPLACE function to work with collated strings [spark]

2024-04-26 Thread via GitHub
cloud-fan commented on PR #45704: URL: https://github.com/apache/spark/pull/45704#issuecomment-2079601554 the Spark Connect test failure is flaky and unrelated here, I'm merging it to master, thanks!

Re: [PR] [SPARK-47476][SQL] Support REPLACE function to work with collated strings [spark]

2024-04-26 Thread via GitHub
cloud-fan closed pull request #45704: [SPARK-47476][SQL] Support REPLACE function to work with collated strings URL: https://github.com/apache/spark/pull/45704

Re: [PR] [SPARK-47968][SQL] MsSQLServer: Map datatimeoffset to TimestampType [spark]

2024-04-26 Thread via GitHub
yaooqinn commented on PR #46239: URL: https://github.com/apache/spark/pull/46239#issuecomment-2079629526 Irrelevant test failure in pyspark connect ``` ERROR StatusConsoleListener An exception occurred processing Appender File java.lang.IllegalArgumentException: found 1

Re: [PR] [SPARK-47350][SQL] Add collation support for SplitPart string expression [spark]

2024-04-26 Thread via GitHub
cloud-fan closed pull request #46158: [SPARK-47350][SQL] Add collation support for SplitPart string expression URL: https://github.com/apache/spark/pull/46158

Re: [PR] [SPARK-47968][SQL] MsSQLServer: Map datatimeoffset to TimestampType [spark]

2024-04-26 Thread via GitHub
yaooqinn closed pull request #46239: [SPARK-47968][SQL] MsSQLServer: Map datatimeoffset to TimestampType URL: https://github.com/apache/spark/pull/46239

Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]

2024-04-26 Thread via GitHub
cloud-fan commented on PR #46165: URL: https://github.com/apache/spark/pull/46165#issuecomment-2079297307 thanks, merging to master!

[PR] [Only Test] [spark]

2024-04-26 Thread via GitHub
panbingkun opened a new pull request, #46246: URL: https://github.com/apache/spark/pull/46246 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How
