Re: [PR] [MINOR][DOCS][PYTHON] Fix documentation typo in takeSample method [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45419: URL: https://github.com/apache/spark/pull/45419#issuecomment-1983320337 Merged to master Thank you @kimborowicz @zhengruifeng -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
dbatomic commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1515983493 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala: ## @@ -218,6 +218,6 @@ class DataTypeAstBuilder extends

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
dbatomic commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1515984970 ## python/pyspark/sql/tests/test_types.py: ## @@ -862,15 +862,13 @@ def test_parse_datatype_string(self): if k != "varchar" and k != "char":

Re: [PR] [SPARK-47278][BUILD] Upgrade rocksdbjni to 8.11.3 [spark]

2024-03-07 Thread via GitHub
LuciferYang commented on PR #45365: URL: https://github.com/apache/spark/pull/45365#issuecomment-1983298155 Thanks @yaooqinn @dongjoon-hyun

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-07 Thread via GitHub
ted-jenks commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1982836579 @dongjoon-hyun > It sounds like you have other systems to read Spark's data. Correct. The issue was that from 3.2 to 3.3 there was a behavior change in the base64 encodings used
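The difference between the two encodings named in the PR title can be illustrated with Python's standard `base64` module. This is a minimal sketch, not Spark's code: `mime_b64encode` is a hypothetical helper that approximates RFC 2045 (MIME) output by wrapping the encoded text every 76 characters with CRLF separators, while `base64.b64encode` produces the unwrapped RFC 4648 form. The two agree for short inputs and diverge once the encoded text exceeds one line, which is where an external reader expecting one form could be surprised by the other.

```python
import base64

MIME_LINE_LENGTH = 76  # RFC 2045 limits encoded lines to 76 characters


def rfc4648_b64encode(data: bytes) -> str:
    """Plain RFC 4648 base64: a single unbroken line."""
    return base64.b64encode(data).decode("ascii")


def mime_b64encode(data: bytes) -> str:
    """Hypothetical helper approximating RFC 2045 (MIME) base64.

    Wraps the encoded text every 76 characters with CRLF and adds no
    trailing separator after the last line.
    """
    encoded = rfc4648_b64encode(data)
    lines = [encoded[i:i + MIME_LINE_LENGTH]
             for i in range(0, len(encoded), MIME_LINE_LENGTH)]
    return "\r\n".join(lines)


if __name__ == "__main__":
    short = b"x" * 57   # encodes to exactly 76 characters
    longer = b"x" * 58  # encodes to 80 characters
    print(mime_b64encode(short) == rfc4648_b64encode(short))  # True: no wrap needed
    print(repr(mime_b64encode(longer)))  # contains '\r\n' after the 76th character
```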

Re: [PR] [SPARK-47314][DOC] Correct the `ExternalSorter#writePartitionedMapOutput` method comment [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on code in PR #45415: URL: https://github.com/apache/spark/pull/45415#discussion_r1515746618 ## core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala: ## @@ -690,7 +690,7 @@ private[spark] class ExternalSorter[K, V, C]( * Write all the

Re: [PR] [SPARK-47301][SQL][TESTS] Fix flaky ParquetIOSuite [spark]

2024-03-07 Thread via GitHub
yaooqinn closed pull request #45403: [SPARK-47301][SQL][TESTS] Fix flaky ParquetIOSuite URL: https://github.com/apache/spark/pull/45403

Re: [PR] [SPARK-47301][SQL][TESTS] Fix flaky ParquetIOSuite [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45403: URL: https://github.com/apache/spark/pull/45403#issuecomment-1983075720 Merged to master. Thank you, @panbingkun & @dongjoon-hyun

Re: [PR] [SPARK-47241][SQL] Fix rule order issues for ExtractGenerator [spark]

2024-03-07 Thread via GitHub
cloud-fan closed pull request #45350: [SPARK-47241][SQL] Fix rule order issues for ExtractGenerator URL: https://github.com/apache/spark/pull/45350

Re: [PR] [SPARK-47241][SQL] Fix rule order issues for ExtractGenerator [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on PR #45350: URL: https://github.com/apache/spark/pull/45350#issuecomment-1983026959 thanks for the review, merging to master/3.5!

Re: [PR] [SPARK-43124][SQL] Add ConvertCommandResultToLocalRelation rule [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on code in PR #45397: URL: https://github.com/apache/spark/pull/45397#discussion_r1515785048 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ConvertCommandResultToLocalRelation.scala: ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache

Re: [PR] [SPARK-47300][SQL] `quoteIfNeeded` should quote identifier starts with digits [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on code in PR #45401: URL: https://github.com/apache/spark/pull/45401#discussion_r1515795172 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/StringUtilsSuite.scala: ## @@ -129,16 +129,6 @@ class StringUtilsSuite extends SparkFunSuite with
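The rule in the PR title can be sketched as follows. This is a hypothetical Python approximation, not Spark's Scala `quoteIfNeeded`: it assumes a name part may stay bare only if it consists of ASCII letters, digits, and underscores and does not begin with a digit; any other part is wrapped in backticks, with embedded backticks doubled.

```python
import re

# Assumed rule: a bare identifier matches [A-Za-z0-9_]+ and does not start with a digit.
_SAFE_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_if_needed(part: str) -> str:
    """Hypothetical sketch: backtick-quote an identifier part only when required."""
    if _SAFE_IDENTIFIER.fullmatch(part):
        return part
    # Escape embedded backticks by doubling them, then wrap the whole part.
    return "`" + part.replace("`", "``") + "`"


def quote_name_parts(parts: list[str]) -> str:
    """Join a multi-part name such as db.table, quoting each part as needed."""
    return ".".join(quote_if_needed(p) for p in parts)


if __name__ == "__main__":
    print(quote_if_needed("col_1"))                 # col_1
    print(quote_if_needed("1col"))                  # `1col`
    print(quote_name_parts(["db", "123", "a.b"]))   # db.`123`.`a.b`
```

Quoting digit-leading parts is the conservative choice: a bare part such as `123` could otherwise be read back as a numeric literal rather than a name.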

Re: [PR] [SPARK-36691][PYTHON] PythonRunner failed should pass error message to ApplicationMaster too [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on code in PR #33934: URL: https://github.com/apache/spark/pull/33934#discussion_r1515813583 ## core/src/main/scala/org/apache/spark/util/Utils.scala: ## @@ -3281,6 +3282,80 @@ private[spark] class RedirectThread( } } +private[spark] class

Re: [PR] [SPARK-47316][SQL] Fix TimestampNTZ in Postgres Array [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45418: URL: https://github.com/apache/spark/pull/45418#issuecomment-1983087260 cc @cloud-fan

Re: [PR] [SPARK-43124][SQL] Add ConvertCommandResultToLocalRelation rule [spark]

2024-03-07 Thread via GitHub
wForget closed pull request #45397: [SPARK-43124][SQL] Add ConvertCommandResultToLocalRelation rule URL: https://github.com/apache/spark/pull/45397

Re: [PR] [SPARK-43124][SQL] Add ConvertCommandResultToLocalRelation rule [spark]

2024-03-07 Thread via GitHub
wForget commented on PR #45397: URL: https://github.com/apache/spark/pull/45397#issuecomment-1983129621 Close with comment: https://github.com/apache/spark/pull/45397#discussion_r1515557219

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
stefankandic commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1516357243 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -1096,7 +1096,7 @@ colPosition ; collateClause -: COLLATE

Re: [PR] [SPARK-47248][SQL][COLLATION] Improved string function support: contains [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on PR #45382: URL: https://github.com/apache/spark/pull/45382#issuecomment-1983758947 The GA jobs all passed: https://github.com/uros-db/spark/actions/runs/8186876833/job/22395549669 merging to master, thanks!

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
MaxGekk commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1516373801 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -1096,7 +1096,7 @@ colPosition ; collateClause -: COLLATE

Re: [PR] [SQL] Bind JDBC dialect to JDBCRDD at construction [spark]

2024-03-07 Thread via GitHub
johnnywalker commented on code in PR #45410: URL: https://github.com/apache/spark/pull/45410#discussion_r1516375276 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala: ## @@ -153,12 +153,12 @@ object JDBCRDD extends Logging { */ class

Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-07 Thread via GitHub
tgravescs commented on code in PR #45232: URL: https://github.com/apache/spark/pull/45232#discussion_r1516375405 ## python/pyspark/resource/tests/test_connect_resources.py: ## @@ -0,0 +1,46 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +#

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
MaxGekk commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1516378011 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -1096,7 +1096,7 @@ colPosition ; collateClause -: COLLATE

Re: [PR] [SPARK-47295] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1516389365 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -384,27 +387,47 @@ public boolean startsWith(final UTF8String prefix) { }

Re: [PR] [SPARK-47295] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1516391257 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -31,6 +32,8 @@ import com.esotericsoftware.kryo.io.Input; import

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
stefankandic commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1516396458 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -1096,7 +1096,7 @@ colPosition ; collateClause -: COLLATE

[PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-07 Thread via GitHub
uros-db opened a new pull request, #45422: URL: https://github.com/apache/spark/pull/45422 ### What changes were proposed in this pull request? ### Why are the changes needed? Currently, all `StringType` arguments passed to built-in string functions in Spark SQL get

Re: [PR] [SPARK-47295] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1516379232 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -31,6 +32,8 @@ import com.esotericsoftware.kryo.io.Input; import

Re: [PR] [SPARK-47295] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1516380909 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -384,27 +387,47 @@ public boolean startsWith(final UTF8String prefix) { }

Re: [PR] [SPARK-47316][SQL] Fix TimestampNTZ in Postgres Array [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on code in PR #45418: URL: https://github.com/apache/spark/pull/45418#discussion_r1516398411 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -87,17 +87,26 @@ abstract class JdbcDialect extends Serializable with Logging { */

Re: [PR] [SPARK-45827][SQL] Move data type checks to CreatableRelationProvider [spark]

2024-03-07 Thread via GitHub
cloud-fan closed pull request #45409: [SPARK-45827][SQL] Move data type checks to CreatableRelationProvider URL: https://github.com/apache/spark/pull/45409

Re: [PR] [SPARK-45827][SQL] Move data type checks to CreatableRelationProvider [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on PR #45409: URL: https://github.com/apache/spark/pull/45409#issuecomment-1983973892 thanks, merging to master!

Re: [PR] [SPARK-47248][SQL][COLLATION] Improved string function support: contains [spark]

2024-03-07 Thread via GitHub
cloud-fan closed pull request #45382: [SPARK-47248][SQL][COLLATION] Improved string function support: contains URL: https://github.com/apache/spark/pull/45382

Re: [PR] [SPARK-47295] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1516381847 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -384,27 +387,47 @@ public boolean startsWith(final UTF8String prefix) { }

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
MaxGekk commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1516415830 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -1096,7 +1096,7 @@ colPosition ; collateClause -: COLLATE

Re: [PR] [SPARK-46743][SQL] Count bug after constant folding [spark]

2024-03-07 Thread via GitHub
jchen5 commented on code in PR #45125: URL: https://github.com/apache/spark/pull/45125#discussion_r1516503722 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpression.scala: ## @@ -34,7 +34,7 @@ import

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-07 Thread via GitHub
dbatomic commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1516510742 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationUtils.scala: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation

[PR] Miland db/miland legacy error class [spark]

2024-03-07 Thread via GitHub
miland-db opened a new pull request, #45423: URL: https://github.com/apache/spark/pull/45423 ### What changes were proposed in this pull request? In the PR, I propose to assign the proper names to the legacy error classes _LEGACY_ERROR_TEMP_324[7-9], and modify tests in testing suites to

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-07 Thread via GitHub
sahnib commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1516471532 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,133 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class

Re: [PR] Miland db/miland legacy error class [spark]

2024-03-07 Thread via GitHub
miland-db closed pull request #45423: Miland db/miland legacy error class URL: https://github.com/apache/spark/pull/45423

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
stefankandic commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1516535896 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -117,7 +117,7 @@ object DataType { private val FIXED_DECIMAL =

[PR] [SPARK-47319][SQL] Fix missingInput calculation [spark]

2024-03-07 Thread via GitHub
peter-toth opened a new pull request, #45424: URL: https://github.com/apache/spark/pull/45424 ### What changes were proposed in this pull request? This PR speeds up `QueryPlan.missingInput()` calculation. ### Why are the changes needed? This seems to be the root cause of
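For context, `missingInput` roughly denotes the attributes a plan node references but that neither its children nor the node itself supply. The sketch below models that definition with Python sets. It is a simplification: the `Node` class and attribute names are hypothetical, and it does not reproduce the optimization made in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified stand-in for a query plan node."""
    expression_refs: set[str]                          # attributes used by this node's expressions
    output: set[str]                                   # attributes exposed to the parent
    produced: set[str] = field(default_factory=set)    # attributes the node creates itself
    children: list["Node"] = field(default_factory=list)

    def input_set(self) -> set[str]:
        # Union of everything the children output.
        result: set[str] = set()
        for child in self.children:
            result |= child.output
        return result

    def missing_input(self) -> set[str]:
        # Referenced attributes supplied neither by the children nor by the node itself.
        return self.expression_refs - self.produced - self.input_set()


if __name__ == "__main__":
    scan = Node(expression_refs=set(), output={"a", "b"})
    project = Node(expression_refs={"a", "c"}, output={"a", "c"}, children=[scan])
    print(project.missing_input())  # {'c'}
```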

Re: [PR] [SPARK-37932][SQL]Wait to resolve missing attributes before applying DeduplicateRelations [spark]

2024-03-07 Thread via GitHub
peter-toth commented on PR #35684: URL: https://github.com/apache/spark/pull/35684#issuecomment-1984107426 @martinf-moodys, [SPARK-47319](https://issues.apache.org/jira/browse/SPARK-47319) / https://github.com/apache/spark/pull/45424 might help, especially if you have many `Union` nodes