Re: [PR] [SPARK-47461][CORE] Remove private function `totalRunningTasksPerResourceProfile` from `ExecutorAllocationManager` [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun commented on PR #45587: URL: https://github.com/apache/spark/pull/45587#issuecomment-2008700496 +1 for (1) adding a new comment to clarify that this `synchronized` function needs to be retained for testing. I'm not sure about (2) because it's a separate issue in this

Re: [PR] [SPARK-47461][CORE] Remove private function `totalRunningTasksPerResourceProfile` from `ExecutorAllocationManager` [spark]

2024-03-19 Thread via GitHub
LuciferYang commented on PR #45587: URL: https://github.com/apache/spark/pull/45587#issuecomment-2008697337 1. I am fine with reverting this commit, but while reverting, I think some comments should be added to the `totalRunningTasksPerResourceProfile` function to clarify that this function needs
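The thread above is about keeping a lock-guarded accessor around solely for tests. A minimal sketch of that pattern (hypothetical names, not Spark's actual `ExecutorAllocationManager`): a test-only reader that takes the same lock as the update path sees a consistent snapshot, which is why a `synchronized` wrapper can be worth retaining even if production code never calls it.

```python
import threading

class AllocationManager:
    """Toy stand-in for an allocation manager whose internal state is
    mutated from one thread and inspected from a test thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running_tasks = {}  # resource profile id -> task count

    def _update(self, profile_id, delta):
        # Production update path: always mutates under the lock.
        with self._lock:
            self._running_tasks[profile_id] = (
                self._running_tasks.get(profile_id, 0) + delta)

    def total_running_tasks(self, profile_id):
        # Retained for testing: acquires the same lock as _update so a
        # concurrent test reads a consistent value instead of racing
        # the update thread.
        with self._lock:
            return self._running_tasks.get(profile_id, 0)

mgr = AllocationManager()
mgr._update(0, 3)
print(mgr.total_running_tasks(0))  # 3
```

Dropping the locked accessor and letting tests read the dict directly would compile and usually pass, but it silently changes the synchronization contract the tests rely on — which is the concern raised in this thread.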

[PR] [Don't Review] Branch 3.5 list python packages [spark]

2024-03-19 Thread via GitHub
panbingkun opened a new pull request, #45601: URL: https://github.com/apache/spark/pull/45601 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47461][CORE] Remove private function `totalRunningTasksPerResourceProfile` from `ExecutorAllocationManager` [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun commented on PR #45587: URL: https://github.com/apache/spark/pull/45587#issuecomment-2008694731 Thank you for the context, @mridulm and @tgravescs . WDYT, @LuciferYang . Shall we revert this simply for now and revisit later (if needed), @LuciferYang ? -- This is an

[PR] [Don't Review] Only for list python packages for branch-3.4 [spark]

2024-03-19 Thread via GitHub
panbingkun opened a new pull request, #45600: URL: https://github.com/apache/spark/pull/45600 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47473][SQL] Fix correctness issue of converting postgres INFINITIES timestamps [spark]

2024-03-19 Thread via GitHub
yaooqinn commented on PR #45599: URL: https://github.com/apache/spark/pull/45599#issuecomment-2008688099 cc @cloud-fan @dongjoon-hyun @michaelzhan-db @yruslan, thank you -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[PR] [SPARK-47473][SQL] Fix correctness issue of converting postgres INFINITIES timestamps [spark]

2024-03-19 Thread via GitHub
yaooqinn opened a new pull request, #45599: URL: https://github.com/apache/spark/pull/45599 ### What changes were proposed in this pull request? This PR fixes a bug where epoch seconds were used instead of epoch milliseconds when creating a timestamp value ### Why are the
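The class of bug this PR describes — feeding an epoch-milliseconds count into an API that expects epoch seconds — is easy to reproduce in isolation. A small illustration (plain Python, not the Spark code path under discussion):

```python
from datetime import datetime, timezone

millis = 1_700_000_000_123  # an epoch-milliseconds value

# Correct: scale milliseconds down to seconds before conversion.
correct = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)

# Buggy: treating the raw millisecond count as seconds lands tens of
# thousands of years in the future, far outside the representable range.
try:
    wrong = datetime.fromtimestamp(millis, tz=timezone.utc)
except (OverflowError, OSError, ValueError):
    wrong = None

print(correct.year)  # 2023
```

When the mis-scaled value happens to stay in range (e.g. small timestamps, or the special postgres infinity sentinels mentioned in the PR title), no exception is raised and the result is silently wrong by a factor of 1000 — the correctness issue rather than a crash.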

Re: [PR] [SPARK-47461][CORE] Remove private function `totalRunningTasksPerResourceProfile` from `ExecutorAllocationManager` [spark]

2024-03-19 Thread via GitHub
mridulm commented on PR #45587: URL: https://github.com/apache/spark/pull/45587#issuecomment-2008672641 To give context, I am prototyping dynamically adapting a stage/tasks resource profile based on runtime execution behavior, and so happened to have looked at this code recently and had

Re: [PR] [SPARK-47461][CORE] Remove private function `totalRunningTasksPerResourceProfile` from `ExecutorAllocationManager` [spark]

2024-03-19 Thread via GitHub
mridulm commented on PR #45587: URL: https://github.com/apache/spark/pull/45587#issuecomment-2008669180 I was looking at it from the point of view of whether the change is safe - is the behavior the same before and after? From that point of view, at face value of the PR, it appears that it need

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-19 Thread via GitHub
cloud-fan commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-200816 why do we remove a working feature? what's wrong with `to_csv` generating non-standard but pretty strings for these values? -- This is an automated message from the Apache Git

Re: [PR] [SPARK-47007][SQL][PYTHON][R][CONNECT] Add the `map_sort` function [spark]

2024-03-19 Thread via GitHub
MaxGekk closed pull request #45069: [SPARK-47007][SQL][PYTHON][R][CONNECT] Add the `map_sort` function URL: https://github.com/apache/spark/pull/45069 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] [SPARK-47007][SQL][PYTHON][R][CONNECT] Add the `map_sort` function [spark]

2024-03-19 Thread via GitHub
MaxGekk commented on PR #45069: URL: https://github.com/apache/spark/pull/45069#issuecomment-2008662444 +1, LGTM. Merging to master. Thank you, @stevomitric @stefankandic and @HyukjinKwon @zhengruifeng for review. -- This is an automated message from the Apache Git Service. To respond

Re: [PR] [SPARK-47273][SS][PYTHON] implement Python data stream writer interface. [spark]

2024-03-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #45305: URL: https://github.com/apache/spark/pull/45305#discussion_r1531506644 ## python/pyspark/sql/datasource.py: ## @@ -513,6 +536,71 @@ def abort(self, messages: List["WriterCommitMessage"]) -> None: ... +class

Re: [PR] [SPARK-47273][SS][PYTHON] implement Python data stream writer interface. [spark]

2024-03-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #45305: URL: https://github.com/apache/spark/pull/45305#discussion_r1531506553 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/python/PythonStreamingSinkCommitRunner.scala: ## @@ -0,0 +1,126 @@ +/* + * Licensed to

Re: [PR] [SPARK-47462][SQL] Align mappings of other unsigned numeric types with TINYINT in MySQLDialect [spark]

2024-03-19 Thread via GitHub
yaooqinn commented on PR #45588: URL: https://github.com/apache/spark/pull/45588#issuecomment-2008641864 It's incorrect; it's as if we read a smallint and write an int back. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-47471][SQL] Support order-insensitive lateral column alias [spark]

2024-03-19 Thread via GitHub
wangyum commented on PR #45598: URL: https://github.com/apache/spark/pull/45598#issuecomment-2008635661 cc @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[PR] [SPARK-47471][SQL] Support order-insensitive lateral column alias [spark]

2024-03-19 Thread via GitHub
wangyum opened a new pull request, #45598: URL: https://github.com/apache/spark/pull/45598 ### What changes were proposed in this pull request? This PR adds support for order-insensitive lateral column aliases. For example: ```sql SELECT base_salary + bonus AS total_salary, salary
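"Order-insensitive" here means an alias in the SELECT list may reference another alias defined later in the same list, so resolution must follow the dependency graph rather than strict left-to-right order. A toy sketch of that fixpoint-style resolution (illustrative only, not Spark's analyzer):

```python
def resolve(select_list, row):
    """Resolve aliases that may reference each other in any order.
    select_list: alias -> Python expression over column/alias names.
    row: input column values."""
    resolved = dict(row)
    pending = dict(select_list)
    while pending:
        progressed = False
        for alias, expr in list(pending.items()):
            try:
                resolved[alias] = eval(expr, {}, dict(resolved))
            except NameError:
                continue  # depends on an alias not resolved yet
            del pending[alias]
            progressed = True
        if not progressed:
            raise ValueError("unresolvable or cyclic alias reference")
    return resolved

# total_salary references `bonus`, an alias defined *after* it:
out = resolve(
    {"total_salary": "base_salary + bonus", "bonus": "base_salary * 0.1"},
    {"base_salary": 1000},
)
print(out["total_salary"])  # 1100.0
```

A left-to-right resolver would reject `total_salary` outright because `bonus` is not yet defined; iterating until no progress is made handles any definition order and still detects genuine cycles.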

Re: [PR] [ONLY TEST] Branch 3.5 CI [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun commented on PR #45550: URL: https://github.com/apache/spark/pull/45550#issuecomment-2008627161 Hi, @panbingkun . Shall we rebase this to the master and merge to recover `branch-3.5`? -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [PR] [SPARK-47472][INFRA][3.4] Pin `numpy` to 1.23.5 in `dev/infra/Dockerfile` [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun closed pull request #45595: [SPARK-47472][INFRA][3.4] Pin `numpy` to 1.23.5 in `dev/infra/Dockerfile` URL: https://github.com/apache/spark/pull/45595 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-47472][INFRA][3.4] Pin `numpy` to 1.23.5 in `dev/infra/Dockerfile` [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun commented on PR #45595: URL: https://github.com/apache/spark/pull/45595#issuecomment-2008623766 Thank you, @HyukjinKwon . Merged to branch-3.4. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [WIP][INFRA][3.4] Pin `numpy` to 1.23.5 in `dev/infra/Dockerfile` [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun commented on PR #45595: URL: https://github.com/apache/spark/pull/45595#issuecomment-2008622814 Pandas tests are recovered. ![Screenshot 2024-03-19 at 20 51 12](https://github.com/apache/spark/assets/9700541/d49f9d38-f88e-4f6e-bb64-ab924fa68fef) -- This is an

Re: [PR] [SPARK-45393][BUILD] Upgrade Hadoop to 3.4.0 [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun commented on PR #45583: URL: https://github.com/apache/spark/pull/45583#issuecomment-2008621452 Thank you, @LuciferYang . For this PR, I fixed the following three so far, but I guess there is one more to go. Let's see the CI result. - #45585 - #45594 - #45597

Re: [PR] [SPARK-47468][BUILD] Exclude `logback` dependency from SBT like Maven [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun commented on PR #45594: URL: https://github.com/apache/spark/pull/45594#issuecomment-2008619209 Thank you, @huaxingao . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47462][SQL] Align mappings of other unsigned numeric types with TINYINT in MySQLDialect [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun commented on PR #45588: URL: https://github.com/apache/spark/pull/45588#issuecomment-2008617318 Is this correct? > Before, we read a smallint(db) as int(spark) in getCatalystType, and then we write an IntegerType(spark) to smallint(db) in getJDBCType According to

Re: [PR] [SPARK-47470][SQL][TESTS] Ignore `IntentionallyFaultyConnectionProvider` error in `CliSuite` [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun closed pull request #45597: [SPARK-47470][SQL][TESTS] Ignore `IntentionallyFaultyConnectionProvider` error in `CliSuite` URL: https://github.com/apache/spark/pull/45597 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [SPARK-47470][SQL][TESTS] Ignore `IntentionallyFaultyConnectionProvider` error in `CliSuite` [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun commented on PR #45597: URL: https://github.com/apache/spark/pull/45597#issuecomment-2008606255 I tested locally and pasted the result to the PR description. Let me merge this. Merged to master for Apache Spark 4.0.0. -- This is an automated message from the Apache

Re: [PR] [SPARK-47470][SQL][TESTS] Ignore `IntentionallyFaultyConnectionProvider` error in `CliSuite` [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun commented on PR #45597: URL: https://github.com/apache/spark/pull/45597#issuecomment-2008605230 Thank you, @yaooqinn ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47462][SQL] Align mappings of other unsigned numeric types with TINYINT in MySQLDialect [spark]

2024-03-19 Thread via GitHub
yaooqinn commented on PR #45588: URL: https://github.com/apache/spark/pull/45588#issuecomment-2008601011 Thank you @dongjoon-hyun For the case where users read/write data in a roundtrip: - Before, we read a smallint(db) as int(spark) in getCatalystType, and then we write an
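The roundtrip hazard discussed in this thread can be shown with two tiny mapping tables (hypothetical and heavily simplified — not Spark's actual dialect mappings): when the read side widens a database type to a broader Spark type, and the write side maps that Spark type back to the broader database type, a read-then-overwrite cycle silently rewrites the column's DDL type.

```python
def roundtrip(db_type, to_catalyst, to_jdbc):
    """Map a database column type to a Spark type and back,
    modeling a read followed by an overwrite write."""
    return to_jdbc[to_catalyst[db_type]]

# Illustrative simplified mappings:
to_catalyst = {"SMALLINT": "IntegerType", "INT": "IntegerType"}  # read side
to_jdbc = {"IntegerType": "INTEGER"}                             # write side

# The roundtrip is not the identity: a SMALLINT column read and
# written back comes out as INTEGER, changing the database schema.
print(roundtrip("SMALLINT", to_catalyst, to_jdbc))  # INTEGER
```

This is why aligning the read-side and write-side mappings matters: as long as `to_jdbc[to_catalyst[t]] != t` for some type `t`, overwrite roundtrips are lossy at the schema level even when every value fits.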

[PR] [SPARK-47470][SQL][TESTS] Make `CliSuite` ignores `IntentionallyFaultyConnectionProvider` error [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun opened a new pull request, #45597: URL: https://github.com/apache/spark/pull/45597 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

Re: [PR] [SPARK-47462][SQL] Align mappings of other unsigned numeric types with TINYINT in MySQLDialect [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun commented on PR #45588: URL: https://github.com/apache/spark/pull/45588#issuecomment-2008586669 No, it was a slightly different issue. IIRC, a user read data and tried to write it back (with overwrite), and it broke their existing database schema. And their whole backend system was

Re: [PR] [SPARK-47447][SQL] Allow reading Parquet TimestampLTZ as TimestampNTZ [spark]

2024-03-19 Thread via GitHub
beliefer commented on code in PR #45571: URL: https://github.com/apache/spark/pull/45571#discussion_r1531453686 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala: ## @@ -436,6 +436,17 @@ private[parquet] class

Re: [PR] [SPARK-47462][SQL] Align mappings of other unsigned numeric types with TINYINT in MySQLDialect [spark]

2024-03-19 Thread via GitHub
yaooqinn commented on PR #45588: URL: https://github.com/apache/spark/pull/45588#issuecomment-2008579737 Hi @yaooqinn The regression you mentioned was introduced in SPARK-43049 and undone in SPARK-46478. SPARK-43049 modified the `string->varchar(255)` to `string->clob` in the Oracle

Re: [PR] [SPARK-47330][SQL][TESTS] XML: Added XmlExpressionsSuite [spark]

2024-03-19 Thread via GitHub
HyukjinKwon closed pull request #45445: [SPARK-47330][SQL][TESTS] XML: Added XmlExpressionsSuite URL: https://github.com/apache/spark/pull/45445 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-47330][SQL][TESTS] XML: Added XmlExpressionsSuite [spark]

2024-03-19 Thread via GitHub
HyukjinKwon commented on PR #45445: URL: https://github.com/apache/spark/pull/45445#issuecomment-2008569121 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47372][SS] Add support for range scan based key state encoder for use with state store provider [spark]

2024-03-19 Thread via GitHub
anishshri-db commented on code in PR #45503: URL: https://github.com/apache/spark/pull/45503#discussion_r1531442175 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreSuite.scala: ## @@ -158,14 +160,292 @@ class RocksDBStateStoreSuite

Re: [PR] [SPARK-47468][BUILD] Exclude `logback` dependency from SBT like Maven [spark]

2024-03-19 Thread via GitHub
huaxingao commented on PR #45594: URL: https://github.com/apache/spark/pull/45594#issuecomment-2008550099 Late LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] [SPARK-47372][SS] Add support for range scan based key state encoder for use with state store provider [spark]

2024-03-19 Thread via GitHub
anishshri-db commented on code in PR #45503: URL: https://github.com/apache/spark/pull/45503#discussion_r1531435034 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateEncoder.scala: ## @@ -192,6 +204,234 @@ class PrefixKeyScanStateEncoder(

[PR] [SPARK-47469] Add availNow tests for TWS operator [spark]

2024-03-19 Thread via GitHub
jingz-db opened a new pull request, #45596: URL: https://github.com/apache/spark/pull/45596 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47309][SQL] XML: Add schema inference tests for value tags [spark]

2024-03-19 Thread via GitHub
HyukjinKwon closed pull request #45538: [SPARK-47309][SQL] XML: Add schema inference tests for value tags URL: https://github.com/apache/spark/pull/45538 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] [SPARK-47309][SQL] XML: Add schema inference tests for value tags [spark]

2024-03-19 Thread via GitHub
HyukjinKwon commented on PR #45538: URL: https://github.com/apache/spark/pull/45538#issuecomment-2008486755 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47309][SQL] XML: Add schema inference tests for value tags [spark]

2024-03-19 Thread via GitHub
HyukjinKwon commented on PR #45538: URL: https://github.com/apache/spark/pull/45538#issuecomment-2008486564 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47372][SS] Add support for range scan based key state encoder for use with state store provider [spark]

2024-03-19 Thread via GitHub
anishshri-db commented on code in PR #45503: URL: https://github.com/apache/spark/pull/45503#discussion_r1531410814 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateEncoder.scala: ## @@ -110,8 +119,11 @@ class PrefixKeyScanStateEncoder(

Re: [PR] [SPARK-47372][SS] Add support for range scan based key state encoder for use with state store provider [spark]

2024-03-19 Thread via GitHub
anishshri-db commented on code in PR #45503: URL: https://github.com/apache/spark/pull/45503#discussion_r1531410661 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala: ## @@ -305,9 +307,15 @@ private[sql] class

Re: [PR] [SPARK-47372][SS] Add support for range scan based key state encoder for use with state store provider [spark]

2024-03-19 Thread via GitHub
anishshri-db commented on code in PR #45503: URL: https://github.com/apache/spark/pull/45503#discussion_r1531410457 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateEncoder.scala: ## @@ -192,6 +204,234 @@ class PrefixKeyScanStateEncoder(

[PR] [WIP][INFRA][3.4] Pin `numpy` to 1.23.5 in `dev/infra/Dockerfile` [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun opened a new pull request, #45595: URL: https://github.com/apache/spark/pull/45595 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-47366][SQL] Implement parse_json. [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun commented on PR #45479: URL: https://github.com/apache/spark/pull/45479#issuecomment-2008477595 Thank you, @cloud-fan . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47468][BUILD] Exclude `logback` dependency from SBT like Maven [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun closed pull request #45594: [SPARK-47468][BUILD] Exclude `logback` dependency from SBT like Maven URL: https://github.com/apache/spark/pull/45594 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-47468][BUILD] Exclude `logback` dependency from SBT like Maven [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun commented on PR #45594: URL: https://github.com/apache/spark/pull/45594#issuecomment-2008469516 Thank you, @HyukjinKwon . Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [SPARK-47366][SQL] Implement parse_json. [spark]

2024-03-19 Thread via GitHub
cloud-fan commented on PR #45479: URL: https://github.com/apache/spark/pull/45479#issuecomment-2008427418 SGTM, @chenhao-db can you open a followup PR to address it? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [SPARK-47468][BUILD] Exclude `logback` dependency from SBT like Maven [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun commented on PR #45594: URL: https://github.com/apache/spark/pull/45594#issuecomment-2008423943 All tests passed. ![Screenshot 2024-03-19 at 17 41 00](https://github.com/apache/spark/assets/9700541/f9bc97d7-af01-445e-9563-a69c9ab52e47) -- This is an automated

Re: [PR] [SPARK-47468][BUILD] Exclude `logback` dependency from SBT like Maven [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun commented on PR #45594: URL: https://github.com/apache/spark/pull/45594#issuecomment-2008416866 Could you review this PR, @HyukjinKwon ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] [SPARK-47454][PYTHON][CONNECT][TESTS][FOLLOWUP] Further split `pyspark.sql.tests.test_dataframe` [spark]

2024-03-19 Thread via GitHub
HyukjinKwon closed pull request #45591: [SPARK-47454][PYTHON][CONNECT][TESTS][FOLLOWUP] Further split `pyspark.sql.tests.test_dataframe` URL: https://github.com/apache/spark/pull/45591 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [SPARK-47454][PYTHON][CONNECT][TESTS][FOLLOWUP] Further split `pyspark.sql.tests.test_dataframe` [spark]

2024-03-19 Thread via GitHub
HyukjinKwon commented on PR #45591: URL: https://github.com/apache/spark/pull/45591#issuecomment-2008398860 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47449][SS] Refactor and split list/timer unit tests [spark]

2024-03-19 Thread via GitHub
HeartSaVioR closed pull request #45573: [SPARK-47449][SS] Refactor and split list/timer unit tests URL: https://github.com/apache/spark/pull/45573 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [SPARK-47449][SS] Refactor and split list/timer unit tests [spark]

2024-03-19 Thread via GitHub
HeartSaVioR commented on PR #45573: URL: https://github.com/apache/spark/pull/45573#issuecomment-2008390186 Thanks! Merging to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [WIP][SPARK-46349] Prevent nested SortOrder instances in SortOrder expressions [spark]

2024-03-19 Thread via GitHub
github-actions[bot] commented on PR #44283: URL: https://github.com/apache/spark/pull/44283#issuecomment-2008388488 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-47451][SQL] Support to_json(variant). [spark]

2024-03-19 Thread via GitHub
gene-db commented on PR #45575: URL: https://github.com/apache/spark/pull/45575#issuecomment-2008378710 @HyukjinKwon @cloud-fan Could you take a look at this? Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] [SPARK-47464][INFRA] Update `labeler.yml` for module `common/sketch` and `common/variant` [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun commented on code in PR #45590: URL: https://github.com/apache/spark/pull/45590#discussion_r1531250094 ## .github/labeler.yml: ## @@ -101,6 +101,8 @@ SQL: ] - any-glob-to-any-file: [ 'common/unsafe/**/*', + 'common/sketch/**/*', Review

Re: [PR] [SPARK-47464][INFRA] Update `labeler.yml` for module `common/sketch` and `common/variant` [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun closed pull request #45590: [SPARK-47464][INFRA] Update `labeler.yml` for module `common/sketch` and `common/variant` URL: https://github.com/apache/spark/pull/45590 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] [SPARK-47273][SS][PYTHON] implement Python data stream writer interface. [spark]

2024-03-19 Thread via GitHub
chaoqin-li1123 commented on code in PR #45305: URL: https://github.com/apache/spark/pull/45305#discussion_r1531248469 ## sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonStreamingDataSourceSuite.scala: ## @@ -230,4 +273,220 @@ class

Re: [PR] [SPARK-47464][INFRA] Update `labeler.yml` for module `common/sketch` and `common/variant` [spark]

2024-03-19 Thread via GitHub
panbingkun commented on code in PR #45590: URL: https://github.com/apache/spark/pull/45590#discussion_r1531242294 ## .github/labeler.yml: ## @@ -101,6 +101,8 @@ SQL: ] - any-glob-to-any-file: [ 'common/unsafe/**/*', + 'common/sketch/**/*', Review Comment:

Re: [PR] [SPARK-47273][SS][PYTHON] implement Python data stream writer interface. [spark]

2024-03-19 Thread via GitHub
allisonwang-db commented on code in PR #45305: URL: https://github.com/apache/spark/pull/45305#discussion_r1531233108 ## python/pyspark/sql/datasource.py: ## @@ -513,6 +536,71 @@ def abort(self, messages: List["WriterCommitMessage"]) -> None: ... +class

Re: [PR] [SPARK-47468][BUILD] Exclude `logback` from SBT dependency like Maven [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun commented on PR #45594: URL: https://github.com/apache/spark/pull/45594#issuecomment-2008287187 Could you review this dependency PR, @huaxingao ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[PR] [SPARK-47468][BUILD] Exclude `logback` from SBT dependency like Maven [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun opened a new pull request, #45594: URL: https://github.com/apache/spark/pull/45594 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-46990][SQL] Fix loading empty Avro files emitted by event-hubs [spark]

2024-03-19 Thread via GitHub
sadikovi commented on code in PR #45578: URL: https://github.com/apache/spark/pull/45578#discussion_r1531179747 ## connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala: ## @@ -197,19 +197,21 @@ private[sql] object AvroUtils extends Logging { def

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-19 Thread via GitHub
mihailom-db commented on PR #45422: URL: https://github.com/apache/spark/pull/45422#issuecomment-2008104827 > LGTM > > As a follow-up we should revisit error messages. IMO it is weird to expose a message with the "string_any_collation" type to customers. But I think that we can do that as

Re: [PR] [SPARK-47461][CORE] Remove private function `totalRunningTasksPerResourceProfile` from `ExecutorAllocationManager` [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun commented on PR #45587: URL: https://github.com/apache/spark/pull/45587#issuecomment-2008100572 Do you mean that we need to revert this commit because `ExecutorAllocationManagerSuite` needs that for stable test coverage? -- This is an automated message from the Apache Git

Re: [PR] [SQL] Bind JDBC dialect to JDBCRDD at construction [spark]

2024-03-19 Thread via GitHub
johnnywalker commented on code in PR #45410: URL: https://github.com/apache/spark/pull/45410#discussion_r1531090201 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala: ## @@ -153,12 +153,12 @@ object JDBCRDD extends Logging { */ class

Re: [PR] [SPARK-47461][CORE] Remove private function `totalRunningTasksPerResourceProfile` from `ExecutorAllocationManager` [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun commented on PR #45587: URL: https://github.com/apache/spark/pull/45587#issuecomment-2008097171 To @mridulm and @tgravescs , do you mean you are depending on the `private` method of `ExecutorAllocationManager` somehow? Just for my understanding, could you elaborate how you

Re: [PR] [SS][SPARK-47363] Initial State without state reader implementation for State API v2. [spark]

2024-03-19 Thread via GitHub
jingz-db commented on code in PR #45467: URL: https://github.com/apache/spark/pull/45467#discussion_r1531068752 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -127,13 +152,53 @@ case class TransformWithStateExec(

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-19 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1530840650 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ReducibleFunction.java: ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47404][SQL] Add hooks to release the ANTLR DFA cache after parsing SQL [spark]

2024-03-19 Thread via GitHub
markj-db commented on PR #45526: URL: https://github.com/apache/spark/pull/45526#issuecomment-2008072396 @dongjoon-hyun I can certainly implement your proposal if the consensus among reviewers is that we'd like to go that direction. The downside I can see in what you suggest is that it
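The PR title in this thread — adding hooks to release a parser's internal cache after parsing — describes a general pattern: run the parse, then clear the shared cache in a `finally`-style hook so one large query cannot pin memory for the life of the process. A toy sketch (illustrative names only; this is not ANTLR's or Spark's actual API):

```python
from contextlib import contextmanager

class Parser:
    """Toy parser with a growing internal cache, standing in for a
    shared DFA-style cache that accumulates state across parses."""
    def __init__(self):
        self.cache = {}

    def parse(self, sql):
        self.cache[sql] = len(sql)  # pretend this is cached DFA state
        return sql.upper()

    def release_cache(self):
        self.cache.clear()

@contextmanager
def parse_session(parser):
    """Hook: release the cache when the session ends, even on error."""
    try:
        yield parser
    finally:
        parser.release_cache()

p = Parser()
with parse_session(p) as parser:
    parser.parse("select 1")
print(len(p.cache))  # 0 -- cache released after the session
```

The trade-off debated in the thread fits this shape: releasing eagerly bounds memory but discards warm state that the next parse would otherwise reuse, so where the hook fires (per statement, per session, on a size threshold) is the real design decision.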

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-19 Thread via GitHub
mihailom-db commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1531040306 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -994,8 +994,12 @@ object TypeCoercion extends TypeCoercionBase {

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-19 Thread via GitHub
mihailom-db commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1530992551 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala: ## @@ -215,6 +220,10 @@ object AnsiTypeCoercion extends

Re: [PR] [SPARK-47447][SQL] Allow reading Parquet TimestampLTZ as TimestampNTZ [spark]

2024-03-19 Thread via GitHub
gengliangwang commented on code in PR #45571: URL: https://github.com/apache/spark/pull/45571#discussion_r1530949174 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala: ## @@ -436,6 +436,17 @@ private[parquet] class

Re: [PR] [SPARK-47461][CORE] Remove private function `totalRunningTasksPerResourceProfile` from `ExecutorAllocationManager` [spark]

2024-03-19 Thread via GitHub
tgravescs commented on PR #45587: URL: https://github.com/apache/spark/pull/45587#issuecomment-2007935739 Well, it definitely changes the synchronization; everywhere else that is referenced is in a synchronized block. Now, since it's only testing, I'm not sure if you will see any issues there.

Re: [PR] [SS][SPARK-47363] Initial State without state reader implementation for State API v2. [spark]

2024-03-19 Thread via GitHub
jingz-db commented on code in PR #45467: URL: https://github.com/apache/spark/pull/45467#discussion_r1530925161 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateInitialStateSuite.scala: ## @@ -0,0 +1,286 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47372][SS] Add support for range scan based key state encoder for use with state store provider [spark]

2024-03-19 Thread via GitHub
anishshri-db commented on code in PR #45503: URL: https://github.com/apache/spark/pull/45503#discussion_r1530919219 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateEncoder.scala: ## @@ -192,6 +204,234 @@ class PrefixKeyScanStateEncoder(

Re: [PR] [SPARK-47372][SS] Add support for range scan based key state encoder for use with state store provider [spark]

2024-03-19 Thread via GitHub
anishshri-db commented on code in PR #45503: URL: https://github.com/apache/spark/pull/45503#discussion_r1530915028 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TimerStateImpl.scala: ## @@ -85,13 +85,13 @@ class TimerStateImpl( } val keyToTsCFName

Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

2024-03-19 Thread via GitHub
viirya commented on PR #45505: URL: https://github.com/apache/spark/pull/45505#issuecomment-2007883753 Thank you @parthchandra -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-47449] Refactor and split list/timer unit tests [spark]

2024-03-19 Thread via GitHub
anishshri-db commented on code in PR #45573: URL: https://github.com/apache/spark/pull/45573#discussion_r1530900057 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/ListStateSuite.scala: ## @@ -0,0 +1,163 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SS][SPARK-47363] Initial State without state reader implementation for State API v2. [spark]

2024-03-19 Thread via GitHub
anishshri-db commented on code in PR #45467: URL: https://github.com/apache/spark/pull/45467#discussion_r1530889290 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -341,8 +475,67 @@ case class TransformWithStateExec(

Re: [PR] [SS][SPARK-47363] Initial State without state reader implementation for State API v2. [spark]

2024-03-19 Thread via GitHub
anishshri-db commented on code in PR #45467: URL: https://github.com/apache/spark/pull/45467#discussion_r1530881452 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -271,57 +340,122 @@ case class TransformWithStateExec(

Re: [PR] [SS][SPARK-47363] Initial State without state reader implementation for State API v2. [spark]

2024-03-19 Thread via GitHub
anishshri-db commented on code in PR #45467: URL: https://github.com/apache/spark/pull/45467#discussion_r1530885800 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -127,13 +152,53 @@ case class TransformWithStateExec(

Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

2024-03-19 Thread via GitHub
parthchandra commented on PR #45505: URL: https://github.com/apache/spark/pull/45505#issuecomment-2007857003 Closing this PR. As @viirya pointed out, it is possible to achieve the update to CustomTaskMetric without a new interface

Re: [PR] [SS][SPARK-47363] Initial State without state reader implementation for State API v2. [spark]

2024-03-19 Thread via GitHub
anishshri-db commented on code in PR #45467: URL: https://github.com/apache/spark/pull/45467#discussion_r1530883632 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -127,13 +152,53 @@ case class TransformWithStateExec(

Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

2024-03-19 Thread via GitHub
parthchandra closed pull request #45505: [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources URL: https://github.com/apache/spark/pull/45505

Re: [PR] [SS][SPARK-47363] Initial State without state reader implementation for State API v2. [spark]

2024-03-19 Thread via GitHub
anishshri-db commented on code in PR #45467: URL: https://github.com/apache/spark/pull/45467#discussion_r1530882509 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -271,57 +340,122 @@ case class TransformWithStateExec(

Re: [PR] [SPARK-47461][CORE] Remove private function `totalRunningTasksPerResourceProfile` from `ExecutorAllocationManager` [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun commented on PR #45587: URL: https://github.com/apache/spark/pull/45587#issuecomment-2007854448 Merged to master for Apache Spark 4.0.0. Thank you, @LuciferYang .

Re: [PR] [SS][SPARK-47363] Initial State without state reader implementation for State API v2. [spark]

2024-03-19 Thread via GitHub
anishshri-db commented on code in PR #45467: URL: https://github.com/apache/spark/pull/45467#discussion_r1530881452 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -271,57 +340,122 @@ case class TransformWithStateExec(

Re: [PR] [SPARK-47461][CORE] Remove private function `totalRunningTasksPerResourceProfile` from `ExecutorAllocationManager` [spark]

2024-03-19 Thread via GitHub
dongjoon-hyun closed pull request #45587: [SPARK-47461][CORE] Remove private function `totalRunningTasksPerResourceProfile` from `ExecutorAllocationManager` URL: https://github.com/apache/spark/pull/45587

Re: [PR] [SS][SPARK-47363] Initial State without state reader implementation for State API v2. [spark]

2024-03-19 Thread via GitHub
anishshri-db commented on code in PR #45467: URL: https://github.com/apache/spark/pull/45467#discussion_r1530878089 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -127,13 +152,53 @@ case class TransformWithStateExec(

Re: [PR] [SS][SPARK-47363] Initial State without state reader implementation for State API v2. [spark]

2024-03-19 Thread via GitHub
anishshri-db commented on code in PR #45467: URL: https://github.com/apache/spark/pull/45467#discussion_r1530876659 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -85,23 +94,39 @@ case class TransformWithStateExec( }

Re: [PR] [SS][SPARK-47363] Initial State without state reader implementation for State API v2. [spark]

2024-03-19 Thread via GitHub
anishshri-db commented on code in PR #45467: URL: https://github.com/apache/spark/pull/45467#discussion_r1530876311 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -85,23 +94,39 @@ case class TransformWithStateExec( }

Re: [PR] [SS][SPARK-47363] Initial State without state reader implementation for State API v2. [spark]

2024-03-19 Thread via GitHub
anishshri-db commented on code in PR #45467: URL: https://github.com/apache/spark/pull/45467#discussion_r1530875269 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateInitialStateSuite.scala: ## @@ -0,0 +1,286 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SS][SPARK-47363] Initial State without state reader implementation for State API v2. [spark]

2024-03-19 Thread via GitHub
anishshri-db commented on code in PR #45467: URL: https://github.com/apache/spark/pull/45467#discussion_r1530873761 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateInitialStateSuite.scala: ## @@ -0,0 +1,286 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SS][SPARK-47363] Initial State without state reader implementation for State API v2. [spark]

2024-03-19 Thread via GitHub
anishshri-db commented on code in PR #45467: URL: https://github.com/apache/spark/pull/45467#discussion_r1530874118 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateInitialStateSuite.scala: ## @@ -0,0 +1,286 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47454][PYTHON][CONNECT][TESTS][FOLLOWUP] Further split `pyspark.sql.tests.test_dataframe` [spark]

2024-03-19 Thread via GitHub
xinrong-meng commented on PR #45591: URL: https://github.com/apache/spark/pull/45591#issuecomment-2007827514 LGTM thank you!

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-19 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1530840650 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ReducibleFunction.java: ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software

[PR] [WIP] Allow trailing comma in column definition list [spark]

2024-03-19 Thread via GitHub
stefankandic opened a new pull request, #45593: URL: https://github.com/apache/spark/pull/45593 ### What changes were proposed in this pull request? When experimenting with SQL queries, it's common to accidentally leave a trailing comma after the last column in a table
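For context, the proposed change can be sketched as follows. This is a hypothetical illustration of the syntax the WIP PR aims to accept (table name and columns are made up); before such a change, the trailing comma after the last column is a parse error in Spark SQL.

```scala
// Hypothetical sketch assuming a SparkSession named `spark` is in scope.
// The trailing comma after `name STRING` is what this PR proposes to allow.
spark.sql("""
  CREATE TABLE events (
    id   INT,
    name STRING,
  )
""")
```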
