Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

2024-03-13 Thread via GitHub
MaxGekk commented on PR #44902: URL: https://github.com/apache/spark/pull/44902#issuecomment-1996583071 > Or would you like to break that into its own PR? Let's do that in a separate PR because I expect massive changes. It should be easier to review only one kind of change. --

Re: [PR] [SPARK-47320][SQL] : The behaviour of Datasets involving self joins is inconsistent, unintuitive, with contradictions [spark]

2024-03-13 Thread via GitHub
ahshahid commented on code in PR #45446: URL: https://github.com/apache/spark/pull/45446#discussion_r1524263718 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala: ## @@ -477,6 +482,57 @@ trait ColumnResolutionHelper extends

Re: [PR] [SS][SPARK-47363] Initial State without state reader implementation for State API v2. [spark]

2024-03-13 Thread via GitHub
HeartSaVioR commented on code in PR #45467: URL: https://github.com/apache/spark/pull/45467#discussion_r1524250080 ## sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala: ## @@ -676,6 +676,42 @@ class KeyValueGroupedDataset[K, V] private[sql]( ) }

Re: [PR] [SPARK-47375][Doc][FollowUp] Correct the preferTimestampNTZ option description in JDBC doc [spark]

2024-03-13 Thread via GitHub
gengliangwang closed pull request #45502: [SPARK-47375][Doc][FollowUp] Correct the preferTimestampNTZ option description in JDBC doc URL: https://github.com/apache/spark/pull/45502 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] [SPARK-47375][Doc][FollowUp] Correct the preferTimestampNTZ option description in JDBC doc [spark]

2024-03-13 Thread via GitHub
gengliangwang commented on PR #45502: URL: https://github.com/apache/spark/pull/45502#issuecomment-1996354375 Thanks, merging to master/3.5/3.4 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[PR] [WIP] Remove unused error classes [spark]

2024-03-13 Thread via GitHub
panbingkun opened a new pull request, #45509: URL: https://github.com/apache/spark/pull/45509 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[PR] [SPARK-47385] Fix tuple encoders with Option inputs. [spark]

2024-03-13 Thread via GitHub
chenhao-db opened a new pull request, #45508: URL: https://github.com/apache/spark/pull/45508 ## What changes were proposed in this pull request? https://github.com/apache/spark/pull/40755 adds a null check on the input of the child deserializer in the tuple encoder. It breaks the
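For context, a minimal sketch of the shape "tuple encoders with Option inputs" refers to: a Dataset of tuples whose component is an Option of a product type, so the tuple encoder's child deserializer receives a null input for None rows. The case class and data below are made up for illustration and are not taken from the PR body.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch, not the PR's own repro: the None row exercises the null-handling
// path in the tuple encoder's child deserializer.
object TupleOptionEncoderSketch {
  case class Item(id: Int)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("sketch").getOrCreate()
    import spark.implicits._

    val ds = Seq((1, Option(Item(1))), (2, Option.empty[Item])).toDS()
    ds.show()

    spark.stop()
  }
}
```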

Re: [PR] [SPARK-47007][SQL][PYTHON][R][CONNECT] MapSort function [spark]

2024-03-13 Thread via GitHub
HyukjinKwon commented on PR #45069: URL: https://github.com/apache/spark/pull/45069#issuecomment-1996239586 cc @cloud-fan too -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47007][SQL][PYTHON][R][CONNECT] MapSort function [spark]

2024-03-13 Thread via GitHub
HyukjinKwon commented on code in PR #45069: URL: https://github.com/apache/spark/pull/45069#discussion_r1524101115 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -6986,6 +6986,24 @@ object functions { @scala.annotation.varargs def map_concat(cols:

Re: [PR] [SPARK-47007][SQL][PYTHON][R][CONNECT] MapSort function [spark]

2024-03-13 Thread via GitHub
HyukjinKwon commented on code in PR #45069: URL: https://github.com/apache/spark/pull/45069#discussion_r1524100645 ## python/pyspark/sql/functions/builtin.py: ## @@ -16840,6 +16840,54 @@ def map_concat( return _invoke_function_over_seq_of_columns("map_concat", cols) #

Re: [PR] [SPARK-47384][BUILD] Upgrade RoaringBitmap to 1.0.5 [spark]

2024-03-13 Thread via GitHub
panbingkun commented on PR #45507: URL: https://github.com/apache/spark/pull/45507#issuecomment-1996210009 Let it run through GA first, and then I will add the benchmark (`org.apache.spark.MapStatusesConvertBenchmark`) results. -- This is an automated message from the Apache Git Service.

Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

2024-03-13 Thread via GitHub
nchammas commented on PR #44902: URL: https://github.com/apache/spark/pull/44902#issuecomment-1996210393 @MaxGekk - Given the recent discussion on SPARK-46810, shall I expand this PR to include renaming `errorClass` across the codebase? Or would you like to break that into its own PR? --

Re: [PR] [WIP][SPARK-47338][SQL] Introduce `_LEGACY_ERROR_UNKNOWN` for default error class [spark]

2024-03-13 Thread via GitHub
itholic commented on code in PR #45457: URL: https://github.com/apache/spark/pull/45457#discussion_r1524079365 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -7902,6 +7902,11 @@ "Doesn't support month or year interval: " ] }, +

[PR] [SPARK-47384][BUILD] Upgrade RoaringBitmap to 1.0.5 [spark]

2024-03-13 Thread via GitHub
panbingkun opened a new pull request, #45507: URL: https://github.com/apache/spark/pull/45507 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-23015][WINDOWS] Mitigate bug in Windows where starting multiple Spark instances within the same second causes a failure [spark]

2024-03-13 Thread via GitHub
panbingkun commented on PR #43706: URL: https://github.com/apache/spark/pull/43706#issuecomment-1996194002 I'm very sorry, I lost track of it. I will verify it on a Windows machine this weekend. -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] [SPARK-45827] Refactor supportsDataType [spark]

2024-03-13 Thread via GitHub
cloud-fan commented on code in PR #45506: URL: https://github.com/apache/spark/pull/45506#discussion_r1524065273 ## sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala: ## @@ -190,10 +210,7 @@ trait CreatableRelationProvider { case MapType(k, v, _) =>

Re: [PR] [SPARK-23015][WINDOWS] Mitigate bug in Windows where starting multiple Spark instances within the same second causes a failure [spark]

2024-03-13 Thread via GitHub
github-actions[bot] commented on PR #43706: URL: https://github.com/apache/spark/pull/43706#issuecomment-1996172425 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[PR] [SPARK-45827] Refactor supportsDataType [spark]

2024-03-13 Thread via GitHub
cashmand opened a new pull request, #45506: URL: https://github.com/apache/spark/pull/45506 ### What changes were proposed in this pull request? This is a follow-up to https://github.com/apache/spark/pull/45409. It refactors `supportsDataType` to use a non-recursive

Re: [PR] [SPARK-41762][PYTHON][CONNECT][TESTS] Enable column name comparsion in `test_column_arithmetic_ops` [spark]

2024-03-13 Thread via GitHub
zhengruifeng commented on PR #45493: URL: https://github.com/apache/spark/pull/45493#issuecomment-1996169050 merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-41762][PYTHON][CONNECT][TESTS] Enable column name comparsion in `test_column_arithmetic_ops` [spark]

2024-03-13 Thread via GitHub
zhengruifeng closed pull request #45493: [SPARK-41762][PYTHON][CONNECT][TESTS] Enable column name comparsion in `test_column_arithmetic_ops` URL: https://github.com/apache/spark/pull/45493 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] [SPARK-47320][SQL] : The behaviour of Datasets involving self joins is inconsistent, unintuitive, with contradictions [spark]

2024-03-13 Thread via GitHub
ahshahid commented on code in PR #45446: URL: https://github.com/apache/spark/pull/45446#discussion_r1524029007 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala: ## @@ -477,6 +482,57 @@ trait ColumnResolutionHelper extends

Re: [PR] [SS] Allow chaining other stateful operators after transformWIthState operator. [spark]

2024-03-13 Thread via GitHub
sahnib commented on code in PR #45376: URL: https://github.com/apache/spark/pull/45376#discussion_r1524024974 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/EventTimeWatermark.scala: ## @@ -72,3 +74,32 @@ case class EventTimeWatermark( override

Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-13 Thread via GitHub
wbo4958 commented on PR #45232: URL: https://github.com/apache/spark/pull/45232#issuecomment-1996052736 Hi @grundprinzip, I would be grateful if you could kindly take another look at this PR, Thx. -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [PR] [SPARK-47217][SQL] Fix ambiguity check in self joins [spark]

2024-03-13 Thread via GitHub
ahshahid commented on PR #45343: URL: https://github.com/apache/spark/pull/45343#issuecomment-1996044845 The PR #45446 handles the issue comprehensively and this PR was a subset of the changes contained in PR #45446, so closing this PR -- This is an automated message from the Apache Git

Re: [PR] [SPARK-47217][SQL] Fix ambiguity check in self joins [spark]

2024-03-13 Thread via GitHub
ahshahid closed pull request #45343: [SPARK-47217][SQL] Fix ambiguity check in self joins URL: https://github.com/apache/spark/pull/45343 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

2024-03-13 Thread via GitHub
parthchandra opened a new pull request, #45505: URL: https://github.com/apache/spark/pull/45505 ### What changes were proposed in this pull request? Provides a new interface `CustomFileTaskMetric` that extends the `CustomTaskMetric` and allows updating of values. ### Why are the
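For reference, `CustomTaskMetric` is the existing DSv2 interface exposing `name()` and `value()`. A rough sketch of what "allows updating of values" could look like follows; the trait and method names below are assumptions for illustration, not the interface proposed in the PR.

```scala
import org.apache.spark.sql.connector.metric.CustomTaskMetric

// Rough sketch only; the update method name/signature are assumptions, not the PR's API.
trait UpdatableTaskMetricSketch extends CustomTaskMetric {
  // CustomTaskMetric already defines name(): String and value(): Long.
  def update(delta: Long): Unit
}

// A trivial implementation accumulating values reported from multiple sources.
class BytesReadMetric extends UpdatableTaskMetricSketch {
  private var current: Long = 0L
  override def name(): String = "bytesRead"
  override def value(): Long = current
  override def update(delta: Long): Unit = current += delta
}
```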

Re: [PR] [SPARK-47346][PYTHON] Make daemon mode configurable when creating Python planner workers [spark]

2024-03-13 Thread via GitHub
ueshin commented on PR #45468: URL: https://github.com/apache/spark/pull/45468#issuecomment-1995996043 @allisonwang-db Seems like the new test is still failing: ``` [info] - SPARK-47346: cannot create Python worker with different useDaemon flag *** FAILED *** (34 milliseconds)

[PR] [SPARK-47383][CORE] Make the shutdown hook timeout configurable [spark]

2024-03-13 Thread via GitHub
robreeves opened a new pull request, #45504: URL: https://github.com/apache/spark/pull/45504 ### What changes were proposed in this pull request? Make the shutdown hook timeout configurable. If this is not defined it falls back to the existing behavior, which uses a default
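As a usage sketch, if this lands as an ordinary Spark conf it might be set as below; the key name `spark.shutdown.timeout` is an assumption inferred from the PR title and is not confirmed here.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: "spark.shutdown.timeout" is an assumed key based on the PR title.
// When unset, shutdown falls back to the existing default timeout.
object ShutdownTimeoutSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .config("spark.shutdown.timeout", "60s")
      .getOrCreate()
    spark.stop()
  }
}
```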

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-03-13 Thread via GitHub
HeartSaVioR closed pull request #45051: [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator URL: https://github.com/apache/spark/pull/45051 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-03-13 Thread via GitHub
HeartSaVioR commented on PR #45051: URL: https://github.com/apache/spark/pull/45051#issuecomment-1995685935 Thanks! Merging to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-47344] Extend INVALID_IDENTIFIER error beyond catching '-' in an unquoted identifier and fix "IS ! NULL" et al. [spark]

2024-03-13 Thread via GitHub
gengliangwang closed pull request #45470: [SPARK-47344] Extend INVALID_IDENTIFIER error beyond catching '-' in an unquoted identifier and fix "IS ! NULL" et al. URL: https://github.com/apache/spark/pull/45470 -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] [SPARK-47344] Extend INVALID_IDENTIFIER error beyond catching '-' in an unquoted identifier and fix "IS ! NULL" et al. [spark]

2024-03-13 Thread via GitHub
gengliangwang commented on PR #45470: URL: https://github.com/apache/spark/pull/45470#issuecomment-1995659868 Thanks, merging to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[PR] [SPARK-47372] Add support for range scan based key encoder for stateful streaming using state provider [spark]

2024-03-13 Thread via GitHub
anishshri-db opened a new pull request, #45503: URL: https://github.com/apache/spark/pull/45503 ### What changes were proposed in this pull request? Add support for range scan based key encoder for stateful streaming using state provider ### Why are the changes needed?

Re: [PR] [SPARK-47320][SQL] : The behaviour of Datasets involving self joins is inconsistent, unintuitive, with contradictions [spark]

2024-03-13 Thread via GitHub
ahshahid commented on code in PR #45446: URL: https://github.com/apache/spark/pull/45446#discussion_r1523844585 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala: ## @@ -477,6 +482,57 @@ trait ColumnResolutionHelper extends

Re: [PR] [SPARK-47273][SS][PYTHON] implement Python data stream writer interface. [spark]

2024-03-13 Thread via GitHub
chaoqin-li1123 commented on PR #45305: URL: https://github.com/apache/spark/pull/45305#issuecomment-1995481769 PTAL, thanks! @HeartSaVioR @HyukjinKwon @allisonwang-db @sahnib -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] [SPARK-47367][PYTHON][CONNECT] Support Python data sources with Spark Connect [spark]

2024-03-13 Thread via GitHub
ueshin commented on code in PR #45486: URL: https://github.com/apache/spark/pull/45486#discussion_r1523802659 ## python/pyspark/sql/tests/connect/test_parity_python_datasource.py: ## Review Comment: This module needs to be listed in `dev/sparktestsupport/modules.py`.

Re: [PR] [SS][SPARK-47363] Initial State without state reader implementation for State API v2. [spark]

2024-03-13 Thread via GitHub
jingz-db commented on code in PR #45467: URL: https://github.com/apache/spark/pull/45467#discussion_r1523797687 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateInitialStateSuite.scala: ## @@ -0,0 +1,268 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47375][Doc][FollowUp] Correct the preferTimestampNTZ option description in JDBC doc [spark]

2024-03-13 Thread via GitHub
gengliangwang commented on PR #45502: URL: https://github.com/apache/spark/pull/45502#issuecomment-1995432119 cc @sadikovi -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[PR] [SPARK-47375][Doc][FollowUp] Correct the preferTimestampNTZ option description in JDBC doc [spark]

2024-03-13 Thread via GitHub
gengliangwang opened a new pull request, #45502: URL: https://github.com/apache/spark/pull/45502 ### What changes were proposed in this pull request? Correct the preferTimestampNTZ option description in JDBC doc as per https://github.com/apache/spark/pull/45496 ### Why
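For reference, the option whose description is being corrected is the JDBC reader's `preferTimestampNTZ`. A minimal usage sketch follows; the connection URL and table name are placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Placeholders: the JDBC URL and table name are illustrative only.
object PreferTimestampNtzSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://localhost:5432/testdb")
      .option("dbtable", "events")
      .option("preferTimestampNTZ", "true") // prefer TimestampNTZType for timestamp columns
      .load()
    df.printSchema()
    spark.stop()
  }
}
```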

Re: [PR] [SPARK-47320][SQL] : The behaviour of Datasets involving self joins is inconsistent, unintuitive, with contradictions [spark]

2024-03-13 Thread via GitHub
ahshahid commented on code in PR #45446: URL: https://github.com/apache/spark/pull/45446#discussion_r1523727676 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala: ## @@ -477,6 +482,57 @@ trait ColumnResolutionHelper extends

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-13 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1523710658 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala: ## @@ -829,20 +846,59 @@ case class KeyGroupedShuffleSpec( }

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-13 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1523707057 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala: ## @@ -829,20 +846,59 @@ case class KeyGroupedShuffleSpec( }

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-13 Thread via GitHub
szehon-ho commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1523706702 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ReducibleFunction.java: ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47343][SQL] Fix NPE when `sqlString` variable value is null string in execute immediate [spark]

2024-03-13 Thread via GitHub
srielau commented on code in PR #45462: URL: https://github.com/apache/spark/pull/45462#discussion_r1523678372 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -3004,6 +3004,12 @@ ], "sqlState" : "2200E" }, + "NULL_QUERY_STRING_EXECUTE_IMMEDIATE"
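To illustrate the situation the new error class covers, here is a hedged sketch of the failing shape on a build with EXECUTE IMMEDIATE support; the variable name is made up, and the error-class name comes from the diff above.

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the scenario: EXECUTE IMMEDIATE over a session variable whose value is NULL.
// Previously this path could throw an NPE; the PR introduces a proper error class for it.
object ExecuteImmediateNullSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    spark.sql("DECLARE VARIABLE sqlString STRING")
    spark.sql("SET VAR sqlString = NULL")
    spark.sql("EXECUTE IMMEDIATE sqlString") // expected to fail with NULL_QUERY_STRING_EXECUTE_IMMEDIATE
    spark.stop()
  }
}
```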

Re: [PR] [SS][SPARK-47363] Initial State without state reader implementation for State API v2. [spark]

2024-03-13 Thread via GitHub
anishshri-db commented on code in PR #45467: URL: https://github.com/apache/spark/pull/45467#discussion_r1523675926 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateInitialStateSuite.scala: ## @@ -0,0 +1,268 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SS][SPARK-47363] Initial State without state reader implementation for State API v2. [spark]

2024-03-13 Thread via GitHub
anishshri-db commented on PR #45467: URL: https://github.com/apache/spark/pull/45467#issuecomment-1995127603 Error seems relevant on the MIMA checks - ``` problems with Sql module: method

Re: [PR] [SS][SPARK-47363] Initial State without state reader implementation for State API v2. [spark]

2024-03-13 Thread via GitHub
anishshri-db commented on code in PR #45467: URL: https://github.com/apache/spark/pull/45467#discussion_r1523670747 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -64,31 +68,52 @@ case class TransformWithStateExec(

Re: [PR] [SS][SPARK-47363] Initial State without state reader implementation for State API v2. [spark]

2024-03-13 Thread via GitHub
anishshri-db commented on code in PR #45467: URL: https://github.com/apache/spark/pull/45467#discussion_r1523668147 ## sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala: ## @@ -665,7 +665,8 @@ class KeyValueGroupedDataset[K, V] private[sql](

Re: [PR] [SS][SPARK-47363] Initial State without state reader implementation for State API v2. [spark]

2024-03-13 Thread via GitHub
anishshri-db commented on code in PR #45467: URL: https://github.com/apache/spark/pull/45467#discussion_r1523663013 ## sql/api/src/main/scala/org/apache/spark/sql/streaming/StatefulProcessor.scala: ## @@ -85,3 +85,21 @@ private[sql] trait StatefulProcessor[K, I, O] extends

Re: [PR] [SPARK-47289][SQL] Allow extensions to log extended information in explain plan [spark]

2024-03-13 Thread via GitHub
parthchandra commented on PR #45488: URL: https://github.com/apache/spark/pull/45488#issuecomment-1995049132 @dongjoon-hyun github actions are enabled in my repository and the branch is based on the latest commit in master. In my repo the ci checks are shown as passing.

Re: [PR] [SPARK-47378][PROTOBUF][TESTS] Make the related Protobuf UT run well in IDE [spark]

2024-03-13 Thread via GitHub
dongjoon-hyun commented on PR #45498: URL: https://github.com/apache/spark/pull/45498#issuecomment-1994988778 Thank you, @panbingkun . Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] [SPARK-47378][PROTOBUF][TESTS] Make the related Protobuf UT run well in IDE [spark]

2024-03-13 Thread via GitHub
dongjoon-hyun closed pull request #45498: [SPARK-47378][PROTOBUF][TESTS] Make the related Protobuf UT run well in IDE URL: https://github.com/apache/spark/pull/45498 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [SPARK-47086][BUILD][CORE][SQL][UI] Migrate from Jetty 11 to Jetty 12 [spark]

2024-03-13 Thread via GitHub
dongjoon-hyun commented on PR #45500: URL: https://github.com/apache/spark/pull/45500#issuecomment-1994982261 Thank you for sharing your AS-IS status, @HiuKwok . For the following, it's a little surprising to me. > However in this case we no longer have to control URL rewrite,

Re: [PR] [SPARK-47086][BUILD][CORE][SQL][UI] Migrate from Jetty 11 to Jetty 12 [spark]

2024-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #45500: URL: https://github.com/apache/spark/pull/45500#discussion_r1523613261 ## sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java: ## @@ -92,9 +92,9 @@ protected void initializeServer() {

Re: [PR] [SPARK-47086][BUILD][CORE][SQL][UI] Migrate from Jetty 11 to Jetty 12 [spark]

2024-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #45500: URL: https://github.com/apache/spark/pull/45500#discussion_r1523611719 ## core/src/test/scala/org/apache/spark/ui/UISuite.scala: ## @@ -398,8 +386,8 @@ class UISuite extends SparkFunSuite { } } - test("SPARK-45522: Jetty

Re: [PR] [SPARK-47086][BUILD][CORE][SQL][UI] Migrate from Jetty 11 to Jetty 12 [spark]

2024-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #45500: URL: https://github.com/apache/spark/pull/45500#discussion_r1523608885 ## core/src/main/scala/org/apache/spark/util/Utils.scala: ## @@ -2198,8 +2197,6 @@ private[spark] object Utils return true }

Re: [PR] [SPARK-47086][BUILD][CORE][SQL][UI] Migrate from Jetty 11 to Jetty 12 [spark]

2024-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #45500: URL: https://github.com/apache/spark/pull/45500#discussion_r1523607793 ## core/src/main/scala/org/apache/spark/ui/JettyUtils.scala: ## @@ -588,36 +590,37 @@ private[spark] case class ServerInfo( * a servlet context without the

Re: [PR] [SPARK-47086][BUILD][CORE][SQL][UI] Migrate from Jetty 11 to Jetty 12 [spark]

2024-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #45500: URL: https://github.com/apache/spark/pull/45500#discussion_r1523607249 ## core/src/main/scala/org/apache/spark/ui/JettyUtils.scala: ## @@ -579,6 +575,12 @@ private[spark] case class ServerInfo( } +// private def

Re: [PR] [SPARK-47086][BUILD][CORE][SQL][UI] Migrate from Jetty 11 to Jetty 12 [spark]

2024-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #45500: URL: https://github.com/apache/spark/pull/45500#discussion_r1523604616 ## core/src/main/scala/org/apache/spark/ui/JettyUtils.scala: ## @@ -209,12 +213,12 @@ private[spark] object JettyUtils extends Logging { override def

Re: [PR] [SPARK-47086][BUILD][CORE][SQL][UI] Migrate from Jetty 11 to Jetty 12 [spark]

2024-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #45500: URL: https://github.com/apache/spark/pull/45500#discussion_r1523605466 ## core/src/main/scala/org/apache/spark/ui/JettyUtils.scala: ## @@ -209,12 +213,12 @@ private[spark] object JettyUtils extends Logging { override def

Re: [PR] [SPARK-47086][BUILD][CORE][SQL][UI] Migrate from Jetty 11 to Jetty 12 [spark]

2024-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #45500: URL: https://github.com/apache/spark/pull/45500#discussion_r1523602207 ## core/src/main/scala/org/apache/spark/TestUtils.scala: ## @@ -335,9 +336,9 @@ private[spark] object TestUtils extends SparkTestUtils { // 0 as port means

Re: [PR] [SPARK-47086][BUILD][CORE][SQL][UI] Migrate from Jetty 11 to Jetty 12 [spark]

2024-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #45500: URL: https://github.com/apache/spark/pull/45500#discussion_r1523603371 ## core/src/main/scala/org/apache/spark/ui/JettyUtils.scala: ## @@ -149,6 +152,7 @@ private[spark] object JettyUtils extends Logging { // Make sure we

Re: [PR] [SPARK-47086][BUILD][CORE][SQL][UI] Migrate from Jetty 11 to Jetty 12 [spark]

2024-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #45500: URL: https://github.com/apache/spark/pull/45500#discussion_r1523601523 ## core/src/main/scala/org/apache/spark/TestUtils.scala: ## @@ -46,15 +46,16 @@ import org.apache.logging.log4j.core.config.builder.api.ConfigurationBuilderFact

Re: [PR] [SPARK-47342][SQL] Support TimestampNTZ for DB2 TIMESTAMP WITH TIME ZONE [spark]

2024-03-13 Thread via GitHub
dongjoon-hyun commented on PR #45471: URL: https://github.com/apache/spark/pull/45471#issuecomment-1994934200 Thank you for reverting, @yaooqinn and all. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[PR] [SPARK-47327][SQL] Move sort keys concurrency test to CollationFactorySuite [spark]

2024-03-13 Thread via GitHub
stefankandic opened a new pull request, #45501: URL: https://github.com/apache/spark/pull/45501 ### What changes were proposed in this pull request? Move concurrency test to the `CollationFactorySuite` ### Why are the changes needed? This is more appropriate

Re: [PR] [SPARK-47344] Extend INVALID_IDENTIFIER error beyond catching '-' in an unquoted identifier and fix "IS ! NULL" et al. [spark]

2024-03-13 Thread via GitHub
srielau commented on code in PR #45470: URL: https://github.com/apache/spark/pull/45470#discussion_r1523561534 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ErrorParserSuite.scala: ## @@ -39,6 +39,23 @@ class ErrorParserSuite extends AnalysisTest {

Re: [PR] [SPARK-47344] Extend INVALID_IDENTIFIER error beyond catching '-' in an unquoted identifier and fix "IS ! NULL" et al. [spark]

2024-03-13 Thread via GitHub
MaxGekk commented on code in PR #45470: URL: https://github.com/apache/spark/pull/45470#discussion_r1523544102 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ErrorParserSuite.scala: ## @@ -39,6 +39,23 @@ class ErrorParserSuite extends AnalysisTest {

Re: [PR] [SPARK-47344] Extend INVALID_IDENTIFIER error beyond catching '-' in an unquoted identifier and fix "IS ! NULL" et al. [spark]

2024-03-13 Thread via GitHub
MaxGekk commented on code in PR #45470: URL: https://github.com/apache/spark/pull/45470#discussion_r1523544839 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ErrorParserSuite.scala: ## @@ -39,6 +39,23 @@ class ErrorParserSuite extends AnalysisTest {

Re: [PR] [SPARK-47208][CORE] Allow overriding base overhead memory [spark]

2024-03-13 Thread via GitHub
tgravescs commented on PR #45240: URL: https://github.com/apache/spark/pull/45240#issuecomment-1994608914 Looks good, thanks @jpcorreia99 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-47210][SQL][COLLATION] Implicit casting on collated expressions [spark]

2024-03-13 Thread via GitHub
dbatomic commented on PR #45383: URL: https://github.com/apache/spark/pull/45383#issuecomment-1994569802 > Some high-level questions: > > 1. If a function requires certain collations but the input uses a different collation, shall we implicitly cast or fail? > 2. If a function's

Re: [PR] [SPARK-47208][CORE] Allow overriding base overhead memory [spark]

2024-03-13 Thread via GitHub
jpcorreia99 commented on PR #45240: URL: https://github.com/apache/spark/pull/45240#issuecomment-1994561966 @tgravescs thank you for the comments, I've addressed them in cd03ec88bf965622c4fad3e60dc76b5a6bd78e5d -- This is an automated message from the Apache Git Service. To respond to

Re: [PR] [SPARK-47344] Extend INVALID_IDENTIFIER error beyond catching '-' in an unquoted identifier and fix "IS ! NULL" et al. [spark]

2024-03-13 Thread via GitHub
srielau commented on code in PR #45470: URL: https://github.com/apache/spark/pull/45470#discussion_r1523389357 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ErrorParserSuite.scala: ## @@ -39,6 +39,23 @@ class ErrorParserSuite extends AnalysisTest {

Re: [PR] [SPARK-47208][CORE] Allow overriding base overhead memory [spark]

2024-03-13 Thread via GitHub
tgravescs commented on code in PR #45240: URL: https://github.com/apache/spark/pull/45240#discussion_r1523337134 ## core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala: ## @@ -489,10 +489,11 @@ object ResourceProfile extends Logging { private[spark] def

Re: [PR] [SPARK-47208][CORE] Allow overriding base overhead memory [spark]

2024-03-13 Thread via GitHub
tgravescs commented on code in PR #45240: URL: https://github.com/apache/spark/pull/45240#discussion_r1523328549 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala: ## @@ -706,6 +706,39 @@ class ClientSuite extends SparkFunSuite

Re: [PR] [SPARK-47086][BUILD][CORE][SQL][UI] Migrate from Jetty 11 to Jetty 12 [spark]

2024-03-13 Thread via GitHub
HiuKwok commented on PR #45500: URL: https://github.com/apache/spark/pull/45500#issuecomment-1994496507 @dongjoon-hyun @LuciferYang During the past few weeks, I managed to rewrite/update all Jetty-related classes; things look fine in most of the Java/Scala classes.

Re: [PR] [SPARK-47086][BUILD][CORE][SQL][UI] Migrate from Jetty 11 to Jetty 12 [spark]

2024-03-13 Thread via GitHub
HiuKwok commented on PR #45500: URL: https://github.com/apache/spark/pull/45500#issuecomment-1994466598 This is a draft MR and I'm still working on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[PR] [SPARK-47086][BUILD][CORE][SQL][UI] Migrate from Jetty 11 to Jetty 12 [spark]

2024-03-13 Thread via GitHub
HiuKwok opened a new pull request, #45500: URL: https://github.com/apache/spark/pull/45500 ### What changes were proposed in this pull request? This is a draft MR to upgrade Jetty deps from 11 to 12. ### Why are the changes needed? ### Does this PR

Re: [PR] [SPARK-47344] Extend INVALID_IDENTIFIER error beyond catching '-' in an unquoted identifier and fix "IS ! NULL" et al. [spark]

2024-03-13 Thread via GitHub
MaxGekk commented on code in PR #45470: URL: https://github.com/apache/spark/pull/45470#discussion_r1523276788 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ErrorParserSuite.scala: ## @@ -39,6 +39,23 @@ class ErrorParserSuite extends AnalysisTest {

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-13 Thread via GitHub
uros-db commented on PR #45422: URL: https://github.com/apache/spark/pull/45422#issuecomment-1994394143 @cloud-fan that makes a lot of sense. To combat this, the new case classes should now handle it; essentially: - `StringType` no longer accepts all collationIds, but only the default

[PR] [SPARK-47380][CONNECT] Ensure on the server side that the SparkSession is the same [spark]

2024-03-13 Thread via GitHub
nemanja-boric-databricks opened a new pull request, #45499: URL: https://github.com/apache/spark/pull/45499 ### What changes were proposed in this pull request? In this PR we change the client behaviour to send the previously observed server session id so that the

Re: [PR] [SPARK-47375][SQL] Add guidelines for timestamp mapping in `JdbcDialect#getCatalystType` [spark]

2024-03-13 Thread via GitHub
yaooqinn commented on PR #45496: URL: https://github.com/apache/spark/pull/45496#issuecomment-1994278172 Merged to master. Thank you @MaxGekk @cloud-fan for the review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] [SPARK-47375][SQL] Add guidelines for timestamp mapping in `JdbcDialect#getCatalystType` [spark]

2024-03-13 Thread via GitHub
yaooqinn closed pull request #45496: [SPARK-47375][SQL] Add guidelines for timestamp mapping in `JdbcDialect#getCatalystType` URL: https://github.com/apache/spark/pull/45496 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-13 Thread via GitHub
MaxGekk closed pull request #44665: [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data URL: https://github.com/apache/spark/pull/44665 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-13 Thread via GitHub
MaxGekk commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-1994249547 +1, LGTM. Merging to master. Thank you, @panbingkun and @HyukjinKwon @LuciferYang @srowen for review. -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] [SPARK-47378][PROTOBUF][TESTS] Make the related Protobuf UT run well in IDE [spark]

2024-03-13 Thread via GitHub
panbingkun commented on PR #45498: URL: https://github.com/apache/spark/pull/45498#issuecomment-1994200416 I also verified using `./build/sbt` to run the related Protobuf `UT` locally, and it also works well, e.g.:

[PR] [SPARK-47378][PROTOBUF][TESTS] Make related Protobuf UT run well in IDE [spark]

2024-03-13 Thread via GitHub
panbingkun opened a new pull request, #45498: URL: https://github.com/apache/spark/pull/45498 ### What changes were proposed in this pull request? The pr aims to make related Protobuf `UT` run well in IDE. ### Why are the changes needed? Before:

[PR] [SPARK-47377][PYTHON][CONNECT][TESTS] Factor out tests from `SparkConnectSQLTestCase` [spark]

2024-03-13 Thread via GitHub
zhengruifeng opened a new pull request, #45497: URL: https://github.com/apache/spark/pull/45497 ### What changes were proposed in this pull request? Factor out tests from `SparkConnectSQLTestCase` ### Why are the changes needed? for testing parallelism ### Does this

Re: [PR] [WIP][SPARK-47338][SQL] Introduce `_LEGACY_ERROR_UNKNOWN` for default error class [spark]

2024-03-13 Thread via GitHub
MaxGekk commented on code in PR #45457: URL: https://github.com/apache/spark/pull/45457#discussion_r1523001153 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -7902,6 +7902,11 @@ "Doesn't support month or year interval: " ] }, +

Re: [PR] [WIP][SPARK-47338][SQL] Introduce `_LEGACY_ERROR_UNKNOWN` for default error class [spark]

2024-03-13 Thread via GitHub
MaxGekk commented on code in PR #45457: URL: https://github.com/apache/spark/pull/45457#discussion_r1522991723 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -7902,6 +7902,11 @@ "Doesn't support month or year interval: " ] }, +

Re: [PR] [SPARK-47329][SS] Persist dataframe while using foreachbatch and stateful streaming query to prevent state from being re-loaded in each batch [spark]

2024-03-13 Thread via GitHub
Bobstar55 commented on code in PR #45432: URL: https://github.com/apache/spark/pull/45432#discussion_r1522898039 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/ForeachBatchSink.scala: ## @@ -22,19 +22,41 @@ import scala.util.control.NonFatal import

Re: [PR] [SPARK-47208][CORE] Allow overriding base overhead memory [spark]

2024-03-13 Thread via GitHub
jpcorreia99 commented on code in PR #45240: URL: https://github.com/apache/spark/pull/45240#discussion_r1522896795 ## core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala: ## @@ -489,10 +489,11 @@ object ResourceProfile extends Logging { private[spark] def

Re: [PR] [SPARK-47320][SQL] : The behaviour of Datasets involving self joins is inconsistent, unintuitive, with contradictions [spark]

2024-03-13 Thread via GitHub
cloud-fan commented on code in PR #45446: URL: https://github.com/apache/spark/pull/45446#discussion_r1522868178 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala: ## @@ -477,6 +482,57 @@ trait ColumnResolutionHelper extends

Re: [PR] [SPARK-47320][SQL] : The behaviour of Datasets involving self joins is inconsistent, unintuitive, with contradictions [spark]

2024-03-13 Thread via GitHub
cloud-fan commented on code in PR #45446: URL: https://github.com/apache/spark/pull/45446#discussion_r1522868178 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala: ## @@ -477,6 +482,57 @@ trait ColumnResolutionHelper extends

Re: [PR] [SPARK-47208][CORE] Allow overriding base overhead memory [spark]

2024-03-13 Thread via GitHub
jpcorreia99 commented on code in PR #45240: URL: https://github.com/apache/spark/pull/45240#discussion_r1522859546 ## core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala: ## @@ -489,10 +489,11 @@ object ResourceProfile extends Logging { private[spark] def

Re: [PR] [SPARK-47141] [Core]: Support shuffle migration to external storage. [spark]

2024-03-13 Thread via GitHub
abhishekd0907 commented on PR #45228: URL: https://github.com/apache/spark/pull/45228#issuecomment-1993902079 Hi @mridulm @attilapiros, can you please help in reviewing this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [SPARK-47342][SQL] Support TimestampNTZ for DB2 TIMESTAMP WITH TIME ZONE [spark]

2024-03-13 Thread via GitHub
yaooqinn commented on PR #45471: URL: https://github.com/apache/spark/pull/45471#issuecomment-1993888126 This is reverted by [e170252](https://github.com/apache/spark/commit/e170252714e3662c7354321f78a3250114ea7e9e). After an offline discussion with @cloud-fan, we have reached a

Re: [PR] [SPARK-47375][SQL] Add guidelines for timestamp mapping in `JdbcDialect#getCatalystType` [spark]

2024-03-13 Thread via GitHub
yaooqinn commented on PR #45496: URL: https://github.com/apache/spark/pull/45496#issuecomment-1993857925 Some built-in JDBC data sources need to distinguish `timestamp tz` from `timestamp ntz` as much as possible instead of falling back to the default implementation -- This is an

[PR] [SPARK-47375][SQL] Add guidelines for timestamp mapping in `JdbcDialect#getCatalystType` [spark]

2024-03-13 Thread via GitHub
yaooqinn opened a new pull request, #45496: URL: https://github.com/apache/spark/pull/45496 ### What changes were proposed in this pull request? This PR adds guidelines for mapping database timestamps to Spark SQL Timestamps through the JDBC Standard API and Spark
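For context, `JdbcDialect#getCatalystType` lets a dialect choose the Catalyst type for a JDBC column. The sketch below shows the kind of timestamp mapping the guidelines concern; the dialect name and the exact mapping policy are illustrative only and are not the guidelines added by this PR (real dialects also consult options such as `preferTimestampNTZ`).

```scala
import java.sql.Types
import org.apache.spark.sql.jdbc.JdbcDialect
import org.apache.spark.sql.types.{DataType, MetadataBuilder, TimestampNTZType, TimestampType}

// Illustrative only: map the database's zoned timestamp to TimestampType (with local
// time zone) and the plain TIMESTAMP to TimestampNTZType.
case object ExampleTimestampDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:example")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
    sqlType match {
      case Types.TIMESTAMP_WITH_TIMEZONE => Some(TimestampType)
      case Types.TIMESTAMP => Some(TimestampNTZType)
      case _ => None
    }
  }
}
```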
