[GitHub] [spark] LuciferYang commented on a diff in pull request #38737: [SPARK-41174][CORE][SQL] Propagate an error class to users for invalid `format` of `to_binary()`

2022-11-22 Thread GitBox
LuciferYang commented on code in PR #38737: URL: https://github.com/apache/spark/pull/38737#discussion_r1030121284 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2620,46 +2620,81 @@ case class ToBinary(

[GitHub] [spark] dengziming commented on pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-22 Thread GitBox
dengziming commented on PR #38659: URL: https://github.com/apache/spark/pull/38659#issuecomment-1324659424 Thank you @grundprinzip for your review, I fixed the comments and let's wait for @hvanhovell and @cloud-fan. -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] LuciferYang commented on pull request #38764: [SPARK-41206][SQL][FOLLOWUP] Make result of `checkColumnNameDuplication` stable to fix `COLUMN_ALREADY_EXISTS` check failed with Scala 2.

2022-11-22 Thread GitBox
LuciferYang commented on PR #38764: URL: https://github.com/apache/spark/pull/38764#issuecomment-1324658198 Thanks @HyukjinKwon @MaxGekk

[GitHub] [spark] MaxGekk commented on a diff in pull request #38737: [SPARK-41174][CORE][SQL] Propagate an error class to users for invalid `format` of `to_binary()`

2022-11-22 Thread GitBox
MaxGekk commented on code in PR #38737: URL: https://github.com/apache/spark/pull/38737#discussion_r1030091988 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2620,46 +2620,81 @@ case class ToBinary(

[GitHub] [spark] itholic commented on pull request #38766: [MINOR][SQL] Fix error message for `UNEXPECTED_INPUT_TYPE`

2022-11-22 Thread GitBox
itholic commented on PR #38766: URL: https://github.com/apache/spark/pull/38766#issuecomment-1324635151 I made this a separate PR from my other tasks, although it's a very small change, so that this error message fix does not affect the tests on the original PR.

[GitHub] [spark] itholic opened a new pull request, #38769: [SPARK-41228][SQL] Rename & Improve error message for `COLUMN_NOT_IN_GROUP_BY_CLAUSE`.

2022-11-22 Thread GitBox
itholic opened a new pull request, #38769: URL: https://github.com/apache/spark/pull/38769 ### What changes were proposed in this pull request? This PR proposes to rename `COLUMN_NOT_IN_GROUP_BY_CLAUSE` to `MISSING_AGGREGATION`. Also, improve its error message. ### Why

[GitHub] [spark] amaliujia opened a new pull request, #38768: [SPARK-41230][CONNECT][PYTHON] Remove `str` from Aggregate expression type

2022-11-22 Thread GitBox
amaliujia opened a new pull request, #38768: URL: https://github.com/apache/spark/pull/38768 ### What changes were proposed in this pull request? This PR proposes that Relations (e.g. Aggregate in this PR) should only deal with `Expression` rather than `str`. `str` could be mapped

[GitHub] [spark] MaxGekk commented on a diff in pull request #38707: [SPARK-41176][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1042

2022-11-22 Thread GitBox
MaxGekk commented on code in PR #38707: URL: https://github.com/apache/spark/pull/38707#discussion_r1030084249 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -637,13 +637,13 @@ private[sql] object QueryCompilationErrors extends

[GitHub] [spark] cloud-fan commented on pull request #38767: [SPARK-41183][SQL][FOLLOWUP] Fix a typo

2022-11-22 Thread GitBox
cloud-fan commented on PR #38767: URL: https://github.com/apache/spark/pull/38767#issuecomment-1324625972 cc @viirya

[GitHub] [spark] cloud-fan opened a new pull request, #38767: [SPARK-41183][SQL][FOLLOWUP] Fix a typo

2022-11-22 Thread GitBox
cloud-fan opened a new pull request, #38767: URL: https://github.com/apache/spark/pull/38767 ### What changes were proposed in this pull request? Followup of https://github.com/apache/spark/pull/38692. To follow other APIs in `SparkSessionExtensions`, the name should be

[GitHub] [spark] itholic opened a new pull request, #38766: [MINOR][SQL] Fix error message for `UNEXPECTED_INPUT_TYPE`

2022-11-22 Thread GitBox
itholic opened a new pull request, #38766: URL: https://github.com/apache/spark/pull/38766 ### What changes were proposed in this pull request? This PR proposes to correct a minor syntax issue in the error message for `UNEXPECTED_INPUT_TYPE`. ### Why are the changes needed?

[GitHub] [spark] LuciferYang commented on a diff in pull request #38737: [SPARK-41174][CORE][SQL] Propagate an error class to users for invalid `format` of `to_binary()`

2022-11-22 Thread GitBox
LuciferYang commented on code in PR #38737: URL: https://github.com/apache/spark/pull/38737#discussion_r1030082083 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2620,46 +2620,81 @@ case class ToBinary(

[GitHub] [spark] wankunde opened a new pull request, #38765: [SPARK-35531][SQL][FOLLOWUP] Support alter table command with CASE_SENSITIVE is true

2022-11-22 Thread GitBox
wankunde opened a new pull request, #38765: URL: https://github.com/apache/spark/pull/38765 ### What changes were proposed in this pull request? Restore dbName and tableName in `HiveShim.getTable()` method. When we create a hive table, hive will convert the dbName and

[GitHub] [spark] MaxGekk commented on pull request #38710: [SPARK-41179][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1092

2022-11-22 Thread GitBox
MaxGekk commented on PR #38710: URL: https://github.com/apache/spark/pull/38710#issuecomment-1324619657 @panbingkun Please, resolve conflicts.

[GitHub] [spark] cloud-fan commented on pull request #38760: [SPARK-41219][SQL] Decimal changePrecision should work with decimal(0, 0)

2022-11-22 Thread GitBox
cloud-fan commented on PR #38760: URL: https://github.com/apache/spark/pull/38760#issuecomment-1324619462 cc @srielau @viirya

[GitHub] [spark] MaxGekk commented on a diff in pull request #38725: [SPARK-41182][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1102

2022-11-22 Thread GitBox
MaxGekk commented on code in PR #38725: URL: https://github.com/apache/spark/pull/38725#discussion_r1030079292 ## core/src/main/resources/error/error-classes.json: ## @@ -656,6 +656,11 @@ ], "sqlState" : "42000" }, + "INVALID_EXTRACT_FIELD" : { +"message" : [

[GitHub] [spark] MaxGekk commented on pull request #38730: [SPARK-41181][SQL] Migrate the map options errors onto error classes

2022-11-22 Thread GitBox
MaxGekk commented on PR #38730: URL: https://github.com/apache/spark/pull/38730#issuecomment-1324618149 @panbingkun Could you resolve conflicts, please.

[GitHub] [spark] cloud-fan commented on pull request #38760: [SPARK-41219][SQL] Decimal changePrecision should work with decimal(0, 0)

2022-11-22 Thread GitBox
cloud-fan commented on PR #38760: URL: https://github.com/apache/spark/pull/38760#issuecomment-1324618007 It seems reasonable to say that 0 is the only valid value for `decimal(0, 0)`. Forbidding `decimal(0, 0)` also seems reasonable but is riskier.
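
A rough way to see why 0 would be the only value that fits `decimal(0, 0)`: the sketch below models a "does this value fit (precision, scale)" check with Python's standard `decimal` module. It is not Spark's `changePrecision` code; the helper name `fits`, the half-up rounding mode, and the rule that zero needs no significant digits are all assumptions made for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def fits(value: str, precision: int, scale: int) -> bool:
    """Check whether `value` fits decimal(precision, scale) after rounding.

    Illustrative model only; this is not Spark's changePrecision code.
    """
    # Round to `scale` fractional digits, e.g. scale=2 -> quantum 0.01.
    quantum = Decimal(1).scaleb(-scale)
    rounded = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
    if rounded == 0:
        # Assumption: zero needs no significant digits, so it always fits.
        return True
    # Count the digits of the unscaled integer and compare with precision.
    return len(rounded.as_tuple().digits) <= precision


if __name__ == "__main__":
    for candidate in ("0", "0.4", "0.5", "1"):
        print(candidate, fits(candidate, precision=0, scale=0))
```

Under these assumptions any value that rounds to zero fits `decimal(0, 0)`, while anything that rounds to a non-zero integer needs at least one digit and is rejected.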

[GitHub] [spark] AngersZhuuuu commented on pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-11-22 Thread GitBox
AngersZhuuuu commented on PR #35799: URL: https://github.com/apache/spark/pull/35799#issuecomment-1324617674 gentle ping @dongjoon-hyun

[GitHub] [spark] MaxGekk commented on a diff in pull request #38737: [SPARK-41174][CORE][SQL] Propagate an error class to users for invalid `format` of `to_binary()`

2022-11-22 Thread GitBox
MaxGekk commented on code in PR #38737: URL: https://github.com/apache/spark/pull/38737#discussion_r1030077573 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2620,46 +2620,81 @@ case class ToBinary(

[GitHub] [spark] MaxGekk closed pull request #38764: [SPARK-41206][SQL][FOLLOWUP] Make result of `checkColumnNameDuplication` stable to fix `COLUMN_ALREADY_EXISTS` check failed with Scala 2.13

2022-11-22 Thread GitBox
MaxGekk closed pull request #38764: [SPARK-41206][SQL][FOLLOWUP] Make result of `checkColumnNameDuplication` stable to fix `COLUMN_ALREADY_EXISTS` check failed with Scala 2.13 URL: https://github.com/apache/spark/pull/38764

[GitHub] [spark] MaxGekk commented on pull request #38764: [SPARK-41206][SQL][FOLLOWUP] Make result of `checkColumnNameDuplication` stable to fix `COLUMN_ALREADY_EXISTS` check failed with Scala 2.13

2022-11-22 Thread GitBox
MaxGekk commented on PR #38764: URL: https://github.com/apache/spark/pull/38764#issuecomment-1324599102 +1, LGTM. Merging to master. All GAs passed. Thank you, @LuciferYang and @HyukjinKwon for review.

[GitHub] [spark] AmplabJenkins commented on pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

2022-11-22 Thread GitBox
AmplabJenkins commented on PR #38750: URL: https://github.com/apache/spark/pull/38750#issuecomment-1324584432 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #38751: [SPARK-40872][3.3] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-22 Thread GitBox
AmplabJenkins commented on PR #38751: URL: https://github.com/apache/spark/pull/38751#issuecomment-1324584414 Can one of the admins verify this patch?

[GitHub] [spark] MaxGekk commented on a diff in pull request #38576: [SPARK-41062][SQL] Rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`

2022-11-22 Thread GitBox
MaxGekk commented on code in PR #38576: URL: https://github.com/apache/spark/pull/38576#discussion_r1030039032 ## sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala: ## @@ -964,17 +964,14 @@ class SubquerySuite extends QueryTest | WHERE

[GitHub] [spark] MaxGekk closed pull request #38575: [SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

2022-11-22 Thread GitBox
MaxGekk closed pull request #38575: [SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND URL: https://github.com/apache/spark/pull/38575

[GitHub] [spark] MaxGekk commented on pull request #38575: [SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

2022-11-22 Thread GitBox
MaxGekk commented on PR #38575: URL: https://github.com/apache/spark/pull/38575#issuecomment-1324575522 +1, LGTM. Merging to master. Thank you, @itholic and @HyukjinKwon @cloud-fan @LuciferYang for review.

[GitHub] [spark] MaxGekk commented on a diff in pull request #25004: [SPARK-28205][SQL] useV1SourceList configuration should be for all data sources

2022-11-22 Thread GitBox
MaxGekk commented on code in PR #25004: URL: https://github.com/apache/spark/pull/25004#discussion_r1030034252 ## sql/core/src/test/scala/org/apache/spark/sql/sources/v2/FileDataSourceV2FallBackSuite.scala: ## @@ -170,4 +174,46 @@ class FileDataSourceV2FallBackSuite extends

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38757: [SPARK-41222][CONNECT][PYTHON] Unify the typing definitions

2022-11-22 Thread GitBox
zhengruifeng commented on code in PR #38757: URL: https://github.com/apache/spark/pull/38757#discussion_r1030022747 ## python/pyspark/sql/connect/column.py: ## @@ -15,14 +15,15 @@ # limitations under the License. # import uuid -from typing import cast, get_args,

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38757: [SPARK-41222][CONNECT][PYTHON] Unify the typing definitions

2022-11-22 Thread GitBox
HyukjinKwon commented on code in PR #38757: URL: https://github.com/apache/spark/pull/38757#discussion_r1030022065 ## python/pyspark/sql/connect/column.py: ## @@ -15,14 +15,15 @@ # limitations under the License. # import uuid -from typing import cast, get_args,

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38757: [SPARK-41222][CONNECT][PYTHON] Unify the typing definitions

2022-11-22 Thread GitBox
zhengruifeng commented on code in PR #38757: URL: https://github.com/apache/spark/pull/38757#discussion_r1030020312 ## python/pyspark/sql/connect/column.py: ## @@ -15,14 +15,15 @@ # limitations under the License. # import uuid -from typing import cast, get_args,

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38575: [SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

2022-11-22 Thread GitBox
HyukjinKwon commented on code in PR #38575: URL: https://github.com/apache/spark/pull/38575#discussion_r1030018762 ## R/pkg/tests/fulltests/test_sparkSQL.R: ## @@ -3990,12 +3990,16 @@ test_that("Call DataFrameWriter.load() API in Java without path and check argume

[GitHub] [spark] ulysses-you commented on a diff in pull request #38760: [SPARK-41219][SQL] Decimal changePrecision should work with decimal(0, 0)

2022-11-22 Thread GitBox
ulysses-you commented on code in PR #38760: URL: https://github.com/apache/spark/pull/38760#discussion_r1030015079 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -3537,6 +3537,12 @@ class DataFrameSuite extends QueryTest }.isEmpty) }

[GitHub] [spark] amaliujia commented on a diff in pull request #38762: [SPARK-41225] [CONNECT] [PYTHON] Disable unsupported functions.

2022-11-22 Thread GitBox
amaliujia commented on code in PR #38762: URL: https://github.com/apache/spark/pull/38762#discussion_r1030011493 ## python/pyspark/sql/connect/dataframe.py: ## @@ -951,6 +951,39 @@ def createOrReplaceGlobalTempView(self, name: str) -> None:

[GitHub] [spark] amaliujia commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-22 Thread GitBox
amaliujia commented on code in PR #38659: URL: https://github.com/apache/spark/pull/38659#discussion_r1030009940 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -271,8 +273,12 @@ class SparkConnectPlanner(session:

[GitHub] [spark] LuciferYang commented on a diff in pull request #38685: [SPARK-41206][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1233` to `COLUMN_ALREADY_EXISTS`

2022-11-22 Thread GitBox
LuciferYang commented on code in PR #38685: URL: https://github.com/apache/spark/pull/38685#discussion_r1030009517 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -1759,24 +1763,25 @@ class DataFrameSuite extends QueryTest test("SPARK-8072:

[GitHub] [spark] LuciferYang commented on pull request #38764: [SPARK-41206][SQL][FOLLOWUP] Make result of `checkColumnNameDuplication` stable to fix `COLUMN_ALREADY_EXISTS` check failed with Scala 2.

2022-11-22 Thread GitBox
LuciferYang commented on PR #38764: URL: https://github.com/apache/spark/pull/38764#issuecomment-1324532345 cc @HyukjinKwon try to fix https://github.com/apache/spark/pull/38685#discussion_r1029966254

[GitHub] [spark] ahshahid commented on a diff in pull request #38714: [WIP][SPARK-41141]. avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it

2022-11-22 Thread GitBox
ahshahid commented on code in PR #38714: URL: https://github.com/apache/spark/pull/38714#discussion_r1030005359 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala: ## @@ -208,20 +208,33 @@ object SubExprUtils extends PredicateHelper { */

[GitHub] [spark] ahshahid commented on a diff in pull request #38714: [WIP][SPARK-41141]. avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it

2022-11-22 Thread GitBox
ahshahid commented on code in PR #38714: URL: https://github.com/apache/spark/pull/38714#discussion_r1030005062 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala: ## @@ -208,20 +208,33 @@ object SubExprUtils extends PredicateHelper { */

[GitHub] [spark] ahshahid commented on a diff in pull request #38714: [WIP][SPARK-41141]. avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it

2022-11-22 Thread GitBox
ahshahid commented on code in PR #38714: URL: https://github.com/apache/spark/pull/38714#discussion_r1030004911 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveSubquerySuite.scala: ## @@ -17,13 +17,20 @@ package

[GitHub] [spark] cloud-fan commented on a diff in pull request #38760: [SPARK-41219][SQL] Decimal changePrecision should work with decimal(0, 0)

2022-11-22 Thread GitBox
cloud-fan commented on code in PR #38760: URL: https://github.com/apache/spark/pull/38760#discussion_r1030003237 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -3537,6 +3537,12 @@ class DataFrameSuite extends QueryTest }.isEmpty) } }

[GitHub] [spark] wankunde commented on a diff in pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-22 Thread GitBox
wankunde commented on code in PR #38560: URL: https://github.com/apache/spark/pull/38560#discussion_r1029992768 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -452,22 +489,69 @@ void

[GitHub] [spark] LuciferYang opened a new pull request, #38764: [SPARK-41206][SQL][FOLLOWUP] Make result of `checkColumnNameDuplication` stable to fix `COLUMN_ALREADY_EXISTS` check with Scala 2.13

2022-11-22 Thread GitBox
LuciferYang opened a new pull request, #38764: URL: https://github.com/apache/spark/pull/38764 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] itholic commented on a diff in pull request #38646: [SPARK-41131][SQL] Improve error message for `UNRESOLVED_MAP_KEY.WITHOUT_SUGGESTION`

2022-11-22 Thread GitBox
itholic commented on code in PR #38646: URL: https://github.com/apache/spark/pull/38646#discussion_r102592 ## core/src/main/resources/error/error-classes.json: ## @@ -1044,7 +1044,7 @@ }, "UNRESOLVED_MAP_KEY" : { "message" : [ - "Cannot resolve column as a

[GitHub] [spark] cloud-fan commented on a diff in pull request #38575: [SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

2022-11-22 Thread GitBox
cloud-fan commented on code in PR #38575: URL: https://github.com/apache/spark/pull/38575#discussion_r1029997623 ## R/pkg/tests/fulltests/test_sparkSQL.R: ## @@ -3990,12 +3990,16 @@ test_that("Call DataFrameWriter.load() API in Java without path and check argume

[GitHub] [spark] zhengruifeng commented on pull request #38763: [SPARK-41201][CONNECT][PYTHON][TEST][FOLLOWUP] Reenable test_fill_na

2022-11-22 Thread GitBox
zhengruifeng commented on PR #38763: URL: https://github.com/apache/spark/pull/38763#issuecomment-1324508234 merged into master

[GitHub] [spark] zhengruifeng closed pull request #38763: [SPARK-41201][CONNECT][PYTHON][TEST][FOLLOWUP] Reenable test_fill_na

2022-11-22 Thread GitBox
zhengruifeng closed pull request #38763: [SPARK-41201][CONNECT][PYTHON][TEST][FOLLOWUP] Reenable test_fill_na URL: https://github.com/apache/spark/pull/38763

[GitHub] [spark] itholic commented on a diff in pull request #38575: [SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

2022-11-22 Thread GitBox
itholic commented on code in PR #38575: URL: https://github.com/apache/spark/pull/38575#discussion_r1029988770 ## R/pkg/tests/fulltests/test_sparkSQL.R: ## @@ -3990,12 +3990,16 @@ test_that("Call DataFrameWriter.load() API in Java without path and check argume

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-22 Thread GitBox
zhengruifeng commented on code in PR #38659: URL: https://github.com/apache/spark/pull/38659#discussion_r1029986139 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -271,8 +273,12 @@ class SparkConnectPlanner(session:

[GitHub] [spark] xinrong-meng commented on pull request #38731: [SPARK-41209][PYTHON] Improve PySpark type inference in _merge_type method

2022-11-22 Thread GitBox
xinrong-meng commented on PR #38731: URL: https://github.com/apache/spark/pull/38731#issuecomment-1324490281 Thanks @sadikovi !

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38762: [SPARK-41225] [CONNECT] [PYTHON] Disable unsupported functions.

2022-11-22 Thread GitBox
zhengruifeng commented on code in PR #38762: URL: https://github.com/apache/spark/pull/38762#discussion_r1029983662 ## python/pyspark/sql/connect/dataframe.py: ## @@ -951,6 +951,39 @@ def createOrReplaceGlobalTempView(self, name: str) -> None:

[GitHub] [spark] xiuzhu9527 commented on pull request #38674: [SPARK-41160][YARN] Fix error when submitting a task to the yarn that enabled the timeline service

2022-11-22 Thread GitBox
xiuzhu9527 commented on PR #38674: URL: https://github.com/apache/spark/pull/38674#issuecomment-1324484471 @tgravescs 1. Yes, Jersey 1 and Jersey 2 are two different packages, one is com.sun.jersey and one is org.glassfish.jersey 2. I will try to use maven-shade-plugin to change the

[GitHub] [spark] yabola commented on a diff in pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-22 Thread GitBox
yabola commented on code in PR #38560: URL: https://github.com/apache/spark/pull/38560#discussion_r1023816561 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -654,8 +731,7 @@ public MergeStatuses

[GitHub] [spark] LuciferYang commented on a diff in pull request #38685: [SPARK-41206][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1233` to `COLUMN_ALREADY_EXISTS`

2022-11-22 Thread GitBox
LuciferYang commented on code in PR #38685: URL: https://github.com/apache/spark/pull/38685#discussion_r1029979036 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -1759,24 +1763,25 @@ class DataFrameSuite extends QueryTest test("SPARK-8072:

[GitHub] [spark] pan3793 closed pull request #38205: [SPARK-40747][CORE] Support setting driver log url using env vars on other resource managers

2022-11-22 Thread GitBox
pan3793 closed pull request #38205: [SPARK-40747][CORE] Support setting driver log url using env vars on other resource managers URL: https://github.com/apache/spark/pull/38205

[GitHub] [spark] pan3793 commented on pull request #38205: [SPARK-40747][CORE] Support setting driver log url using env vars on other resource managers

2022-11-22 Thread GitBox
pan3793 commented on PR #38205: URL: https://github.com/apache/spark/pull/38205#issuecomment-1324476298 Closing in favor of https://github.com/apache/spark/pull/38357

[GitHub] [spark] amaliujia commented on pull request #38763: [SPARK-41201][CONNECT][PYTHON][TEST][FOLLOWUP] Reenable test_fill_na

2022-11-22 Thread GitBox
amaliujia commented on PR #38763: URL: https://github.com/apache/spark/pull/38763#issuecomment-1324471937 @zhengruifeng thanks for the clarification!

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38742: [SPARK-41216][CONNECT][PYTHON] Make AnalyzePlan support multiple analysis tasks And implement isLocal/isStreaming/printSchema/

2022-11-22 Thread GitBox
zhengruifeng commented on code in PR #38742: URL: https://github.com/apache/spark/pull/38742#discussion_r1029975064 ## connector/connect/src/main/protobuf/spark/connect/base.proto: ## @@ -100,18 +70,138 @@ message AnalyzePlanRequest { // logging purposes and will not be

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38762: [SPARK-41225] [CONNECT] [PYTHON] Disable unsupported functions.

2022-11-22 Thread GitBox
zhengruifeng commented on code in PR #38762: URL: https://github.com/apache/spark/pull/38762#discussion_r1029974156 ## python/pyspark/sql/connect/dataframe.py: ## @@ -951,6 +951,39 @@ def createOrReplaceGlobalTempView(self, name: str) -> None:

[GitHub] [spark] beliefer commented on pull request #38745: [SPARK-37099][SQL] Optimize the filter based on rank-like window function by reduce not required rows

2022-11-22 Thread GitBox
beliefer commented on PR #38745: URL: https://github.com/apache/spark/pull/38745#issuecomment-1324464202 ping @zhengruifeng cc @cloud-fan

[GitHub] [spark] zhengruifeng commented on pull request #38763: [SPARK-41201][CONNECT][PYTHON][TEST][FOLLOWUP] Reenable test_fill_na

2022-11-22 Thread GitBox
zhengruifeng commented on PR #38763: URL: https://github.com/apache/spark/pull/38763#issuecomment-1324459180 @amaliujia that is on purpose: `sdf.x` will just throw an exception since `sdf` doesn't contain an `x` column, but for the connect df `cdf`, `cdf.x` will not throw an exception since it
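
The difference described in this comment can be sketched with two toy classes. They are hypothetical stand-ins, not PySpark or Spark Connect classes, and the deferred-validation behaviour of `LazyFrame` is an assumption about why `cdf.x` does not fail at access time (the preview above is cut off before the reason is given).

```python
class EagerFrame:
    """Hypothetical frame that checks column names as soon as they are accessed."""

    def __init__(self, columns):
        self._columns = list(columns)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_") or name not in self._columns:
            raise AttributeError(f"no column named {name!r}")
        return ("resolved", name)


class LazyFrame:
    """Hypothetical frame that records a column reference without checking it."""

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        # Assumption: validation is deferred until the plan is analyzed later.
        return ("unresolved", name)


def analyze(ref, columns):
    """Resolve a deferred reference against a schema, failing only at this point."""
    _, name = ref
    if name not in columns:
        raise KeyError(f"cannot resolve column {name!r}")
    return ("resolved", name)


if __name__ == "__main__":
    cdf = LazyFrame()
    ref = cdf.x  # no exception here in this toy model
    print(ref)
```

In this model the eager frame fails at `sdf.x`, while the lazy frame fails only when `analyze` is called against a schema without `x`.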

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38685: [SPARK-41206][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1233` to `COLUMN_ALREADY_EXISTS`

2022-11-22 Thread GitBox
HyukjinKwon commented on code in PR #38685: URL: https://github.com/apache/spark/pull/38685#discussion_r1029966254 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -1759,24 +1763,25 @@ class DataFrameSuite extends QueryTest test("SPARK-8072:

[GitHub] [spark] bersprockets commented on pull request #38727: [SPARK-41205][SQL] Check that format is foldable in `TryToBinary`

2022-11-22 Thread GitBox
bersprockets commented on PR #38727: URL: https://github.com/apache/spark/pull/38727#issuecomment-1324453979 Tested PR https://github.com/apache/spark/pull/38737. That PR incidentally seems to fix this issue: ``` SELECT try_to_binary(col1, col2) from values ('abc', 'utf-8') as

[GitHub] [spark] bersprockets closed pull request #38727: [SPARK-41205][SQL] Check that format is foldable in `TryToBinary`

2022-11-22 Thread GitBox
bersprockets closed pull request #38727: [SPARK-41205][SQL] Check that format is foldable in `TryToBinary` URL: https://github.com/apache/spark/pull/38727

[GitHub] [spark] ulysses-you commented on pull request #38760: [SPARK-41219][SQL] Decimal changePrecision should work with decimal(0, 0)

2022-11-22 Thread GitBox
ulysses-you commented on PR #38760: URL: https://github.com/apache/spark/pull/38760#issuecomment-1324450892 cc @cloud-fan @revans2 @gengliangwang

[GitHub] [spark] ulysses-you commented on a diff in pull request #38760: [SPARK-41219][SQL] Decimal changePrecision should work with decimal(0, 0)

2022-11-22 Thread GitBox
ulysses-you commented on code in PR #38760: URL: https://github.com/apache/spark/pull/38760#discussion_r1029962313 ## sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala: ## @@ -384,4 +384,11 @@ class DecimalSuite extends SparkFunSuite with

[GitHub] [spark] ulysses-you commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

2022-11-22 Thread GitBox
ulysses-you commented on code in PR #38739: URL: https://github.com/apache/spark/pull/38739#discussion_r1029960652 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecisionSuite.scala: ## @@ -276,9 +276,9 @@ class DecimalPrecisionSuite extends

[GitHub] [spark] amaliujia commented on a diff in pull request #38686: [SPARK-41169][CONNECT][PYTHON] Implement `DataFrame.drop`

2022-11-22 Thread GitBox
amaliujia commented on code in PR #38686: URL: https://github.com/apache/spark/pull/38686#discussion_r1029946709 ## python/pyspark/sql/connect/dataframe.py: ## @@ -255,10 +255,21 @@ def distinct(self) -> "DataFrame": ) def drop(self, *cols: "ColumnOrString") ->

[GitHub] [spark] amaliujia commented on pull request #38686: [SPARK-41169][CONNECT][PYTHON] Implement `DataFrame.drop`

2022-11-22 Thread GitBox
amaliujia commented on PR #38686: URL: https://github.com/apache/spark/pull/38686#issuecomment-1324436991 LGTM

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38742: [SPARK-41216][CONNECT][PYTHON] Make AnalyzePlan support multiple analysis tasks And implement isLocal/isStreaming/printSchema/

2022-11-22 Thread GitBox
zhengruifeng commented on code in PR #38742: URL: https://github.com/apache/spark/pull/38742#discussion_r1029953393 ## connector/connect/src/main/protobuf/spark/connect/base.proto: ## @@ -100,18 +70,138 @@ message AnalyzePlanRequest { // logging purposes and will not be

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38742: [SPARK-41216][CONNECT][PYTHON] Make AnalyzePlan support multiple analysis tasks And implement isLocal/isStreaming/printSchema/

2022-11-22 Thread GitBox
zhengruifeng commented on code in PR #38742: URL: https://github.com/apache/spark/pull/38742#discussion_r1029953075 ## connector/connect/src/main/protobuf/spark/connect/base.proto: ## @@ -100,18 +70,138 @@ message AnalyzePlanRequest { // logging purposes and will not be

[GitHub] [spark] amaliujia commented on pull request #38763: [SPARK-41201][CONNECT][PYTHON][TEST][FOLLOWUP] Reenable test_fill_na

2022-11-22 Thread GitBox
amaliujia commented on PR #38763: URL: https://github.com/apache/spark/pull/38763#issuecomment-1324432409 LGTM. If you are interested, could you also follow up in this PR on

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38742: [SPARK-41216][CONNECT][PYTHON] Make AnalyzePlan support multiple analysis tasks And implement isLocal/isStreaming/printSchema/

2022-11-22 Thread GitBox
zhengruifeng commented on code in PR #38742: URL: https://github.com/apache/spark/pull/38742#discussion_r1029952200 ## connector/connect/src/main/protobuf/spark/connect/base.proto: ## @@ -100,18 +70,138 @@ message AnalyzePlanRequest { // logging purposes and will not be

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-22 Thread GitBox
zhengruifeng commented on code in PR #38723: URL: https://github.com/apache/spark/pull/38723#discussion_r1029951842 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -302,6 +301,31 @@ def test_to_pandas(self): self.spark.sql(query).toPandas(),

[GitHub] [spark] zhengruifeng opened a new pull request, #38763: [SPARK-41201][CONNECT][PYTHON][TEST][FOLLOWUP] Reenable test_fill_na

2022-11-22 Thread GitBox
zhengruifeng opened a new pull request, #38763: URL: https://github.com/apache/spark/pull/38763 ### What changes were proposed in this pull request? Reenable test_fill_na ### Why are the changes needed? `test_fill_na` was disabled by mistake in

[GitHub] [spark] amaliujia commented on a diff in pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-22 Thread GitBox
amaliujia commented on code in PR #38723: URL: https://github.com/apache/spark/pull/38723#discussion_r1029951479 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -302,6 +301,31 @@ def test_to_pandas(self): self.spark.sql(query).toPandas(),

[GitHub] [spark] amaliujia commented on a diff in pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-22 Thread GitBox
amaliujia commented on code in PR #38723: URL: https://github.com/apache/spark/pull/38723#discussion_r1029951172 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -302,6 +301,31 @@ def test_to_pandas(self): self.spark.sql(query).toPandas(),

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-22 Thread GitBox
zhengruifeng commented on code in PR #38723: URL: https://github.com/apache/spark/pull/38723#discussion_r1029950542 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -302,6 +301,31 @@ def test_to_pandas(self): self.spark.sql(query).toPandas(),

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-22 Thread GitBox
HyukjinKwon commented on code in PR #38723: URL: https://github.com/apache/spark/pull/38723#discussion_r1029950210 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -302,6 +301,31 @@ def test_to_pandas(self): self.spark.sql(query).toPandas(),

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38742: [SPARK-41216][CONNECT][PYTHON] Make AnalyzePlan support multiple analysis tasks And implement isLocal/isStreaming/printSchema/

2022-11-22 Thread GitBox
zhengruifeng commented on code in PR #38742: URL: https://github.com/apache/spark/pull/38742#discussion_r1029949868 ## connector/connect/src/main/protobuf/spark/connect/base.proto: ## @@ -100,18 +70,138 @@ message AnalyzePlanRequest { // logging purposes and will not be

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-22 Thread GitBox
zhengruifeng commented on code in PR #38723: URL: https://github.com/apache/spark/pull/38723#discussion_r1029949413 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -302,6 +301,31 @@ def test_to_pandas(self): self.spark.sql(query).toPandas(),

[GitHub] [spark] HyukjinKwon commented on pull request #38751: [SPARK-40872][3.3] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-22 Thread GitBox
HyukjinKwon commented on PR #38751: URL: https://github.com/apache/spark/pull/38751#issuecomment-1324425695 The test failure looks unrelated. I don't mind merging it as is. Feel free to retrigger https://github.com/gaoyajun02/spark/runs/9633118279 @gaoyajun02 -- This

[GitHub] [spark] HyukjinKwon closed pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-22 Thread GitBox
HyukjinKwon closed pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client URL: https://github.com/apache/spark/pull/38723 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] HyukjinKwon commented on pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-22 Thread GitBox
HyukjinKwon commented on PR #38723: URL: https://github.com/apache/spark/pull/38723#issuecomment-1324424425 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] amaliujia commented on a diff in pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-22 Thread GitBox
amaliujia commented on code in PR #38723: URL: https://github.com/apache/spark/pull/38723#discussion_r1029932646 ## python/pyspark/sql/connect/column.py: ## @@ -263,6 +263,22 @@ def __str__(self) -> str: return f"Column({self._unparsed_identifier})" +class

[GitHub] [spark] amaliujia commented on a diff in pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-22 Thread GitBox
amaliujia commented on code in PR #38723: URL: https://github.com/apache/spark/pull/38723#discussion_r1029932471 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -220,6 +220,29 @@ def test_create_global_temp_view(self): with

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38013: [SPARK-40509][SS][PYTHON] Add example for applyInPandasWithState

2022-11-22 Thread GitBox
HyukjinKwon commented on code in PR #38013: URL: https://github.com/apache/spark/pull/38013#discussion_r1029932042 ## examples/src/main/python/sql/streaming/structured_network_wordcount_session_window.py: ## @@ -0,0 +1,139 @@ +# +# Licensed to the Apache Software Foundation

[GitHub] [spark] mridulm commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side metrics

2022-11-22 Thread GitBox
mridulm commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1029927503 ## core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala: ## @@ -282,6 +280,17 @@ final class ShuffleBlockFetcherIterator( } } +

[GitHub] [spark] grundprinzip commented on pull request #38762: [SPARK-41225] [CONNECT] [PYTHON] Disable unsupported functions.

2022-11-22 Thread GitBox
grundprinzip commented on PR #38762: URL: https://github.com/apache/spark/pull/38762#issuecomment-1324244517 @HyukjinKwon @hvanhovell -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] grundprinzip opened a new pull request, #38762: [SPARK-41225] [CONNECT] [PYTHON] Disable unsupported functions.

2022-11-22 Thread GitBox
grundprinzip opened a new pull request, #38762: URL: https://github.com/apache/spark/pull/38762 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] ahshahid commented on a diff in pull request #38714: [WIP][SPARK-41141]. avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it

2022-11-22 Thread GitBox
ahshahid commented on code in PR #38714: URL: https://github.com/apache/spark/pull/38714#discussion_r1029822940 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala: ## @@ -208,20 +208,33 @@ object SubExprUtils extends PredicateHelper { */

[GitHub] [spark] grundprinzip commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-22 Thread GitBox
grundprinzip commented on code in PR #38659: URL: https://github.com/apache/spark/pull/38659#discussion_r1029721265 ## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala: ## @@ -213,58 +214,115 @@ private[sql] object ArrowConverters extends

[GitHub] [spark] ahshahid commented on a diff in pull request #38714: [WIP][SPARK-41141]. avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it

2022-11-22 Thread GitBox
ahshahid commented on code in PR #38714: URL: https://github.com/apache/spark/pull/38714#discussion_r1029763899 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala: ## @@ -208,20 +208,33 @@ object SubExprUtils extends PredicateHelper { */

[GitHub] [spark] MaxGekk commented on a diff in pull request #38575: [SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

2022-11-22 Thread GitBox
MaxGekk commented on code in PR #38575: URL: https://github.com/apache/spark/pull/38575#discussion_r1029760391 ## R/pkg/tests/fulltests/test_sparkSQL.R: ## @@ -3990,12 +3990,16 @@ test_that("Call DataFrameWriter.load() API in Java without path and check argume
