[GitHub] [spark] gengliangwang commented on a diff in pull request #38567: [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

2022-11-21 Thread GitBox
gengliangwang commented on code in PR #38567: URL: https://github.com/apache/spark/pull/38567#discussion_r1028977314 ## core/src/main/scala/org/apache/spark/status/KVUtils.scala: ## @@ -80,6 +89,44 @@ private[spark] object KVUtils extends Logging { db } + def

[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-21 Thread GitBox
cloud-fan commented on code in PR #38495: URL: https://github.com/apache/spark/pull/38495#discussion_r1028966247 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -105,6 +106,15 @@ private[hive] class HiveClientImpl( private class

[GitHub] [spark] EnricoMi commented on pull request #38676: [SPARK-41162][SQL] Do not push down anti-join predicates that become ambiguous

2022-11-21 Thread GitBox
EnricoMi commented on PR #38676: URL: https://github.com/apache/spark/pull/38676#issuecomment-1323236232 @wangyum @cloud-fan I am not sure if this is the right approach to fix `DeduplicateRelations`. Please advise. Problem is that `DeduplicateRelations` is only considering duplicates

[GitHub] [spark] LuciferYang commented on pull request #38737: [SPARK-41174][CORE][SQL] Propagate an error class to users for invalid `format` of `to_binary()`

2022-11-21 Thread GitBox
LuciferYang commented on PR #38737: URL: https://github.com/apache/spark/pull/38737#issuecomment-1323234015 @MaxGekk rebased -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] cloud-fan commented on a diff in pull request #38734: [SPARK-41212][CONNECT][PYTHON] Implement `DataFrame.isEmpty`

2022-11-21 Thread GitBox
cloud-fan commented on code in PR #38734: URL: https://github.com/apache/spark/pull/38734#discussion_r1028961395 ## python/pyspark/sql/connect/dataframe.py: ## @@ -122,6 +122,18 @@ def withPlan(cls, plan: plan.LogicalPlan, session: "RemoteSparkSession") -> "Dat

[GitHub] [spark] MaxGekk closed pull request #38685: [SPARK-41206][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1233` to `COLUMN_ALREADY_EXISTS`

2022-11-21 Thread GitBox
MaxGekk closed pull request #38685: [SPARK-41206][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1233` to `COLUMN_ALREADY_EXISTS` URL: https://github.com/apache/spark/pull/38685

[GitHub] [spark] MaxGekk commented on pull request #38685: [SPARK-41206][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1233` to `COLUMN_ALREADY_EXISTS`

2022-11-21 Thread GitBox
MaxGekk commented on PR #38685: URL: https://github.com/apache/spark/pull/38685#issuecomment-1323221728 Merging to master. Thank you, @srielau and @cloud-fan for review.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-21 Thread GitBox
HyukjinKwon commented on code in PR #38659: URL: https://github.com/apache/spark/pull/38659#discussion_r1028944132 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -271,8 +273,12 @@ class SparkConnectPlanner(session:

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-21 Thread GitBox
HyukjinKwon commented on code in PR #38659: URL: https://github.com/apache/spark/pull/38659#discussion_r1028943446 ## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala: ## @@ -253,16 +253,94 @@ private[sql] object ArrowConverters extends Logging

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-21 Thread GitBox
HyukjinKwon commented on code in PR #38659: URL: https://github.com/apache/spark/pull/38659#discussion_r1028942760 ## core/src/main/scala/org/apache/spark/util/Utils.scala: ## @@ -3257,6 +3257,14 @@ private[spark] object Utils extends Logging { case _ =>

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-21 Thread GitBox
HyukjinKwon commented on code in PR #38659: URL: https://github.com/apache/spark/pull/38659#discussion_r1028942267 ## connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -44,14 +47,18 @@ class SparkConnectProtoSuite extends

[GitHub] [spark] grundprinzip commented on a diff in pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-21 Thread GitBox
grundprinzip commented on code in PR #38723: URL: https://github.com/apache/spark/pull/38723#discussion_r1028942189 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -220,6 +220,29 @@ def test_create_global_temp_view(self): with

[GitHub] [spark] MaxGekk closed pull request #38744: [SPARK-41217][SQL] Add the error class `FAILED_FUNCTION_CALL`

2022-11-21 Thread GitBox
MaxGekk closed pull request #38744: [SPARK-41217][SQL] Add the error class `FAILED_FUNCTION_CALL` URL: https://github.com/apache/spark/pull/38744

[GitHub] [spark] LuciferYang commented on pull request #38744: [SPARK-41217][SQL] Add the error class `FAILED_FUNCTION_CALL`

2022-11-21 Thread GitBox
LuciferYang commented on PR #38744: URL: https://github.com/apache/spark/pull/38744#issuecomment-1323206350 OK, I will rebase my pr

[GitHub] [spark] MaxGekk commented on pull request #38744: [SPARK-41217][SQL] Add the error class `FAILED_FUNCTION_CALL`

2022-11-21 Thread GitBox
MaxGekk commented on PR #38744: URL: https://github.com/apache/spark/pull/38744#issuecomment-1323205514 Merging to master. Thank you, @panbingkun @LuciferYang @cloud-fan for review.

[GitHub] [spark] HyukjinKwon commented on pull request #38715: [SPARK-41197] Upgrade Kafka version to 3.3 release

2022-11-21 Thread GitBox
HyukjinKwon commented on PR #38715: URL: https://github.com/apache/spark/pull/38715#issuecomment-1323204958 cc @dongjoon-hyun and @HeartSaVioR FYI

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38742: [SPARK-41216][CONNECT][PYTHON] Make AnalyzePlan support multiple analysis tasks And implement isLocal/isStreaming/printSchema/

2022-11-21 Thread GitBox
zhengruifeng commented on code in PR #38742: URL: https://github.com/apache/spark/pull/38742#discussion_r1028935043 ## python/pyspark/sql/connect/dataframe.py: ## @@ -736,6 +736,19 @@ def toPandas(self) -> Optional["pandas.DataFrame"]: query =

[GitHub] [spark] LuciferYang commented on pull request #38752: [SPARK-40809][CONNECT][FOLLOW-UP] Do not use Buffer to make Scala 2.13 test pass

2022-11-21 Thread GitBox
LuciferYang commented on PR #38752: URL: https://github.com/apache/spark/pull/38752#issuecomment-1323197061 late LGTM

[GitHub] [spark] HyukjinKwon closed pull request #38697: [SPARK-41118][SQL][3.3] `to_number`/`try_to_number` should return `null` when format is `null`

2022-11-21 Thread GitBox
HyukjinKwon closed pull request #38697: [SPARK-41118][SQL][3.3] `to_number`/`try_to_number` should return `null` when format is `null` URL: https://github.com/apache/spark/pull/38697

[GitHub] [spark] LuciferYang closed pull request #38753: [SPARK-40809][CONNECT][PYTHON][TESTS] Fix pyspark-connect test failed with Scala 2.13

2022-11-21 Thread GitBox
LuciferYang closed pull request #38753: [SPARK-40809][CONNECT][PYTHON][TESTS] Fix pyspark-connect test failed with Scala 2.13 URL: https://github.com/apache/spark/pull/38753

[GitHub] [spark] LuciferYang commented on pull request #38753: [SPARK-40809][CONNECT][PYTHON][TESTS] Fix pyspark-connect test failed with Scala 2.13

2022-11-21 Thread GitBox
LuciferYang commented on PR #38753: URL: https://github.com/apache/spark/pull/38753#issuecomment-1323196610 dup with https://github.com/apache/spark/pull/38752, close this one

[GitHub] [spark] HyukjinKwon commented on pull request #38697: [SPARK-41118][SQL][3.3] `to_number`/`try_to_number` should return `null` when format is `null`

2022-11-21 Thread GitBox
HyukjinKwon commented on PR #38697: URL: https://github.com/apache/spark/pull/38697#issuecomment-1323196618 Merged to branch-3.3.

[GitHub] [spark] HyukjinKwon closed pull request #38752: [SPARK-40809][CONNECT][FOLLOW-UP] Do not use Buffer to make Scala 2.13 test pass

2022-11-21 Thread GitBox
HyukjinKwon closed pull request #38752: [SPARK-40809][CONNECT][FOLLOW-UP] Do not use Buffer to make Scala 2.13 test pass URL: https://github.com/apache/spark/pull/38752

[GitHub] [spark] HyukjinKwon commented on pull request #38752: [SPARK-40809][CONNECT][FOLLOW-UP] Do not use Buffer to make Scala 2.13 test pass

2022-11-21 Thread GitBox
HyukjinKwon commented on PR #38752: URL: https://github.com/apache/spark/pull/38752#issuecomment-1323196020 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38631: [SPARK-40809] [CONNECT] [FOLLOW] Support `alias()` in Python client

2022-11-21 Thread GitBox
HyukjinKwon commented on code in PR #38631: URL: https://github.com/apache/spark/pull/38631#discussion_r1028933832 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -301,6 +301,20 @@ def test_simple_datasource_read(self) -> None: actualResult =

[GitHub] [spark] LuciferYang commented on pull request #38704: [SPARK-41193][SQL][TESTS] Ignore `collect data with single partition larger than 2GB bytes array limit` in `DatasetLargeResultCollectingS

2022-11-21 Thread GitBox
LuciferYang commented on PR #38704: URL: https://github.com/apache/spark/pull/38704#issuecomment-1323195610 Thanks @HyukjinKwon @mridulm @liuzqt

[GitHub] [spark] HyukjinKwon closed pull request #38686: [SPARK-41169][CONNECT][PYTHON] Implement `DataFrame.drop`

2022-11-21 Thread GitBox
HyukjinKwon closed pull request #38686: [SPARK-41169][CONNECT][PYTHON] Implement `DataFrame.drop` URL: https://github.com/apache/spark/pull/38686

[GitHub] [spark] zhengruifeng commented on pull request #38735: [SPARK-41213][CONNECT][PYTHON] Implement `DataFrame.__repr__` and `DataFrame.dtypes`

2022-11-21 Thread GitBox
zhengruifeng commented on PR #38735: URL: https://github.com/apache/spark/pull/38735#issuecomment-1323195553 @HyukjinKwon thanks for the reviews

[GitHub] [spark] HyukjinKwon commented on pull request #38686: [SPARK-41169][CONNECT][PYTHON] Implement `DataFrame.drop`

2022-11-21 Thread GitBox
HyukjinKwon commented on PR #38686: URL: https://github.com/apache/spark/pull/38686#issuecomment-1323195361 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #38704: [SPARK-41193][SQL][TESTS] Ignore `collect data with single partition larger than 2GB bytes array limit` in `DatasetLargeResultCollectingSuite`

2022-11-21 Thread GitBox
HyukjinKwon closed pull request #38704: [SPARK-41193][SQL][TESTS] Ignore `collect data with single partition larger than 2GB bytes array limit` in `DatasetLargeResultCollectingSuite` URL: https://github.com/apache/spark/pull/38704

[GitHub] [spark] HyukjinKwon commented on pull request #38704: [SPARK-41193][SQL][TESTS] Ignore `collect data with single partition larger than 2GB bytes array limit` in `DatasetLargeResultCollectingS

2022-11-21 Thread GitBox
HyukjinKwon commented on PR #38704: URL: https://github.com/apache/spark/pull/38704#issuecomment-1323194208 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-21 Thread GitBox
HyukjinKwon commented on code in PR #38723: URL: https://github.com/apache/spark/pull/38723#discussion_r1028931691 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -220,6 +220,29 @@ def test_create_global_temp_view(self): with

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-21 Thread GitBox
HyukjinKwon commented on code in PR #38723: URL: https://github.com/apache/spark/pull/38723#discussion_r1028931101 ## python/pyspark/sql/connect/column.py: ## @@ -263,6 +263,22 @@ def __str__(self) -> str: return f"Column({self._unparsed_identifier})" +class

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-21 Thread GitBox
HyukjinKwon commented on code in PR #38723: URL: https://github.com/apache/spark/pull/38723#discussion_r1028930295 ## python/pyspark/sql/connect/column.py: ## @@ -263,6 +263,22 @@ def __str__(self) -> str: return f"Column({self._unparsed_identifier})" +class

[GitHub] [spark] HyukjinKwon closed pull request #38731: [SPARK-41209][PYTHON] Improve PySpark type inference in _merge_type method

2022-11-21 Thread GitBox
HyukjinKwon closed pull request #38731: [SPARK-41209][PYTHON] Improve PySpark type inference in _merge_type method URL: https://github.com/apache/spark/pull/38731

[GitHub] [spark] HyukjinKwon commented on pull request #38731: [SPARK-41209][PYTHON] Improve PySpark type inference in _merge_type method

2022-11-21 Thread GitBox
HyukjinKwon commented on PR #38731: URL: https://github.com/apache/spark/pull/38731#issuecomment-1323189415 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #38735: [SPARK-41213][CONNECT][PYTHON] Implement `DataFrame.__repr__` and `DataFrame.dtypes`

2022-11-21 Thread GitBox
HyukjinKwon closed pull request #38735: [SPARK-41213][CONNECT][PYTHON] Implement `DataFrame.__repr__` and `DataFrame.dtypes` URL: https://github.com/apache/spark/pull/38735

[GitHub] [spark] HyukjinKwon commented on pull request #38735: [SPARK-41213][CONNECT][PYTHON] Implement `DataFrame.__repr__` and `DataFrame.dtypes`

2022-11-21 Thread GitBox
HyukjinKwon commented on PR #38735: URL: https://github.com/apache/spark/pull/38735#issuecomment-1323188640 Merged to master.

[GitHub] [spark] cloud-fan commented on a diff in pull request #38302: [SPARK-40834][SQL] Use SparkListenerSQLExecutionEnd to track final SQL status in UI

2022-11-21 Thread GitBox
cloud-fan commented on code in PR #38302: URL: https://github.com/apache/spark/pull/38302#discussion_r1028926326 ## sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala: ## @@ -56,7 +56,10 @@ case class SparkListenerSQLExecutionStart( } @DeveloperApi

[GitHub] [spark] LuciferYang opened a new pull request, #38753: [SPARK-40809][CONNECT][TESTS] Fix pyspark-connect test failed with Scala 2.13

2022-11-21 Thread GitBox
LuciferYang opened a new pull request, #38753: URL: https://github.com/apache/spark/pull/38753 ### What changes were proposed in this pull request? This PR simplifies assertions to fix the `pyspark-connect` test failure with Scala 2.13. ### Why are the changes needed? Fix

[GitHub] [spark] HeartSaVioR closed pull request #38748: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent

2022-11-21 Thread GitBox
HeartSaVioR closed pull request #38748: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent URL: https://github.com/apache/spark/pull/38748

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38742: [SPARK-41216][CONNECT][PYTHON] Make AnalyzePlan support multiple analysis tasks And implement isLocal/isStreaming/printSchema/s

2022-11-21 Thread GitBox
HyukjinKwon commented on code in PR #38742: URL: https://github.com/apache/spark/pull/38742#discussion_r1028915567 ## python/pyspark/sql/connect/dataframe.py: ## @@ -736,6 +736,19 @@ def toPandas(self) -> Optional["pandas.DataFrame"]: query =

[GitHub] [spark] HeartSaVioR commented on pull request #38748: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent

2022-11-21 Thread GitBox
HeartSaVioR commented on PR #38748: URL: https://github.com/apache/spark/pull/38748#issuecomment-1323163883 Thanks! Merging to 3.3.

[GitHub] [spark] HeartSaVioR commented on pull request #38748: [SPARK-41151][SQL][3.3] Keep built-in file `_metadata` column nullable value consistent

2022-11-21 Thread GitBox
HeartSaVioR commented on PR #38748: URL: https://github.com/apache/spark/pull/38748#issuecomment-1323163589 Looks like the GA build failed to take the result of the forked GA build. Here is a successful run from the forked repository. https://github.com/Yaohua628/spark/runs/9630498867

[GitHub] [spark] LuciferYang commented on a diff in pull request #38631: [SPARK-40809] [CONNECT] [FOLLOW] Support `alias()` in Python client

2022-11-21 Thread GitBox
LuciferYang commented on code in PR #38631: URL: https://github.com/apache/spark/pull/38631#discussion_r1028908773 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -301,6 +301,20 @@ def test_simple_datasource_read(self) -> None: actualResult =

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38302: [SPARK-40834][SQL] Use SparkListenerSQLExecutionEnd to track final SQL status in UI

2022-11-21 Thread GitBox
HyukjinKwon commented on code in PR #38302: URL: https://github.com/apache/spark/pull/38302#discussion_r1028906041 ## sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala: ## @@ -56,7 +56,10 @@ case class SparkListenerSQLExecutionStart( }

[GitHub] [spark] HyukjinKwon opened a new pull request, #38752: [SPARK-40809][CONNECT][FOLLOW-UP] Do not use Buffer to make Scala 2.13 test pass

2022-11-21 Thread GitBox
HyukjinKwon opened a new pull request, #38752: URL: https://github.com/apache/spark/pull/38752 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/38631 that fixes the test to pass in Scala 2.13 by avoiding using `Buffer`

[GitHub] [spark] panbingkun commented on pull request #38744: [SPARK-41217][SQL] Add the error class `FAILED_FUNCTION_CALL`

2022-11-21 Thread GitBox
panbingkun commented on PR #38744: URL: https://github.com/apache/spark/pull/38744#issuecomment-1323142938 +1, LGTM

[GitHub] [spark] gaoyajun02 commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-21 Thread GitBox
gaoyajun02 commented on PR #38333: URL: https://github.com/apache/spark/pull/38333#issuecomment-1323136879 > Thank you, @gaoyajun02 , @mridulm , @otterc . > > * Do we need to backport this to branch-3.3? > * According to the previous failure description, what happens in branch-3.3

[GitHub] [spark] gaoyajun02 commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-21 Thread GitBox
gaoyajun02 commented on PR #38333: URL: https://github.com/apache/spark/pull/38333#issuecomment-1323119482 > I was of two minds whether to fix this in 3.3 as well ... Yes, 3.3 is affected by it. > > But agree, a backport to branch-3.3 would be helpful. Can you give it a shot

[GitHub] [spark] gaoyajun02 opened a new pull request, #38751: [SPARK-40872][3.3] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-21 Thread GitBox
gaoyajun02 opened a new pull request, #38751: URL: https://github.com/apache/spark/pull/38751 ### What changes were proposed in this pull request? This is a backport PR of #38333 When push-based shuffle is enabled, a zero-size buf error may occur when fetching shuffle chunks from bad

[GitHub] [spark] toujours33 commented on pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-21 Thread GitBox
toujours33 commented on PR #38711: URL: https://github.com/apache/spark/pull/38711#issuecomment-1323074280 > How far back should this backport? I hope it can be backported to 3.3+ (3.3 included), since version 3.3 is mainly used in our production environment.

[GitHub] [spark] toujours33 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-21 Thread GitBox
toujours33 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1028830007 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -749,8 +749,10 @@ private[spark] class ExecutorAllocationManager(

[GitHub] [spark] LuciferYang commented on pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-21 Thread GitBox
LuciferYang commented on PR #38711: URL: https://github.com/apache/spark/pull/38711#issuecomment-1323065288 How far back should this backport?

[GitHub] [spark] LuciferYang commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-21 Thread GitBox
LuciferYang commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1028823929 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -749,8 +749,10 @@ private[spark] class ExecutorAllocationManager(

[GitHub] [spark] LuciferYang commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-21 Thread GitBox
LuciferYang commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1028821337 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -749,8 +749,10 @@ private[spark] class ExecutorAllocationManager(

[GitHub] [spark] itholic commented on a diff in pull request #38650: [SPARK-41135][SQL] Rename `UNSUPPORTED_EMPTY_LOCATION` to `INVALID_EMPTY_LOCATION`

2022-11-21 Thread GitBox
itholic commented on code in PR #38650: URL: https://github.com/apache/spark/pull/38650#discussion_r1028818844 ## core/src/main/resources/error/error-classes.json: ## @@ -656,6 +656,11 @@ ], "sqlState" : "42000" }, + "INVALID_EMPTY_LOCATION" : { +"message" : [

[GitHub] [spark] dengziming commented on pull request #38715: [SPARK-41197] Upgrade Kafka version to 3.3 release

2022-11-21 Thread GitBox
dengziming commented on PR #38715: URL: https://github.com/apache/spark/pull/38715#issuecomment-1323037097 These failures come from [apache/kafka#12049](https://github.com/apache/kafka/pull/12049) and are described here: https://kafka.apache.org/documentation/#upgrade_33_notable The

[GitHub] [spark] itholic commented on a diff in pull request #38664: [SPARK-41147][SQL] Assign a name to the legacy error class `_LEGACY_ERROR_TEMP_1042`

2022-11-21 Thread GitBox
itholic commented on code in PR #38664: URL: https://github.com/apache/spark/pull/38664#discussion_r1028813303 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -146,7 +147,10 @@ object FunctionRegistryBase {

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38734: [SPARK-41212][CONNECT][PYTHON] Implement `DataFrame.isEmpty`

2022-11-21 Thread GitBox
zhengruifeng commented on code in PR #38734: URL: https://github.com/apache/spark/pull/38734#discussion_r1028812420 ## python/pyspark/sql/connect/dataframe.py: ## @@ -122,6 +122,20 @@ def withPlan(cls, plan: plan.LogicalPlan, session: "RemoteSparkSession") -> "Dat

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38686: [SPARK-41169][CONNECT][PYTHON] Implement `DataFrame.drop`

2022-11-21 Thread GitBox
zhengruifeng commented on code in PR #38686: URL: https://github.com/apache/spark/pull/38686#discussion_r1028811655 ## connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -148,6 +148,23 @@ class SparkConnectProtoSuite

[GitHub] [spark] toujours33 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-21 Thread GitBox
toujours33 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1028811606 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -749,8 +749,10 @@ private[spark] class ExecutorAllocationManager(

[GitHub] [spark] itholic commented on a diff in pull request #38576: [SPARK-41062][SQL] Rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`

2022-11-21 Thread GitBox
itholic commented on code in PR #38576: URL: https://github.com/apache/spark/pull/38576#discussion_r1028810170 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -1059,10 +1060,16 @@ trait CheckAnalysis extends PredicateHelper with

[GitHub] [spark] toujours33 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-21 Thread GitBox
toujours33 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1028807362 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -749,8 +749,10 @@ private[spark] class ExecutorAllocationManager(

[GitHub] [spark] LuciferYang commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-21 Thread GitBox
LuciferYang commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1028804426 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -749,8 +749,10 @@ private[spark] class ExecutorAllocationManager(

[GitHub] [spark] LuciferYang commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-21 Thread GitBox
LuciferYang commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1028803675 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -749,8 +749,10 @@ private[spark] class ExecutorAllocationManager(

[GitHub] [spark] LuciferYang commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-21 Thread GitBox
LuciferYang commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1028802840 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -749,8 +749,10 @@ private[spark] class ExecutorAllocationManager(

[GitHub] [spark] cloud-fan commented on a diff in pull request #38703: [SPARK-41191] [SQL] Cache Table is not working while nested caches exist

2022-11-21 Thread GitBox
cloud-fan commented on code in PR #38703: URL: https://github.com/apache/spark/pull/38703#discussion_r1028795081 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala: ## @@ -355,7 +355,7 @@ case class ListQuery( plan.canonicalized,

[GitHub] [spark] cloud-fan commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

2022-11-21 Thread GitBox
cloud-fan commented on code in PR #38739: URL: https://github.com/apache/spark/pull/38739#discussion_r1028793298 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -3532,6 +3532,49 @@ class DataFrameSuite extends QueryTest }.isEmpty) } }

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38742: [SPARK-41216][CONNECT][PYTHON] Make AnalyzePlan support multiple analysis tasks And implement isLocal/isStreaming/printSchema/

2022-11-21 Thread GitBox
zhengruifeng commented on code in PR #38742: URL: https://github.com/apache/spark/pull/38742#discussion_r1028639472 ## connector/connect/src/main/protobuf/spark/connect/base.proto: ## @@ -100,18 +70,135 @@ message AnalyzePlanRequest { // logging purposes and will not be

[GitHub] [spark] amaliujia commented on a diff in pull request #38734: [SPARK-41212][CONNECT][PYTHON] Implement `DataFrame.isEmpty`

2022-11-21 Thread GitBox
amaliujia commented on code in PR #38734: URL: https://github.com/apache/spark/pull/38734#discussion_r1028724092 ## python/pyspark/sql/connect/dataframe.py: ## @@ -122,6 +122,20 @@ def withPlan(cls, plan: plan.LogicalPlan, session: "RemoteSparkSession") -> "Dat

[GitHub] [spark] 19855134604 commented on a diff in pull request #38743: [SPARK-41215][BUILD][PROTOBUF] Support user configurable protoc executables when building Spark Protobuf.

2022-11-21 Thread GitBox
19855134604 commented on code in PR #38743: URL: https://github.com/apache/spark/pull/38743#discussion_r1028790815 ## connector/protobuf/README.md: ## @@ -0,0 +1,37 @@ +# Spark Protobuf - Developer Documentation + +## Getting Started + +### Build + +```bash +./build/mvn -Phive

[GitHub] [spark] cloud-fan closed pull request #38738: WIP

2022-11-21 Thread GitBox
cloud-fan closed pull request #38738: WIP URL: https://github.com/apache/spark/pull/38738 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38734: [SPARK-41212][CONNECT][PYTHON] Implement `DataFrame.isEmpty`

2022-11-21 Thread GitBox
zhengruifeng commented on code in PR #38734: URL: https://github.com/apache/spark/pull/38734#discussion_r1028789393 ## python/pyspark/sql/connect/dataframe.py: ## @@ -122,6 +122,20 @@ def withPlan(cls, plan: plan.LogicalPlan, session: "RemoteSparkSession") -> "Dat

[GitHub] [spark] toujours33 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-21 Thread GitBox
toujours33 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1028788976 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -774,17 +776,16 @@ private[spark] class ExecutorAllocationManager(

[GitHub] [spark] LuciferYang commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-21 Thread GitBox
LuciferYang commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1028783234 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -774,17 +776,16 @@ private[spark] class ExecutorAllocationManager(

[GitHub] [spark] sadikovi commented on pull request #38731: [SPARK-41209][PYTHON] Improve PySpark type inference in _merge_type method

2022-11-21 Thread GitBox
sadikovi commented on PR #38731: URL: https://github.com/apache/spark/pull/38731#issuecomment-1322965568 @xinrong-meng I have updated the PR description to clarify the user-facing change. -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-21 Thread GitBox
cloud-fan commented on code in PR #38495: URL: https://github.com/apache/spark/pull/38495#discussion_r1028755248 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClient.scala: ## @@ -113,6 +113,9 @@ private[hive] trait HiveClient { /** Creates a table with the

[GitHub] [spark] cloud-fan closed pull request #38746: [SPARK-41017][SQL][FOLLOWUP] Push Filter with both deterministic and nondeterministic predicates

2022-11-21 Thread GitBox
cloud-fan closed pull request #38746: [SPARK-41017][SQL][FOLLOWUP] Push Filter with both deterministic and nondeterministic predicates URL: https://github.com/apache/spark/pull/38746 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] cloud-fan commented on pull request #38746: [SPARK-41017][SQL][FOLLOWUP] Push Filter with both deterministic and nondeterministic predicates

2022-11-21 Thread GitBox
cloud-fan commented on PR #38746: URL: https://github.com/apache/spark/pull/38746#issuecomment-1322944376 thanks for review, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] wankunde commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-21 Thread GitBox
wankunde commented on code in PR #38495: URL: https://github.com/apache/spark/pull/38495#discussion_r1028741655 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala: ## @@ -894,12 +895,14 @@ class InsertSuite extends QueryTest with TestHiveSingleton with

[GitHub] [spark] wankunde commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-21 Thread GitBox
wankunde commented on code in PR #38495: URL: https://github.com/apache/spark/pull/38495#discussion_r1028741472 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -721,19 +721,18 @@ private[spark] class HiveExternalCatalog(conf: SparkConf,

[GitHub] [spark] cloud-fan commented on a diff in pull request #38747: [SPARK-40834][SQL][FOLLOWUP] Take care of legacy query end events

2022-11-21 Thread GitBox
cloud-fan commented on code in PR #38747: URL: https://github.com/apache/spark/pull/38747#discussion_r1028740897 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala: ## @@ -121,7 +121,11 @@ object SQLExecution {

[GitHub] [spark] AngersZhuuuu commented on a diff in pull request #38622: [SPARK-39601][YARN] AllocationFailure should not be treated as exitCausedByApp when driver is shutting down

2022-11-21 Thread GitBox
AngersZhuuuu commented on code in PR #38622: URL: https://github.com/apache/spark/pull/38622#discussion_r1028739055 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala: ## @@ -815,6 +815,7 @@ private[spark] class ApplicationMaster(

[GitHub] [spark] LuciferYang commented on pull request #38743: [SPARK-41215][BUILD][PROTOBUF] Support user configurable protoc executables when building Spark Protobuf.

2022-11-21 Thread GitBox
LuciferYang commented on PR #38743: URL: https://github.com/apache/spark/pull/38743#issuecomment-1322936602 cc @HyukjinKwon FYI, a similar fix as SPARK-40593 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] ulysses-you commented on a diff in pull request #38747: [SPARK-40834][SQL][FOLLOWUP] Take care of legacy query end events

2022-11-21 Thread GitBox
ulysses-you commented on code in PR #38747: URL: https://github.com/apache/spark/pull/38747#discussion_r1028738643 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala: ## @@ -121,7 +121,11 @@ object SQLExecution {

[GitHub] [spark] LuciferYang commented on a diff in pull request #38743: [SPARK-41215][BUILD][PROTOBUF] Support user configurable protoc executables when building Spark Protobuf.

2022-11-21 Thread GitBox
LuciferYang commented on code in PR #38743: URL: https://github.com/apache/spark/pull/38743#discussion_r1028737655 ## connector/protobuf/README.md: ## @@ -0,0 +1,37 @@ +# Spark Protobuf - Developer Documentation + +## Getting Started + +### Build + +```bash +./build/mvn -Phive

[GitHub] [spark] amaliujia commented on pull request #38734: [SPARK-41212][CONNECT][PYTHON] Implement `DataFrame.isEmpty`

2022-11-21 Thread GitBox
amaliujia commented on PR #38734: URL: https://github.com/apache/spark/pull/38734#issuecomment-1322927685 LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] amaliujia commented on a diff in pull request #38734: [SPARK-41212][CONNECT][PYTHON] Implement `DataFrame.isEmpty`

2022-11-21 Thread GitBox
amaliujia commented on code in PR #38734: URL: https://github.com/apache/spark/pull/38734#discussion_r1028724092 ## python/pyspark/sql/connect/dataframe.py: ## @@ -122,6 +122,20 @@ def withPlan(cls, plan: plan.LogicalPlan, session: "RemoteSparkSession") -> "Dat

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38734: [SPARK-41212][CONNECT][PYTHON] Implement `DataFrame.isEmpty`

2022-11-21 Thread GitBox
HyukjinKwon commented on code in PR #38734: URL: https://github.com/apache/spark/pull/38734#discussion_r1028721327 ## python/pyspark/sql/connect/dataframe.py: ## @@ -122,6 +122,20 @@ def withPlan(cls, plan: plan.LogicalPlan, session: "RemoteSparkSession") -> "Dat

[GitHub] [spark] amaliujia commented on a diff in pull request #38735: [SPARK-41213][CONNECT][PYTHON] Implement `DataFrame.__repr__` and `DataFrame.dtypes`

2022-11-21 Thread GitBox
amaliujia commented on code in PR #38735: URL: https://github.com/apache/spark/pull/38735#discussion_r1028720171 ## python/pyspark/sql/connect/dataframe.py: ## @@ -115,6 +115,9 @@ def __init__( self._cache: Dict[str, Any] = {} self._session:

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38735: [SPARK-41213][CONNECT][PYTHON] Implement `DataFrame.__repr__` and `DataFrame.dtypes`

2022-11-21 Thread GitBox
zhengruifeng commented on code in PR #38735: URL: https://github.com/apache/spark/pull/38735#discussion_r1028719570 ## python/pyspark/sql/connect/dataframe.py: ## @@ -115,6 +115,9 @@ def __init__( self._cache: Dict[str, Any] = {} self._session:

[GitHub] [spark] cloud-fan closed pull request #38741: [SPARK-41154][SQL][3.3] Incorrect relation caching for queries with time travel spec

2022-11-21 Thread GitBox
cloud-fan closed pull request #38741: [SPARK-41154][SQL][3.3] Incorrect relation caching for queries with time travel spec URL: https://github.com/apache/spark/pull/38741 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] ulysses-you commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

2022-11-21 Thread GitBox
ulysses-you commented on code in PR #38739: URL: https://github.com/apache/spark/pull/38739#discussion_r1028718134 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -3532,6 +3532,49 @@ class DataFrameSuite extends QueryTest }.isEmpty) }

[GitHub] [spark] cloud-fan commented on pull request #38741: [SPARK-41154][SQL][3.3] Incorrect relation caching for queries with time travel spec

2022-11-21 Thread GitBox
cloud-fan commented on PR #38741: URL: https://github.com/apache/spark/pull/38741#issuecomment-1322916287 tests all passed: https://github.com/ulysses-you/spark/runs/9613393804 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] cloud-fan commented on pull request #38741: [SPARK-41154][SQL][3.3] Incorrect relation caching for queries with time travel spec

2022-11-21 Thread GitBox
cloud-fan commented on PR #38741: URL: https://github.com/apache/spark/pull/38741#issuecomment-1322916371 thanks, merging to 3.3! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] cloud-fan commented on pull request #38706: [SPARK-41005][COLLECT][FOLLOWUP] Remove JSON code path and use `RDD.collect` in Arrow code path

2022-11-21 Thread GitBox
cloud-fan commented on PR #38706: URL: https://github.com/apache/spark/pull/38706#issuecomment-1322913672 late LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] xinrong-meng commented on pull request #38731: [SPARK-41209][PYSPARK] Improve PySpark type inference in _merge_type method

2022-11-21 Thread GitBox
xinrong-meng commented on PR #38731: URL: https://github.com/apache/spark/pull/38731#issuecomment-1322912963 Shall we add an example to elaborate `Does this PR introduce any user-facing change?`? The change might be in the 3.4 release note. -- This is an automated message from the Apache

[GitHub] [spark] jerrypeng commented on pull request #38517: [SPARK-39591][SS] Async Progress Tracking

2022-11-21 Thread GitBox
jerrypeng commented on PR #38517: URL: https://github.com/apache/spark/pull/38517#issuecomment-1322908492 @HeartSaVioR Please review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific
