[GitHub] [spark] HyukjinKwon commented on pull request #39400: [SPARK-41891][CONNECT][TESTS] Enable test_add_months_function, test_array_repeat, test_dayofweek, test_first_last_ignorenulls, test_inlin

2023-01-04 Thread GitBox
HyukjinKwon commented on PR #39400: URL: https://github.com/apache/spark/pull/39400#issuecomment-1371890753 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] MaxGekk commented on a diff in pull request #39394: [SPARK-41575][SQL] Assign name to _LEGACY_ERROR_TEMP_2054

2023-01-04 Thread GitBox
MaxGekk commented on code in PR #39394: URL: https://github.com/apache/spark/pull/39394#discussion_r1062189805 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -784,7 +784,7 @@ private[sql] object QueryExecutionErrors extends

[GitHub] [spark] dongjoon-hyun commented on pull request #39401: [SPARK-41893][BUILD] Publish SBOM artifacts

2023-01-04 Thread GitBox
dongjoon-hyun commented on PR #39401: URL: https://github.com/apache/spark/pull/39401#issuecomment-1371884646 Ah, it seems that I missed some failures. I convert this as `Draft`. Let me dig this. ``` [WARNING] An unexpected issue occurred attempting to resolve the effective pom for

[GitHub] [spark] dongjoon-hyun commented on pull request #39401: [SPARK-41893][BUILD] Publish SBOM artifacts

2023-01-04 Thread GitBox
dongjoon-hyun commented on PR #39401: URL: https://github.com/apache/spark/pull/39401#issuecomment-1371876411 cc @srowen and @HyukjinKwon

[GitHub] [spark] panbingkun opened a new pull request, #39402: [SPARK-41889][SQL] Attach root cause to invalidPatternError

2023-01-04 Thread GitBox
panbingkun opened a new pull request, #39402: URL: https://github.com/apache/spark/pull/39402 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] dongjoon-hyun opened a new pull request, #39401: [SPARK-41893][BUILD] Publish SBOM artifacts

2023-01-04 Thread GitBox
dongjoon-hyun opened a new pull request, #39401: URL: https://github.com/apache/spark/pull/39401 ### What changes were proposed in this pull request? This PR aims to publish `SBOM` artifacts. ### Why are the changes needed? Here is an article to give some context. -

[GitHub] [spark] zhengruifeng commented on pull request #39398: [SPARK-41829][CONNECT][PYTHON] Add the missing ordering parameter in `Sort` and `sortWithinPartitions`

2023-01-04 Thread GitBox
zhengruifeng commented on PR #39398: URL: https://github.com/apache/spark/pull/39398#issuecomment-1371869339 merged into master, thank you @HyukjinKwon

[GitHub] [spark] zhengruifeng closed pull request #39398: [SPARK-41829][CONNECT][PYTHON] Add the missing ordering parameter in `Sort` and `sortWithinPartitions`

2023-01-04 Thread GitBox
zhengruifeng closed pull request #39398: [SPARK-41829][CONNECT][PYTHON] Add the missing ordering parameter in `Sort` and `sortWithinPartitions` URL: https://github.com/apache/spark/pull/39398

[GitHub] [spark] EnricoMi commented on a diff in pull request #38356: [SPARK-40885] `Sort` may not take effect when it is the last 'Transform' operator

2023-01-04 Thread GitBox
EnricoMi commented on code in PR #38356: URL: https://github.com/apache/spark/pull/38356#discussion_r1021126305 ## sql/core/src/test/scala/org/apache/spark/sql/sources/PartitionedWriteSuite.scala: ## @@ -220,6 +220,23 @@ class PartitionedWriteSuite extends QueryTest with

[GitHub] [spark] EnricoMi commented on pull request #39131: [SPARK-41162][SQL] Fix anti- and semi-join for self-join with aggregations

2023-01-04 Thread GitBox
EnricoMi commented on PR #39131: URL: https://github.com/apache/spark/pull/39131#issuecomment-1371848352 > @EnricoMi thanks for the fix! which spark version starts to have this bug? This was introduced in Spark 3.0.0.

[GitHub] [spark] MaxGekk commented on a diff in pull request #39258: [SPARK-41572][SQL] Assign name to _LEGACY_ERROR_TEMP_2149

2023-01-04 Thread GitBox
MaxGekk commented on code in PR #39258: URL: https://github.com/apache/spark/pull/39258#discussion_r1062158983 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala: ## @@ -3138,13 +3141,54 @@ class CSVv1Suite extends CSVSuite { super

[GitHub] [spark] ulysses-you commented on pull request #38163: [SPARK-40711][SQL] Add spill size metrics for window

2023-01-04 Thread GitBox
ulysses-you commented on PR #38163: URL: https://github.com/apache/spark/pull/38163#issuecomment-1371842097 let me rebase this again

[GitHub] [spark] EnricoMi commented on a diff in pull request #39131: [SPARK-41162][SQL] Fix anti- and semi-join for self-join with aggregations

2023-01-04 Thread GitBox
EnricoMi commented on code in PR #39131: URL: https://github.com/apache/spark/pull/39131#discussion_r1062152718 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LeftSemiAntiJoinPushDownSuite.scala: ## @@ -46,7 +46,7 @@ class LeftSemiPushdownSuite extends

[GitHub] [spark] ulysses-you commented on a diff in pull request #39377: [SPARK-41867][SQL] Selective predicate should respect InMemoryRelation

2023-01-04 Thread GitBox
ulysses-you commented on code in PR #39377: URL: https://github.com/apache/spark/pull/39377#discussion_r1062152134 ## sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala: ## @@ -167,44 +167,10 @@ object PartitionPruning extends

[GitHub] [spark] ulysses-you commented on a diff in pull request #39377: [SPARK-41867][SQL] Selective predicate should respect InMemoryRelation

2023-01-04 Thread GitBox
ulysses-you commented on code in PR #39377: URL: https://github.com/apache/spark/pull/39377#discussion_r1062151959 ## sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala: ## @@ -167,44 +167,10 @@ object PartitionPruning extends

[GitHub] [spark] ulysses-you commented on pull request #39377: [SPARK-41867][SQL] Selective predicate should respect InMemoryRelation

2023-01-04 Thread GitBox
ulysses-you commented on PR #39377: URL: https://github.com/apache/spark/pull/39377#issuecomment-1371839438 @cloud-fan good suggestion. Follow the current code path: 1. check if build side has selective predicate. Now it always returns true if the cached relation is materialized 2.

[GitHub] [spark] LuciferYang commented on pull request #39385: [SPARK-41882][CORE][SQL][UI] Add tests for `SQLAppStatusStore` with RocksDB backend and fix some bugs

2023-01-04 Thread GitBox
LuciferYang commented on PR #39385: URL: https://github.com/apache/spark/pull/39385#issuecomment-1371825146 Rebased(merged https://github.com/apache/spark/pull/39226), please help to review if you have time, thanks @gengliangwang @dongjoon-hyun @techaddict

[GitHub] [spark] LuciferYang commented on a diff in pull request #39385: [SPARK-41882][CORE][SQL][UI] Add tests for `SQLAppStatusStore` with RocksDB backend and fix some bugs

2023-01-04 Thread GitBox
LuciferYang commented on code in PR #39385: URL: https://github.com/apache/spark/pull/39385#discussion_r1062138688 ## sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListenerSuite.scala: ## @@ -1007,6 +1002,34 @@ class SQLAppStatusListenerSuite extends

[GitHub] [spark] MaxGekk commented on a diff in pull request #39389: [SPARK-41574][SQL] Assign name to _LEGACY_ERROR_TEMP_2009

2023-01-04 Thread GitBox
MaxGekk commented on code in PR #39389: URL: https://github.com/apache/spark/pull/39389#discussion_r1062135726 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -378,10 +378,9 @@ private[sql] object QueryExecutionErrors extends

[GitHub] [spark] MaxGekk closed pull request #39305: [SPARK-41580][SQL] Assign name to _LEGACY_ERROR_TEMP_2137

2023-01-04 Thread GitBox
MaxGekk closed pull request #39305: [SPARK-41580][SQL] Assign name to _LEGACY_ERROR_TEMP_2137 URL: https://github.com/apache/spark/pull/39305

[GitHub] [spark] MaxGekk commented on pull request #39305: [SPARK-41580][SQL] Assign name to _LEGACY_ERROR_TEMP_2137

2023-01-04 Thread GitBox
MaxGekk commented on PR #39305: URL: https://github.com/apache/spark/pull/39305#issuecomment-1371814622 +1, LGTM. Merging to master. Thank you, @itholic.

[GitHub] [spark] beliefer commented on pull request #39378: [SPARK-41821][CONNECT][PYTHON] Fix doc test for DataFrame.describe

2023-01-04 Thread GitBox
beliefer commented on PR #39378: URL: https://github.com/apache/spark/pull/39378#issuecomment-1371811698 @HyukjinKwon @zhengruifeng Thank you!

[GitHub] [spark] MaxGekk closed pull request #39281: [SPARK-41576][SQL] Assign name to _LEGACY_ERROR_TEMP_2051

2023-01-04 Thread GitBox
MaxGekk closed pull request #39281: [SPARK-41576][SQL] Assign name to _LEGACY_ERROR_TEMP_2051 URL: https://github.com/apache/spark/pull/39281

[GitHub] [spark] MaxGekk commented on pull request #39281: [SPARK-41576][SQL] Assign name to _LEGACY_ERROR_TEMP_2051

2023-01-04 Thread GitBox
MaxGekk commented on PR #39281: URL: https://github.com/apache/spark/pull/39281#issuecomment-1371809041 +1, LGTM. Merging to master. Thank you, @itholic.

[GitHub] [spark] HyukjinKwon closed pull request #39378: [SPARK-41821][CONNECT][PYTHON] Fix doc test for DataFrame.describe

2023-01-04 Thread GitBox
HyukjinKwon closed pull request #39378: [SPARK-41821][CONNECT][PYTHON] Fix doc test for DataFrame.describe URL: https://github.com/apache/spark/pull/39378

[GitHub] [spark] HyukjinKwon commented on pull request #39378: [SPARK-41821][CONNECT][PYTHON] Fix doc test for DataFrame.describe

2023-01-04 Thread GitBox
HyukjinKwon commented on PR #39378: URL: https://github.com/apache/spark/pull/39378#issuecomment-1371806726 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39400: [SPARK-41891][CONNECT][TESTS] Enable test_add_months_function, test_array_repeat, test_dayofweek, test_first_last_ignorenulls,

2023-01-04 Thread GitBox
HyukjinKwon commented on code in PR #39400: URL: https://github.com/apache/spark/pull/39400#discussion_r1062125012 ## python/pyspark/sql/tests/connect/test_parity_functions.py: ## @@ -68,30 +60,14 @@ def test_date_add_function(self): def test_date_sub_function(self):

[GitHub] [spark] HyukjinKwon closed pull request #39397: [MINOR][CONNECT] Fix typos in connect/plan.py

2023-01-04 Thread GitBox
HyukjinKwon closed pull request #39397: [MINOR][CONNECT] Fix typos in connect/plan.py URL: https://github.com/apache/spark/pull/39397

[GitHub] [spark] HyukjinKwon commented on pull request #39397: [MINOR][CONNECT] Fix typos in connect/plan.py

2023-01-04 Thread GitBox
HyukjinKwon commented on PR #39397: URL: https://github.com/apache/spark/pull/39397#issuecomment-1371800977 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #39393: [SPARK-41871][CONNECT] DataFrame hint parameter can be str, float or int

2023-01-04 Thread GitBox
HyukjinKwon closed pull request #39393: [SPARK-41871][CONNECT] DataFrame hint parameter can be str, float or int URL: https://github.com/apache/spark/pull/39393

[GitHub] [spark] HyukjinKwon commented on pull request #39393: [SPARK-41871][CONNECT] DataFrame hint parameter can be str, float or int

2023-01-04 Thread GitBox
HyukjinKwon commented on PR #39393: URL: https://github.com/apache/spark/pull/39393#issuecomment-1371800022 Merged to master.

[GitHub] [spark] zhengruifeng commented on pull request #39398: [SPARK-41829][CONNECT][PYTHON] Add the missing ordering parameter in `Sort` and `sortWithinPartitions`

2023-01-04 Thread GitBox
zhengruifeng commented on PR #39398: URL: https://github.com/apache/spark/pull/39398#issuecomment-1371784933 cc @HyukjinKwon

[GitHub] [spark] dengziming commented on pull request #39388: [SPARK-41354][CONNECT][PYTHON] implement RepartitionByExpression

2023-01-04 Thread GitBox
dengziming commented on PR #39388: URL: https://github.com/apache/spark/pull/39388#issuecomment-1371775675 > just to confirm, the proto `RepartitionByExpression repartition_by_expression = 27` can support both > > `def repartition(self, *cols: "ColumnOrName")` `def

[GitHub] [spark] cloud-fan commented on pull request #39395: [SQL] Use foldLeft for DeduplicateRelations

2023-01-04 Thread GitBox
cloud-fan commented on PR #39395: URL: https://github.com/apache/spark/pull/39395#issuecomment-1371775038 looks fine if tests pass

[GitHub] [spark] techaddict opened a new pull request, #39400: [SPARK-41891][CONNECT][TESTS] Enable test_add_months_function, test_array_repeat, test_dayofweek, test_first_last_ignorenulls, test_funct

2023-01-04 Thread GitBox
techaddict opened a new pull request, #39400: URL: https://github.com/apache/spark/pull/39400 ### What changes were proposed in this pull request? Enabling tests in connect/test_parity_functions.py ### Why are the changes needed? Improved coverage ### Does this PR

[GitHub] [spark] cloud-fan closed pull request #36700: [SPARK-39318][SQL] Remove tpch-plan-stability WithStats golden files

2023-01-04 Thread GitBox
cloud-fan closed pull request #36700: [SPARK-39318][SQL] Remove tpch-plan-stability WithStats golden files URL: https://github.com/apache/spark/pull/36700

[GitHub] [spark] cloud-fan commented on pull request #36700: [SPARK-39318][SQL] Remove tpch-plan-stability WithStats golden files

2023-01-04 Thread GitBox
cloud-fan commented on PR #36700: URL: https://github.com/apache/spark/pull/36700#issuecomment-1371774108 thanks, merging to master!

[GitHub] [spark] cloud-fan commented on pull request #38163: [SPARK-40711][SQL] Add spill size metrics for window

2023-01-04 Thread GitBox
cloud-fan commented on PR #38163: URL: https://github.com/apache/spark/pull/38163#issuecomment-1371773428 LGTM if all tests pass

[GitHub] [spark] LuciferYang commented on pull request #39226: [SPARK-41694][CORE] Isolate RocksDB path for Live UI and automatically cleanup when `SparkContext.stop()`

2023-01-04 Thread GitBox
LuciferYang commented on PR #39226: URL: https://github.com/apache/spark/pull/39226#issuecomment-1371772823 Thanks @gengliangwang

[GitHub] [spark] gengliangwang closed pull request #39226: [SPARK-41694][CORE] Isolate RocksDB path for Live UI and automatically cleanup when `SparkContext.stop()`

2023-01-04 Thread GitBox
gengliangwang closed pull request #39226: [SPARK-41694][CORE] Isolate RocksDB path for Live UI and automatically cleanup when `SparkContext.stop()` URL: https://github.com/apache/spark/pull/39226

[GitHub] [spark] gengliangwang commented on pull request #39226: [SPARK-41694][CORE] Isolate RocksDB path for Live UI and automatically cleanup when `SparkContext.stop()`

2023-01-04 Thread GitBox
gengliangwang commented on PR #39226: URL: https://github.com/apache/spark/pull/39226#issuecomment-1371770833 Thanks, merging to master

[GitHub] [spark] LuciferYang opened a new pull request, #39399: [SPARK-41890][CORE][SQL][UI] Reduce `toSeq` in `RDDOperationGraphWrapperSerializer`/`SparkPlanGraphWrapperSerializer` for Scala 2.13

2023-01-04 Thread GitBox
LuciferYang opened a new pull request, #39399: URL: https://github.com/apache/spark/pull/39399 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] shrprasa commented on pull request #37880: [SPARK-39399] [CORE] [K8S]: Fix proxy-user authentication for Spark on k8s in cluster deploy mode

2023-01-04 Thread GitBox
shrprasa commented on PR #37880: URL: https://github.com/apache/spark/pull/37880#issuecomment-1371749436 @dongjoon-hyun @holdenk Can you please review this PR?

[GitHub] [spark] zhengruifeng opened a new pull request, #39398: [SPARK-41829][CONNECT][PYTHON] Add the missing ordering parameter in `Sort` and `sortWithinPartitions`

2023-01-04 Thread GitBox
zhengruifeng opened a new pull request, #39398: URL: https://github.com/apache/spark/pull/39398 ### What changes were proposed in this pull request? Add the missing ordering parameter in `Sort` and `sortWithinPartitions` ### Why are the changes needed? API coverage

[GitHub] [spark] zhengruifeng commented on pull request #39396: [SPARK-41825][CONNECT][PYTHON] Enable doctests related to `DataFrame.show`

2023-01-04 Thread GitBox
zhengruifeng commented on PR #39396: URL: https://github.com/apache/spark/pull/39396#issuecomment-1371745236 thank you @HyukjinKwon for reviews

[GitHub] [spark] HyukjinKwon closed pull request #39396: [SPARK-41825][CONNECT][PYTHON] Enable doctests related to `DataFrame.show`

2023-01-04 Thread GitBox
HyukjinKwon closed pull request #39396: [SPARK-41825][CONNECT][PYTHON] Enable doctests related to `DataFrame.show` URL: https://github.com/apache/spark/pull/39396

[GitHub] [spark] HyukjinKwon commented on pull request #39396: [SPARK-41825][CONNECT][PYTHON] Enable doctests related to `DataFrame.show`

2023-01-04 Thread GitBox
HyukjinKwon commented on PR #39396: URL: https://github.com/apache/spark/pull/39396#issuecomment-1371739702 Merged to master.

[GitHub] [spark] techaddict commented on pull request #39397: [MINOR] fix typos

2023-01-04 Thread GitBox
techaddict commented on PR #39397: URL: https://github.com/apache/spark/pull/39397#issuecomment-1371734926 cc: @HyukjinKwon

[GitHub] [spark] techaddict opened a new pull request, #39397: [MINOR] fix typos

2023-01-04 Thread GitBox
techaddict opened a new pull request, #39397: URL: https://github.com/apache/spark/pull/39397 ### What changes were proposed in this pull request? Fixing typos in connect/plan.py ### Why are the changes needed? Typos ### Does this PR introduce _any_ user-facing

[GitHub] [spark] itholic commented on a diff in pull request #39260: [SPARK-41579][SQL] Assign name to _LEGACY_ERROR_TEMP_1249

2023-01-04 Thread GitBox
itholic commented on code in PR #39260: URL: https://github.com/apache/spark/pull/39260#discussion_r1062073949 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -2405,22 +2405,24 @@ private[sql] object QueryCompilationErrors extends

[GitHub] [spark] itholic commented on a diff in pull request #39282: [SPARK-41581][SQL] Assign name to _LEGACY_ERROR_TEMP_1230

2023-01-04 Thread GitBox
itholic commented on code in PR #39282: URL: https://github.com/apache/spark/pull/39282#discussion_r1062069986 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala: ## @@ -680,6 +681,18 @@ class QueryCompilationErrorsSuite context =

[GitHub] [spark] techaddict commented on a diff in pull request #39393: [SPARK-41871][CONNECT] DataFrame hint parameter can be str, list, float or int

2023-01-04 Thread GitBox
techaddict commented on code in PR #39393: URL: https://github.com/apache/spark/pull/39393#discussion_r1062062273 ## python/pyspark/sql/connect/dataframe.py: ## @@ -480,9 +480,10 @@ def to_jcols( def hint(self, name: str, *params: Any) -> "DataFrame": for param

[GitHub] [spark] ulysses-you commented on a diff in pull request #38163: [SPARK-40711][SQL] Add spill size metrics for window

2023-01-04 Thread GitBox
ulysses-you commented on code in PR #38163: URL: https://github.com/apache/spark/pull/38163#discussion_r1062060973 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala: ## @@ -337,6 +338,7 @@ case class WindowInPandasExec( if

[GitHub] [spark] HyukjinKwon commented on pull request #39393: [SPARK-41871][CONNECT] DataFrame hint parameter can be str, list, float or int

2023-01-04 Thread GitBox
HyukjinKwon commented on PR #39393: URL: https://github.com/apache/spark/pull/39393#issuecomment-1371708907 yeah that's fine.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39393: [SPARK-41871][CONNECT] DataFrame hint parameter can be str, list, float or int

2023-01-04 Thread GitBox
HyukjinKwon commented on code in PR #39393: URL: https://github.com/apache/spark/pull/39393#discussion_r1062060563 ## python/pyspark/sql/connect/dataframe.py: ## @@ -480,9 +480,10 @@ def to_jcols( def hint(self, name: str, *params: Any) -> "DataFrame": for param

[GitHub] [spark] ulysses-you commented on a diff in pull request #39277: [SPARK-41708][SQL] Pull v1write information to `WriteFiles`

2023-01-04 Thread GitBox
ulysses-you commented on code in PR #39277: URL: https://github.com/apache/spark/pull/39277#discussion_r1062052748 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/V1WritesHiveUtils.scala: ## @@ -105,4 +112,164 @@ trait V1WritesHiveUtils { .map(_ =>

[GitHub] [spark] techaddict commented on pull request #39393: [SPARK-41871][CONNECT] DataFrame hint parameter can be str, list, float or int

2023-01-04 Thread GitBox
techaddict commented on PR #39393: URL: https://github.com/apache/spark/pull/39393#issuecomment-1371689457 @HyukjinKwon After spending some time with this, looks like the change is much bigger Proto Message Hint expected parameters to be repeated literal

[GitHub] [spark] LuciferYang commented on a diff in pull request #39385: [SPARK-41882][CORE][SQL][UI] Add tests for `SQLAppStatusStore` with RocksDB backend and fix some bugs

2023-01-04 Thread GitBox
LuciferYang commented on code in PR #39385: URL: https://github.com/apache/spark/pull/39385#discussion_r1062051944 ## sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListenerSuite.scala: ## @@ -1007,6 +1004,36 @@ class SQLAppStatusListenerSuite extends

[GitHub] [spark] LuciferYang commented on a diff in pull request #39385: [SPARK-41882][CORE][SQL][UI] Add tests for `SQLAppStatusStore` with RocksDB backend and fix some bugs

2023-01-04 Thread GitBox
LuciferYang commented on code in PR #39385: URL: https://github.com/apache/spark/pull/39385#discussion_r1062051333 ## sql/core/src/test/scala/org/apache/spark/sql/execution/ui/AllExecutionsPageSuite.scala: ## @@ -178,3 +177,35 @@ class AllExecutionsPageSuite extends

[GitHub] [spark] LuciferYang commented on a diff in pull request #39385: [SPARK-41882][CORE][SQL][UI] Add tests for `SQLAppStatusStore` with RocksDB backend and fix some bugs

2023-01-04 Thread GitBox
LuciferYang commented on code in PR #39385: URL: https://github.com/apache/spark/pull/39385#discussion_r1062050562 ## sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListenerSuite.scala: ## @@ -1007,6 +1004,36 @@ class SQLAppStatusListenerSuite extends

[GitHub] [spark] LuciferYang commented on pull request #39357: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for StreamingQueryProgressWrapper

2023-01-04 Thread GitBox
LuciferYang commented on PR #39357: URL: https://github.com/apache/spark/pull/39357#issuecomment-1371679675 Thanks @gengliangwang

[GitHub] [spark] LuciferYang commented on pull request #39391: [SPARK-41883][BUILD] Upgrade dropwizard metrics 4.2.15

2023-01-04 Thread GitBox
LuciferYang commented on PR #39391: URL: https://github.com/apache/spark/pull/39391#issuecomment-1371678208 has been re-triggered the failed task

[GitHub] [spark] LuciferYang commented on pull request #39226: [SPARK-41694][CORE] Isolate RocksDB path for Live UI and automatically cleanup when `SparkContext.stop()`

2023-01-04 Thread GitBox
LuciferYang commented on PR #39226: URL: https://github.com/apache/spark/pull/39226#issuecomment-1371675917 @gengliangwang has been re-triggered the failed task, should not related to this pr ~

[GitHub] [spark] AngersZhuuuu commented on pull request #36700: [SPARK-39318][SQL] Remove tpch-plan-stability WithStats golden files

2023-01-04 Thread GitBox
AngersZhuuuu commented on PR #36700: URL: https://github.com/apache/spark/pull/36700#issuecomment-1371674024 ping @cloud-fan @HyukjinKwon

[GitHub] [spark] panbingkun commented on a diff in pull request #39383: [SPARK-41780][SQL] Should throw INVALID_PARAMETER_VALUE when the parameters `regexp` in regexp_replace is invalid

2023-01-04 Thread GitBox
panbingkun commented on code in PR #39383: URL: https://github.com/apache/spark/pull/39383#discussion_r1062048545 ## sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala: ## @@ -663,4 +664,18 @@ class StringFunctionsSuite extends QueryTest with

[GitHub] [spark] gengliangwang commented on a diff in pull request #39385: [SPARK-41882][CORE][SQL][UI] Add tests for `SQLAppStatusStore` with RocksDB backend and fix some bugs

2023-01-04 Thread GitBox
gengliangwang commented on code in PR #39385: URL: https://github.com/apache/spark/pull/39385#discussion_r1062043632 ## sql/core/src/test/scala/org/apache/spark/sql/execution/ui/AllExecutionsPageSuite.scala: ## @@ -178,3 +177,35 @@ class AllExecutionsPageSuite extends

[GitHub] [spark] gengliangwang commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

2023-01-04 Thread GitBox
gengliangwang commented on code in PR #39268: URL: https://github.com/apache/spark/pull/39268#discussion_r1062033642 ## sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceSuite.scala: ## @@ -82,6 +82,7 @@ object SqlResourceSuite { new

[GitHub] [spark] gengliangwang commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

2023-01-04 Thread GitBox
gengliangwang commented on code in PR #39268: URL: https://github.com/apache/spark/pull/39268#discussion_r1062032939 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLExecutionUIDataSerializer.scala: ## @@ -74,6 +74,7 @@ class SQLExecutionUIDataSerializer

[GitHub] [spark] gengliangwang commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

2023-01-04 Thread GitBox
gengliangwang commented on code in PR #39268: URL: https://github.com/apache/spark/pull/39268#discussion_r1062032402 ## sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala: ## @@ -43,6 +43,8 @@ case class SparkListenerSQLAdaptiveSQLMetricUpdates(

[GitHub] [spark] gengliangwang commented on pull request #39226: [SPARK-41694][CORE] Isolate RocksDB path for Live UI and automatically cleanup when `SparkContext.stop()`

2023-01-04 Thread GitBox
gengliangwang commented on PR #39226: URL: https://github.com/apache/spark/pull/39226#issuecomment-1371644321 @LuciferYang The failed test doesn't seem related. Just to double confirm, could you retrigger it?

[GitHub] [spark] zhengruifeng opened a new pull request, #39396: [SPARK-41825][CONNECT][PYTHON] Enable doctests related to `DataFrame.show`

2023-01-04 Thread GitBox
zhengruifeng opened a new pull request, #39396: URL: https://github.com/apache/spark/pull/39396 ### What changes were proposed in this pull request? enable a group of doctests ### Why are the changes needed? for test coverage ### Does this PR introduce _any_

[GitHub] [spark] ulysses-you commented on a diff in pull request #39277: [SPARK-41708][SQL] Pull v1write information to `WriteFiles`

2023-01-04 Thread GitBox
ulysses-you commented on code in PR #39277: URL: https://github.com/apache/spark/pull/39277#discussion_r1062020277 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteFiles.scala: ## @@ -53,13 +59,17 @@ case class WriteFiles(child: LogicalPlan) extends

[GitHub] [spark] zhengruifeng commented on pull request #39388: [SPARK-41354][CONNECT][PYTHON] implement RepartitionByExpression

2023-01-04 Thread GitBox
zhengruifeng commented on PR #39388: URL: https://github.com/apache/spark/pull/39388#issuecomment-1371624171 just to confirm, the proto `RepartitionByExpression repartition_by_expression = 27` can support both `def repartition(self, *cols: "ColumnOrName")` `def

[GitHub] [spark] zhengruifeng commented on pull request #39378: [SPARK-41821][CONNECT][PYTHON] Fix doc test for DataFrame.describe

2023-01-04 Thread GitBox
zhengruifeng commented on PR #39378: URL: https://github.com/apache/spark/pull/39378#issuecomment-1371622584 > Shall we fix `TODO(SPARK-41821): Fix DataFrame.describe` below? You can remove: > > ``` > # TODO(SPARK-41821): Fix DataFrame.describe > del

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39385: [SPARK-41882][CORE][SQL][UI] Add tests for `SQLAppStatusStore` with RocksDB backend and fix some bugs

2023-01-04 Thread GitBox
dongjoon-hyun commented on code in PR #39385: URL: https://github.com/apache/spark/pull/39385#discussion_r1062012515 ## sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListenerSuite.scala: ## @@ -1007,6 +1004,36 @@ class SQLAppStatusListenerSuite extends

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39385: [SPARK-41882][CORE][SQL][UI] Add tests for `SQLAppStatusStore` with RocksDB backend and fix some bugs

2023-01-04 Thread GitBox
dongjoon-hyun commented on code in PR #39385: URL: https://github.com/apache/spark/pull/39385#discussion_r1062012227 ## sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListenerSuite.scala: ## @@ -1007,6 +1004,36 @@ class SQLAppStatusListenerSuite extends

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39385: [SPARK-41882][CORE][SQL][UI] Add tests for `SQLAppStatusStore` with RocksDB backend and fix some bugs

2023-01-04 Thread GitBox
dongjoon-hyun commented on code in PR #39385: URL: https://github.com/apache/spark/pull/39385#discussion_r1062012012 ## sql/core/src/test/scala/org/apache/spark/sql/execution/ui/AllExecutionsPageSuite.scala: ## @@ -178,3 +177,35 @@ class AllExecutionsPageSuite extends

[GitHub] [spark] xkrogen commented on pull request #38660: [SPARK-40199][SQL][WIP] Provide useful error when encountering null values in non-null fields

2023-01-04 Thread GitBox
xkrogen commented on PR #38660: URL: https://github.com/apache/spark/pull/38660#issuecomment-1371607231 Merged into latest master to resolve conflicts. @allisonwang-db or @cloud-fan , any thoughts/comments on the latest diff? Thanks! -- This is an automated message from the Apache Git

[GitHub] [spark] beliefer commented on pull request #39378: [SPARK-41821][CONNECT][PYTHON] Fix doc test for DataFrame.describe

2023-01-04 Thread GitBox
beliefer commented on PR #39378: URL: https://github.com/apache/spark/pull/39378#issuecomment-1371606490 ping @zhengruifeng cc @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] tedyu commented on pull request #39395: [SQL] Use foldLeft for DeduplicateRelations

2023-01-04 Thread GitBox
tedyu commented on PR #39395: URL: https://github.com/apache/spark/pull/39395#issuecomment-1371583867 cc @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] tedyu opened a new pull request, #39395: [SQL] Use foldLeft for DeduplicateRelations

2023-01-04 Thread GitBox
tedyu opened a new pull request, #39395: URL: https://github.com/apache/spark/pull/39395 ### What changes were proposed in this pull request? This PR uses `foldLeft` in `DeduplicateRelations` for better performance. ### Why are the changes needed? `foldRight` is not as

[GitHub] [spark] rithwik-db commented on a diff in pull request #39188: [WIP][SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-04 Thread GitBox
rithwik-db commented on code in PR #39188: URL: https://github.com/apache/spark/pull/39188#discussion_r1061992443 ## python/pyspark/ml/torch/distributor.py: ## @@ -0,0 +1,491 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39188: [WIP][SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-04 Thread GitBox
lu-wang-dl commented on code in PR #39188: URL: https://github.com/apache/spark/pull/39188#discussion_r1061991762 ## python/pyspark/ml/torch/distributor.py: ## @@ -0,0 +1,491 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] rithwik-db commented on a diff in pull request #39188: [WIP][SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-04 Thread GitBox
rithwik-db commented on code in PR #39188: URL: https://github.com/apache/spark/pull/39188#discussion_r1061991704 ## python/pyspark/ml/torch/distributor.py: ## @@ -0,0 +1,491 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] rithwik-db commented on a diff in pull request #39188: [WIP][SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-04 Thread GitBox
rithwik-db commented on code in PR #39188: URL: https://github.com/apache/spark/pull/39188#discussion_r1061991285 ## python/pyspark/ml/torch/distributor.py: ## @@ -0,0 +1,491 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] rithwik-db commented on a diff in pull request #39146: [WIP][SPARK-41589][PYTHON][ML] PyTorch Distributor Baseline API Changes

2023-01-04 Thread GitBox
rithwik-db commented on code in PR #39146: URL: https://github.com/apache/spark/pull/39146#discussion_r1061990731 ## python/pyspark/ml/torch/distributor.py: ## @@ -0,0 +1,287 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] github-actions[bot] commented on pull request #36700: [SPARK-39318][SQL] Remove tpch-plan-stability WithStats golden files

2023-01-04 Thread GitBox
github-actions[bot] commented on PR #36700: URL: https://github.com/apache/spark/pull/36700#issuecomment-1371573628 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #36052: [SPARK-38777][YARN] Add `bin/spark-submit --kill / --status` support for yarn

2023-01-04 Thread GitBox
github-actions[bot] commented on PR #36052: URL: https://github.com/apache/spark/pull/36052#issuecomment-1371573647 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #37899: [SPARK-40455][CORE]Abort result stage directly when it failed caused by FetchFailedException

2023-01-04 Thread GitBox
github-actions[bot] commented on PR #37899: URL: https://github.com/apache/spark/pull/37899#issuecomment-1371573567 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39188: [WIP][SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-04 Thread GitBox
lu-wang-dl commented on code in PR #39188: URL: https://github.com/apache/spark/pull/39188#discussion_r1061987077 ## python/pyspark/ml/torch/distributor.py: ## @@ -0,0 +1,491 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39188: [WIP][SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-04 Thread GitBox
lu-wang-dl commented on code in PR #39188: URL: https://github.com/apache/spark/pull/39188#discussion_r1061985290 ## python/pyspark/ml/torch/distributor.py: ## @@ -0,0 +1,491 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39188: [WIP][SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-04 Thread GitBox
lu-wang-dl commented on code in PR #39188: URL: https://github.com/apache/spark/pull/39188#discussion_r1061983056 ## python/pyspark/ml/torch/distributor.py: ## @@ -0,0 +1,491 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39188: [WIP][SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-04 Thread GitBox
lu-wang-dl commented on code in PR #39188: URL: https://github.com/apache/spark/pull/39188#discussion_r1061979910 ## python/pyspark/ml/torch/distributor.py: ## @@ -0,0 +1,491 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39188: [WIP][SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-04 Thread GitBox
lu-wang-dl commented on code in PR #39188: URL: https://github.com/apache/spark/pull/39188#discussion_r1061979807 ## python/pyspark/ml/torch/distributor.py: ## @@ -0,0 +1,491 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] HyukjinKwon closed pull request #39382: [SPARK-41878][CONNECT][TESTS] pyspark.sql.tests.test_dataframe - Add JIRAs or messages for skipped messages

2023-01-04 Thread GitBox
HyukjinKwon closed pull request #39382: [SPARK-41878][CONNECT][TESTS] pyspark.sql.tests.test_dataframe - Add JIRAs or messages for skipped messages URL: https://github.com/apache/spark/pull/39382 -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] HyukjinKwon commented on pull request #39382: [SPARK-41878][CONNECT][TESTS] pyspark.sql.tests.test_dataframe - Add JIRAs or messages for skipped messages

2023-01-04 Thread GitBox
HyukjinKwon commented on PR #39382: URL: https://github.com/apache/spark/pull/39382#issuecomment-1371556222 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon closed pull request #39386: [SPARK-41833][SPARK-41881][SPARK-41815][CONNECT][PYTHON] Make `DataFrame.collect` handle None/NaN/Array/Binary porperly

2023-01-04 Thread GitBox
HyukjinKwon closed pull request #39386: [SPARK-41833][SPARK-41881][SPARK-41815][CONNECT][PYTHON] Make `DataFrame.collect` handle None/NaN/Array/Binary porperly URL: https://github.com/apache/spark/pull/39386 -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] HyukjinKwon commented on pull request #39386: [SPARK-41833][SPARK-41881][SPARK-41815][CONNECT][PYTHON] Make `DataFrame.collect` handle None/NaN/Array/Binary porperly

2023-01-04 Thread GitBox
HyukjinKwon commented on PR #39386: URL: https://github.com/apache/spark/pull/39386#issuecomment-1371555616 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific
