[GitHub] [spark] itholic commented on pull request #42953: [SPARK-45185][BUILD][PYTHON] Ignore type check for preventing unexpected linter failure

2023-09-15 Thread via GitHub
itholic commented on PR #42953: URL: https://github.com/apache/spark/pull/42953#issuecomment-1722146432 cc @zhenglaizhang @HyukjinKwon @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] itholic opened a new pull request, #42953: [SPARK-45185][BUILD][PYTHON] Ignore type check for preventing unexpected linter failure

2023-09-15 Thread via GitHub
itholic opened a new pull request, #42953: URL: https://github.com/apache/spark/pull/42953 ### What changes were proposed in this pull request? The Python linter in CI is currently failing due to an unexpected mypy check failure, as shown below:

[GitHub] [spark] itholic commented on a diff in pull request #42793: [SPARK-45065][PYTHON][PS] Support Pandas 2.1.0

2023-09-15 Thread via GitHub
itholic commented on code in PR #42793: URL: https://github.com/apache/spark/pull/42793#discussion_r1327908926 ## python/pyspark/pandas/frame.py: ## @@ -1321,11 +1323,76 @@ def applymap(self, func: Callable[[Any], Any]) -> "DataFrame": 0 1.00 4.494400

[GitHub] [spark] itholic closed pull request #42946: [DO-NOT-MERGE] Test Jinja2 latest

2023-09-15 Thread via GitHub
itholic closed pull request #42946: [DO-NOT-MERGE] Test Jinja2 latest URL: https://github.com/apache/spark/pull/42946

[GitHub] [spark] shuwang21 commented on a diff in pull request #42357: [SPARK-44306][YARN] Group FileStatus with few RPC calls within Yarn Client

2023-09-15 Thread via GitHub
shuwang21 commented on code in PR #42357: URL: https://github.com/apache/spark/pull/42357#discussion_r1327901950 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala: ## @@ -462,6 +462,30 @@ package object config extends Logging { .stringConf

[GitHub] [spark] panbingkun commented on pull request #42952: [SPARK-45184][SQL] Remove orphaned error class documents

2023-09-15 Thread via GitHub
panbingkun commented on PR #42952: URL: https://github.com/apache/spark/pull/42952#issuecomment-1722118084 cc @cloud-fan @MaxGekk

[GitHub] [spark] panbingkun opened a new pull request, #42952: [SPARK-45184][SQL] Remove orphaned error class documents

2023-09-15 Thread via GitHub
panbingkun opened a new pull request, #42952: URL: https://github.com/apache/spark/pull/42952 ### What changes were proposed in this pull request? This PR aims to remove orphaned error class documents, including: 1. Introducing an automated mechanism for removing orphaned files.

[GitHub] [spark] dongjoon-hyun commented on pull request #42908: [SPARK-44872][CONNECT][FOLLOWUP] Deflake ReattachableExecuteSuite and increase retry buffer

2023-09-15 Thread via GitHub
dongjoon-hyun commented on PR #42908: URL: https://github.com/apache/spark/pull/42908#issuecomment-1722105482 Merged to master.

[GitHub] [spark] dongjoon-hyun closed pull request #42908: [SPARK-44872][CONNECT][FOLLOWUP] Deflake ReattachableExecuteSuite and increase retry buffer

2023-09-15 Thread via GitHub
dongjoon-hyun closed pull request #42908: [SPARK-44872][CONNECT][FOLLOWUP] Deflake ReattachableExecuteSuite and increase retry buffer URL: https://github.com/apache/spark/pull/42908

[GitHub] [spark] dongjoon-hyun commented on pull request #42947: [SPARK-45181][BUILD] Upgrade buf to v1.26.1

2023-09-15 Thread via GitHub
dongjoon-hyun commented on PR #42947: URL: https://github.com/apache/spark/pull/42947#issuecomment-1722102594 Thank you, @zhengruifeng .

[GitHub] [spark] dongjoon-hyun commented on pull request #42947: [SPARK-45181][BUILD] Upgrade buf to v1.26.1

2023-09-15 Thread via GitHub
dongjoon-hyun commented on PR #42947: URL: https://github.com/apache/spark/pull/42947#issuecomment-1722102560 Merged to master for Apache Spark 4.0.0. The only failure is a known one, `ReattachableExecuteSuite`.

[GitHub] [spark] dongjoon-hyun closed pull request #42947: [SPARK-45181][BUILD] Upgrade buf to v1.26.1

2023-09-15 Thread via GitHub
dongjoon-hyun closed pull request #42947: [SPARK-45181][BUILD] Upgrade buf to v1.26.1 URL: https://github.com/apache/spark/pull/42947

[GitHub] [spark] dongjoon-hyun commented on pull request #42918: [SPARK-40497][BUILD] Re-upgrade Scala to 2.13.11

2023-09-15 Thread via GitHub
dongjoon-hyun commented on PR #42918: URL: https://github.com/apache/spark/pull/42918#issuecomment-1722102166 Merged to master for Apache Spark 4.0.0. Thank you, @LuciferYang and all.

[GitHub] [spark] dongjoon-hyun closed pull request #42918: [SPARK-40497][BUILD] Re-upgrade Scala to 2.13.11

2023-09-15 Thread via GitHub
dongjoon-hyun closed pull request #42918: [SPARK-40497][BUILD] Re-upgrade Scala to 2.13.11 URL: https://github.com/apache/spark/pull/42918

[GitHub] [spark] dongjoon-hyun commented on pull request #42926: [SPARK-45164][PS] Remove deprecated `Index` APIs

2023-09-15 Thread via GitHub
dongjoon-hyun commented on PR #42926: URL: https://github.com/apache/spark/pull/42926#issuecomment-1722101954 Merged to master. Thank you, @itholic and all.

[GitHub] [spark] dongjoon-hyun closed pull request #42926: [SPARK-45164][PS] Remove deprecated `Index` APIs

2023-09-15 Thread via GitHub
dongjoon-hyun closed pull request #42926: [SPARK-45164][PS] Remove deprecated `Index` APIs URL: https://github.com/apache/spark/pull/42926

[GitHub] [spark] dongjoon-hyun commented on pull request #42935: [SPARK-45173][UI] Remove some unnecessary sourceMapping files in UI

2023-09-15 Thread via GitHub
dongjoon-hyun commented on PR #42935: URL: https://github.com/apache/spark/pull/42935#issuecomment-1722101732 Merged to master for Apache Spark 4.0.0.

[GitHub] [spark] dongjoon-hyun closed pull request #42935: [SPARK-45173][UI] Remove some unnecessary sourceMapping files in UI

2023-09-15 Thread via GitHub
dongjoon-hyun closed pull request #42935: [SPARK-45173][UI] Remove some unnecessary sourceMapping files in UI URL: https://github.com/apache/spark/pull/42935

[GitHub] [spark] dongjoon-hyun commented on pull request #42935: [SPARK-45173][UI] Remove some unnecessary sourceMapping files in UI

2023-09-15 Thread via GitHub
dongjoon-hyun commented on PR #42935: URL: https://github.com/apache/spark/pull/42935#issuecomment-1722101587 Thank you for the confirmation and updating the PR description.

[GitHub] [spark] Hisoka-X commented on pull request #42951: [SPARK-45078][SQL] Fix `array_insert` ImplicitCastInputTypes not work

2023-09-15 Thread via GitHub
Hisoka-X commented on PR #42951: URL: https://github.com/apache/spark/pull/42951#issuecomment-1722088871 cc @cloud-fan @dongjoon-hyun @Daniel-Davies

[GitHub] [spark] Hisoka-X opened a new pull request, #42951: [SPARK-45078][SQL] Fix `array_insert` ImplicitCastInputTypes not work

2023-09-15 Thread via GitHub
Hisoka-X opened a new pull request, #42951: URL: https://github.com/apache/spark/pull/42951 ### What changes were proposed in this pull request? This PR fixes a case where calling `array_insert` with different types for the array and the inserted column throws an exception. Sometimes it should be

[GitHub] [spark] zhengruifeng commented on pull request #42948: [SPARK-45166][PYTHON][FOLLOWUP] Delete unused `pyarrow_version_less_than_minimum` from `pyspark.sql.pandas.utils`

2023-09-15 Thread via GitHub
zhengruifeng commented on PR #42948: URL: https://github.com/apache/spark/pull/42948#issuecomment-1722080810 merged to master

[GitHub] [spark] zhengruifeng closed pull request #42948: [SPARK-45166][PYTHON][FOLLOWUP] Delete unused `pyarrow_version_less_than_minimum` from `pyspark.sql.pandas.utils`

2023-09-15 Thread via GitHub
zhengruifeng closed pull request #42948: [SPARK-45166][PYTHON][FOLLOWUP] Delete unused `pyarrow_version_less_than_minimum` from `pyspark.sql.pandas.utils` URL: https://github.com/apache/spark/pull/42948

[GitHub] [spark] github-actions[bot] closed pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates

2023-09-15 Thread via GitHub
github-actions[bot] closed pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates URL: https://github.com/apache/spark/pull/40128

[GitHub] [spark] github-actions[bot] closed pull request #41203: [SPARK-16484][SQL] Update hll function type checks to also check for non-foldable inputs

2023-09-15 Thread via GitHub
github-actions[bot] closed pull request #41203: [SPARK-16484][SQL] Update hll function type checks to also check for non-foldable inputs URL: https://github.com/apache/spark/pull/41203

[GitHub] [spark] ueshin commented on a diff in pull request #42793: [SPARK-45065][PYTHON][PS] Support Pandas 2.1.0

2023-09-15 Thread via GitHub
ueshin commented on code in PR #42793: URL: https://github.com/apache/spark/pull/42793#discussion_r132150 ## python/pyspark/pandas/frame.py: ## @@ -1321,11 +1323,76 @@ def applymap(self, func: Callable[[Any], Any]) -> "DataFrame": 0 1.00 4.494400

[GitHub] [spark] mayurdb commented on pull request #42950: [SPARK-45182][CORE] Ignore task completion from old stage after retrying indeterminate stages

2023-09-15 Thread via GitHub
mayurdb commented on PR #42950: URL: https://github.com/apache/spark/pull/42950#issuecomment-1721738630 @cloud-fan @caican00 can you take a look?

[GitHub] [spark] mayurdb opened a new pull request, #42950: [SPARK-45182][CORE] Ignore task completion from old stage after retrying indeterminate stages

2023-09-15 Thread via GitHub
mayurdb opened a new pull request, #42950: URL: https://github.com/apache/spark/pull/42950 ### What changes were proposed in this pull request? [SPARK-25342](https://issues.apache.org/jira/browse/SPARK-25342) added support for rolling back a shuffle map stage so that all tasks of the

[GitHub] [spark] neilramaswamy commented on a diff in pull request #42895: [SPARK-45138][SS] Define a new error class and apply it when checkpointing state to DFS fails

2023-09-15 Thread via GitHub
neilramaswamy commented on code in PR #42895: URL: https://github.com/apache/spark/pull/42895#discussion_r1327645005 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala: ## @@ -135,17 +135,15 @@ private[sql] class

[GitHub] [spark] sunchao commented on pull request #42612: [SPARK-44913][SQL] DS V2 supports push down V2 UDF that has magic method

2023-09-15 Thread via GitHub
sunchao commented on PR #42612: URL: https://github.com/apache/spark/pull/42612#issuecomment-1721607397 Apologies @ConeyLiu , just saw this PR. I think this makes sense. Could you rebase it? I'll review afterwards.

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-15 Thread via GitHub
allisonwang-db commented on code in PR #42938: URL: https://github.com/apache/spark/pull/42938#discussion_r1327555626 ## python/pyspark/sql/functions.py: ## @@ -13041,6 +13041,120 @@ def json_object_keys(col: "ColumnOrName") -> Column: return

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42915: [SPARK-45159][PYTHON] Handle named arguments only when necessary

2023-09-15 Thread via GitHub
allisonwang-db commented on code in PR #42915: URL: https://github.com/apache/spark/pull/42915#discussion_r1327519954 ## python/pyspark/worker.py: ## @@ -810,28 +847,26 @@ def check_return_value(res): }, ) -def

[GitHub] [spark] vasa47 commented on pull request #41067: [SPARK-43496][KUBERNETES] Add configuration for pod memory limits

2023-09-15 Thread via GitHub
vasa47 commented on PR #41067: URL: https://github.com/apache/spark/pull/41067#issuecomment-1721506888 I need this feature. When can we expect this in the main release branch?

[GitHub] [spark] juliuszsompolski commented on pull request #42908: [SPARK-44872][CONNECT][FOLLOWUP] Deflake ReattachableExecuteSuite and increase retry buffer

2023-09-15 Thread via GitHub
juliuszsompolski commented on PR #42908: URL: https://github.com/apache/spark/pull/42908#issuecomment-1721461319 @LuciferYang I tried looking at https://github.com/apache/spark/pull/42560#issuecomment-1718968002 but did not reproduce it yet. If you have more instances of CI runs where it

[GitHub] [spark] juliuszsompolski commented on pull request #42908: [SPARK-44872][CONNECT][FOLLOWUP] Deflake ReattachableExecuteSuite and increase retry buffer

2023-09-15 Thread via GitHub
juliuszsompolski commented on PR #42908: URL: https://github.com/apache/spark/pull/42908#issuecomment-1721459768 @dongjoon-hyun I don't think the SparkConnectSessionHolderSuite failures are related, and I don't know what's going on there. ``` Streaming foreachBatch worker is starting

[GitHub] [spark] cdkrot commented on pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-15 Thread via GitHub
cdkrot commented on PR #42949: URL: https://github.com/apache/spark/pull/42949#issuecomment-1721446450 cc @HyukjinKwon, @nija-at

[GitHub] [spark] cdkrot opened a new pull request, #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-15 Thread via GitHub
cdkrot opened a new pull request, #42949: URL: https://github.com/apache/spark/pull/42949 ### What changes were proposed in this pull request? Add error logging into `addArtifact` (see example in "How this is tested"). The logging code is moved into a separate file to avoid circular

[GitHub] [spark] yaooqinn commented on pull request #42935: [SPARK-45173][UI] Remove some unnecessary sourceMapping files in UI

2023-09-15 Thread via GitHub
yaooqinn commented on PR #42935: URL: https://github.com/apache/spark/pull/42935#issuecomment-1721425614 Yeah, the source map files are for debugging purposes; they enable browsers to map JS/CSS created by a preprocessor back to the original source file. For production, we'd better not

[GitHub] [spark] sandip-db commented on a diff in pull request #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-15 Thread via GitHub
sandip-db commented on code in PR #42938: URL: https://github.com/apache/spark/pull/42938#discussion_r1327417666 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/FunctionTestSuite.scala: ## @@ -229,6 +229,18 @@ class FunctionTestSuite extends ConnectFunSuite

[GitHub] [spark] sandip-db commented on a diff in pull request #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-15 Thread via GitHub
sandip-db commented on code in PR #42938: URL: https://github.com/apache/spark/pull/42938#discussion_r1327401538 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -1821,6 +1821,106 @@ def test_json_functions(self):

[GitHub] [spark-connect-go] arnarpall commented on pull request #12: [SPARK-44141] Removed need to have buf preinstalled

2023-09-15 Thread via GitHub
arnarpall commented on PR #12: URL: https://github.com/apache/spark-connect-go/pull/12#issuecomment-1721362325 It seems like not all the changes to the workflow are being reflected properly. The current failure. Output from the `internal/generated.out` target is for the run

[GitHub] [spark] LuciferYang commented on pull request #42918: [SPARK-40497][BUILD] Re-upgrade Scala to 2.13.11

2023-09-15 Thread via GitHub
LuciferYang commented on PR #42918: URL: https://github.com/apache/spark/pull/42918#issuecomment-1721331830 > Could you re-trigger the failed pipelines? Triggered

[GitHub] [spark] pan3793 commented on a diff in pull request #42599: [DO-NOT-MERGE] Remove Guava from shared classes from IsolatedClientLoader

2023-09-15 Thread via GitHub
pan3793 commented on code in PR #42599: URL: https://github.com/apache/spark/pull/42599#discussion_r1327275616 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala: ## @@ -130,8 +130,7 @@ private[hive] object IsolatedClientLoader extends

[GitHub] [spark] dongjoon-hyun commented on pull request #42935: [SPARK-45173][UI] Remove some unnecessary sourceMapping files in UI

2023-09-15 Thread via GitHub
dongjoon-hyun commented on PR #42935: URL: https://github.com/apache/spark/pull/42935#issuecomment-1721244501 If possible, please elaborate a little more in the PR description, @yaooqinn . :)

[GitHub] [spark] dongjoon-hyun commented on pull request #42941: [SPARK-43874][FOLLOWUP][TESTS] Enable `GroupbyIndexTests.test_groupby_multiindex_columns`

2023-09-15 Thread via GitHub
dongjoon-hyun commented on PR #42941: URL: https://github.com/apache/spark/pull/42941#issuecomment-1721225088 Merged to master. Thank you!

[GitHub] [spark] dongjoon-hyun closed pull request #42941: [SPARK-43874][FOLLOWUP][TESTS] Enable `GroupbyIndexTests.test_groupby_multiindex_columns`

2023-09-15 Thread via GitHub
dongjoon-hyun closed pull request #42941: [SPARK-43874][FOLLOWUP][TESTS] Enable `GroupbyIndexTests.test_groupby_multiindex_columns` URL: https://github.com/apache/spark/pull/42941

[GitHub] [spark] dongjoon-hyun commented on pull request #42943: [SPARK-45175][K8S] download krb5.conf from remote storage in spark-submit on k8s

2023-09-15 Thread via GitHub
dongjoon-hyun commented on PR #42943: URL: https://github.com/apache/spark/pull/42943#issuecomment-1721220006 I have the same question as @yaooqinn . Since this is in the `Security` domain, I'm wondering if this is a safe or recommended way for Kerberos.

[GitHub] [spark] dongjoon-hyun commented on pull request #42918: [SPARK-40497][BUILD] Re-upgrade Scala to 2.13.11

2023-09-15 Thread via GitHub
dongjoon-hyun commented on PR #42918: URL: https://github.com/apache/spark/pull/42918#issuecomment-1721203298 Could you re-trigger the failed pipelines?

[GitHub] [spark] zhengruifeng commented on pull request #42944: [SPARK-45179][PYTHON] Increase Numpy minimum version to 1.21

2023-09-15 Thread via GitHub
zhengruifeng commented on PR #42944: URL: https://github.com/apache/spark/pull/42944#issuecomment-1721199404 thanks, merged to master

[GitHub] [spark] zhengruifeng closed pull request #42944: [SPARK-45179][PYTHON] Increase Numpy minimum version to 1.21

2023-09-15 Thread via GitHub
zhengruifeng closed pull request #42944: [SPARK-45179][PYTHON] Increase Numpy minimum version to 1.21 URL: https://github.com/apache/spark/pull/42944

[GitHub] [spark] zhengruifeng commented on pull request #42948: [SPARK-45166][PYTHON][FOLLOWUP] Delete unused `pyarrow_version_less_than_minimum` from `pyspark.sql.pandas.utils`

2023-09-15 Thread via GitHub
zhengruifeng commented on PR #42948: URL: https://github.com/apache/spark/pull/42948#issuecomment-1721155323 CI link: https://github.com/zhengruifeng/spark/actions/runs/6195005123/job/16818927706

[GitHub] [spark] zhengruifeng opened a new pull request, #42948: [SPARK-45166][PYTHON][FOLLOWUP] Delete unused `pyarrow_version_less_than_minimum` from `pyspark.sql.pandas.utils`

2023-09-15 Thread via GitHub
zhengruifeng opened a new pull request, #42948: URL: https://github.com/apache/spark/pull/42948 ### What changes were proposed in this pull request? Delete unused `pyarrow_version_less_than_minimum` from `pyspark.sql.pandas.utils` ### Why are the changes needed? this method

[GitHub] [spark] zhengruifeng commented on pull request #42947: [SPARK-45181][BUILD] Upgrade buf to v1.26.1

2023-09-15 Thread via GitHub
zhengruifeng commented on PR #42947: URL: https://github.com/apache/spark/pull/42947#issuecomment-1721124704 I think we can continue our **monthly** upgrade of this package

[GitHub] [spark] zhengruifeng opened a new pull request, #42947: [SPARK-45181][BUILD] Upgrade buf to v1.26.1

2023-09-15 Thread via GitHub
zhengruifeng opened a new pull request, #42947: URL: https://github.com/apache/spark/pull/42947 ### What changes were proposed in this pull request? Upgrade buf to v1.26.1 ### Why are the changes needed? This upgrade causes no change in the generated code. It fixed multiple

[GitHub] [spark] panbingkun commented on pull request #42917: [SPARK-45163][SQL] Merge UNSUPPORTED_VIEW_OPERATION & UNSUPPORTED_TABLE_OPERATION & fix some issue

2023-09-15 Thread via GitHub
panbingkun commented on PR #42917: URL: https://github.com/apache/spark/pull/42917#issuecomment-1721070239 I have checked all UTs to make sure prompts are displayed where they should be and not displayed where they should not be.

[GitHub] [spark] panbingkun commented on a diff in pull request #42917: [SPARK-45163][SQL] Merge UNSUPPORTED_VIEW_OPERATION & UNSUPPORTED_TABLE_OPERATION & fix some issue

2023-09-15 Thread via GitHub
panbingkun commented on code in PR #42917: URL: https://github.com/apache/spark/pull/42917#discussion_r1327126016 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -4715,8 +4714,7 @@ class AstBuilder extends DataTypeAstBuilder with

[GitHub] [spark] panbingkun commented on a diff in pull request #42917: [SPARK-45163][SQL] Merge UNSUPPORTED_VIEW_OPERATION & UNSUPPORTED_TABLE_OPERATION & fix some issue

2023-09-15 Thread via GitHub
panbingkun commented on code in PR #42917: URL: https://github.com/apache/spark/pull/42917#discussion_r1327125546 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala: ## @@ -976,7 +976,7 @@ class HiveDDLSuite exception =

[GitHub] [spark] panbingkun commented on a diff in pull request #42917: [SPARK-45163][SQL] Merge UNSUPPORTED_VIEW_OPERATION & UNSUPPORTED_TABLE_OPERATION & fix some issue

2023-09-15 Thread via GitHub
panbingkun commented on code in PR #42917: URL: https://github.com/apache/spark/pull/42917#discussion_r1327124355 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala: ## @@ -976,7 +976,7 @@ class HiveDDLSuite exception =

[GitHub] [spark] panbingkun commented on a diff in pull request #42917: [SPARK-45163][SQL] Merge UNSUPPORTED_VIEW_OPERATION & UNSUPPORTED_TABLE_OPERATION & fix some issue

2023-09-15 Thread via GitHub
panbingkun commented on code in PR #42917: URL: https://github.com/apache/spark/pull/42917#discussion_r1327119760 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -3215,11 +3225,6 @@ " is a VARIABLE and cannot be updated using the SET statement.

[GitHub] [spark] yaooqinn closed pull request #42904: [SPARK-45151][CORE][UI] Task Level Thread Dump Support

2023-09-15 Thread via GitHub
yaooqinn closed pull request #42904: [SPARK-45151][CORE][UI] Task Level Thread Dump Support URL: https://github.com/apache/spark/pull/42904

[GitHub] [spark] yaooqinn commented on pull request #42904: [SPARK-45151][CORE][UI] Task Level Thread Dump Support

2023-09-15 Thread via GitHub
yaooqinn commented on PR #42904: URL: https://github.com/apache/spark/pull/42904#issuecomment-1721030343 The second-to-last commit passed CI and the last is a minor one. Thanks @mridulm and @dongjoon-hyun for the review. Merged to master.

[GitHub] [spark] yaooqinn commented on pull request #42943: [SPARK-45175][K8S] download krb5.conf from remote storage in spark-submit on k8s

2023-09-15 Thread via GitHub
yaooqinn commented on PR #42943: URL: https://github.com/apache/spark/pull/42943#issuecomment-1721015896 What if the remote storage requires login via Kerberos before accessing it?

[GitHub] [spark] dcoliversun commented on pull request #42943: [SPARK-45175][K8S] download krb5.conf from remote storage in spark-submit on k8s

2023-09-15 Thread via GitHub
dcoliversun commented on PR #42943: URL: https://github.com/apache/spark/pull/42943#issuecomment-1721010955 @dongjoon-hyun It would be good if you have time to review this PR

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42929: [SPARK-45167][CONNECT] Python client must call `release_all`

2023-09-15 Thread via GitHub
HyukjinKwon commented on code in PR #42929: URL: https://github.com/apache/spark/pull/42929#discussion_r1327072112 ## python/pyspark/sql/connect/client/reattach.py: ## @@ -14,14 +14,16 @@ # See the License for the specific language governing permissions and # limitations

[GitHub] [spark] itholic commented on pull request #42946: [DO-NOT-METGE] Test Jinja2 latest

2023-09-15 Thread via GitHub
itholic commented on PR #42946: URL: https://github.com/apache/spark/pull/42946#issuecomment-1720947209 We need the latest version of `Jinja2` for some functions from Pandas 2.0.0.

[GitHub] [spark] itholic opened a new pull request, #42946: [DO-NOT-METGE] Test Jinja2 latest

2023-09-15 Thread via GitHub
itholic opened a new pull request, #42946: URL: https://github.com/apache/spark/pull/42946 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] zhengruifeng commented on pull request #42942: [SPARK-45168][PYTHON][FOLLOWUP] `test_missing_data.py` Code Cleanup

2023-09-15 Thread via GitHub
zhengruifeng commented on PR #42942: URL: https://github.com/apache/spark/pull/42942#issuecomment-1720943164 thanks @dongjoon-hyun , merged to master

[GitHub] [spark] zhengruifeng closed pull request #42942: [SPARK-45168][PYTHON][FOLLOWUP] `test_missing_data.py` Code Cleanup

2023-09-15 Thread via GitHub
zhengruifeng closed pull request #42942: [SPARK-45168][PYTHON][FOLLOWUP] `test_missing_data.py` Code Cleanup URL: https://github.com/apache/spark/pull/42942

[GitHub] [spark] peter-toth commented on pull request #42755: [SPARK-45034][SQL] Support deterministic mode function

2023-09-15 Thread via GitHub
peter-toth commented on PR #42755: URL: https://github.com/apache/spark/pull/42755#issuecomment-1720914122 Hmm, the failure seems unrelated but persistent... ``` [info] - client INVALID_CURSOR.DISCONNECTED error is retried when other RPC preempts this one *** FAILED *** (385

[GitHub] [spark] itholic commented on pull request #42890: [SPARK-25689][YARN][FOLLOWUP] Add a missing argument usage description for ApplicationMasterArguments

2023-09-15 Thread via GitHub
itholic commented on PR #42890: URL: https://github.com/apache/spark/pull/42890#issuecomment-1720903171 Actually, I'm not very familiar with YARN clusters, so we might need a review from someone who has enough context for this (maybe one of @vanzin and @squito?). Otherwise, can you give me a

[GitHub] [spark] beliefer commented on pull request #42861: [SPARK-45108][SQL] Improve the InjectRuntimeFilter for check probably shuffle

2023-09-15 Thread via GitHub
beliefer commented on PR #42861: URL: https://github.com/apache/spark/pull/42861#issuecomment-1720864374 ping @cloud-fan @viirya cc @somani

[GitHub] [spark] HyukjinKwon closed pull request #42937: [SPARK-45177][PS] Remove `col_space` parameter from `to_latex`

2023-09-15 Thread via GitHub
HyukjinKwon closed pull request #42937: [SPARK-45177][PS] Remove `col_space` parameter from `to_latex` URL: https://github.com/apache/spark/pull/42937

[GitHub] [spark] HyukjinKwon commented on pull request #42937: [SPARK-45177][PS] Remove `col_space` parameter from `to_latex`

2023-09-15 Thread via GitHub
HyukjinKwon commented on PR #42937: URL: https://github.com/apache/spark/pull/42937#issuecomment-1720812084 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] Hisoka-X commented on pull request #42802: [SPARK-43752][SQL] Support default column value on DataSource V2

2023-09-15 Thread via GitHub
Hisoka-X commented on PR #42802: URL: https://github.com/apache/spark/pull/42802#issuecomment-1720796610 cc @cloud-fan Would you mind taking a look at this? Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] LuciferYang commented on pull request #42918: [SPARK-40497][BUILD] Re-upgrade Scala to 2.13.11

2023-09-15 Thread via GitHub
LuciferYang commented on PR #42918: URL: https://github.com/apache/spark/pull/42918#issuecomment-1720796029 > 2.13.12 has also been released https://github.com/scala/scala/releases/tag/v2.13.12 so we might want to look into that in the future. We can test it once Ammonite releases a

[GitHub] [spark] Hisoka-X commented on a diff in pull request #42802: [SPARK-43752][SQL] Support default column value on DataSource V2

2023-09-15 Thread via GitHub
Hisoka-X commented on code in PR #42802: URL: https://github.com/apache/spark/pull/42802#discussion_r1326897921 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -139,8 +141,32 @@ class BasicInMemoryTableCatalog extends

[GitHub] [spark] LuciferYang commented on pull request #42918: [SPARK-40497][BUILD] Re-upgrade Scala to 2.13.11

2023-09-15 Thread via GitHub
LuciferYang commented on PR #42918: URL: https://github.com/apache/spark/pull/42918#issuecomment-1720792981 Thanks @dongjoon-hyun and @eejbyfeldt -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] HyukjinKwon closed pull request #42928: [SPARK-45166][PYTHON] Clean up unused code paths for pyarrow<4

2023-09-15 Thread via GitHub
HyukjinKwon closed pull request #42928: [SPARK-45166][PYTHON] Clean up unused code paths for pyarrow<4 URL: https://github.com/apache/spark/pull/42928 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] HyukjinKwon commented on pull request #42928: [SPARK-45166][PYTHON] Clean up unused code paths for pyarrow<4

2023-09-15 Thread via GitHub
HyukjinKwon commented on PR #42928: URL: https://github.com/apache/spark/pull/42928#issuecomment-1720790062 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon closed pull request #42847: [SPARK-45128][SQL] Support `CalendarIntervalType` in Arrow

2023-09-15 Thread via GitHub
HyukjinKwon closed pull request #42847: [SPARK-45128][SQL] Support `CalendarIntervalType` in Arrow URL: https://github.com/apache/spark/pull/42847 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] WeichenXu123 commented on pull request #42886: [SPARK-45129] Add pyspark "ml-connect" extras dependencies

2023-09-15 Thread via GitHub
WeichenXu123 commented on PR #42886: URL: https://github.com/apache/spark/pull/42886#issuecomment-1720788624 Thanks! @HyukjinKwon When we run `pip install pyspark[ml-connect]`, it should install the `pyspark[connect]` dependencies too. -- This is an automated message from the Apache Git

[GitHub] [spark] HyukjinKwon commented on pull request #42847: [SPARK-45128][SQL] Support `CalendarIntervalType` in Arrow

2023-09-15 Thread via GitHub
HyukjinKwon commented on PR #42847: URL: https://github.com/apache/spark/pull/42847#issuecomment-1720788409 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] rickyma commented on pull request #42890: [SPARK-25689][YARN][FOLLOWUP] Add a missing argument usage description for ApplicationMasterArguments

2023-09-15 Thread via GitHub
rickyma commented on PR #42890: URL: https://github.com/apache/spark/pull/42890#issuecomment-1720787138 @itholic @HyukjinKwon Hey, can you guys merge this? This pull request doesn't need to be tested. Thanks a lot. -- This is an automated message from the Apache Git Service. To respond

[GitHub] [spark] HyukjinKwon commented on pull request #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-15 Thread via GitHub
HyukjinKwon commented on PR #42938: URL: https://github.com/apache/spark/pull/42938#issuecomment-1720785722 cc @itholic mind helping review this please? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-15 Thread via GitHub
HyukjinKwon commented on code in PR #42938: URL: https://github.com/apache/spark/pull/42938#discussion_r132670 ## python/pyspark/sql/functions.py: ## @@ -13041,6 +13041,120 @@ def json_object_keys(col: "ColumnOrName") -> Column: return

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-15 Thread via GitHub
HyukjinKwon commented on code in PR #42938: URL: https://github.com/apache/spark/pull/42938#discussion_r1326888182 ## python/pyspark/sql/functions.py: ## @@ -13041,6 +13041,120 @@ def json_object_keys(col: "ColumnOrName") -> Column: return

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-15 Thread via GitHub
HyukjinKwon commented on code in PR #42938: URL: https://github.com/apache/spark/pull/42938#discussion_r1326886753 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -830,7 +830,11 @@ object FunctionRegistry { // csv

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-15 Thread via GitHub
HyukjinKwon commented on code in PR #42938: URL: https://github.com/apache/spark/pull/42938#discussion_r1326885469 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -830,7 +830,11 @@ object FunctionRegistry { // csv

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-15 Thread via GitHub
HyukjinKwon commented on code in PR #42938: URL: https://github.com/apache/spark/pull/42938#discussion_r1326885256 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -1821,6 +1821,106 @@ def test_json_functions(self):

[GitHub] [spark] itholic opened a new pull request, #42945: [SPARK-SPARK-45180][PS] Remove boolean inputs for `inclusive` parameter from `Series.between`

2023-09-15 Thread via GitHub
itholic opened a new pull request, #42945: URL: https://github.com/apache/spark/pull/42945 ### What changes were proposed in this pull request? This PR proposes to remove boolean inputs for `inclusive` parameter from `Series.between` in favor of `both` and `neither`

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-15 Thread via GitHub
HyukjinKwon commented on code in PR #42938: URL: https://github.com/apache/spark/pull/42938#discussion_r1326882806 ## python/pyspark/sql/functions.py: ## @@ -13041,6 +13041,120 @@ def json_object_keys(col: "ColumnOrName") -> Column: return

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-15 Thread via GitHub
HyukjinKwon commented on code in PR #42938: URL: https://github.com/apache/spark/pull/42938#discussion_r1326881986 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -7367,15 +7367,83 @@ object functions { *

[GitHub] [spark] zhengruifeng opened a new pull request, #42944: [SPARK-45179][PYTHON] Increase Numpy minimum version to 1.21

2023-09-15 Thread via GitHub
zhengruifeng opened a new pull request, #42944: URL: https://github.com/apache/spark/pull/42944 ### What changes were proposed in this pull request? Increase Numpy minimum version to 1.21 ### Why are the changes needed? - according to the [release

[GitHub] [spark] dcoliversun opened a new pull request, #42943: [WIP][SPARK-45175][K8S] download krb5.conf from remote storage in spark-sumbit on k8s

2023-09-15 Thread via GitHub
dcoliversun opened a new pull request, #42943: URL: https://github.com/apache/spark/pull/42943 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] eejbyfeldt closed pull request #41943: [SPARK-44376][BUILD] Fix maven build using scala 2.13 and Java 11 or later

2023-09-15 Thread via GitHub
eejbyfeldt closed pull request #41943: [SPARK-44376][BUILD] Fix maven build using scala 2.13 and Java 11 or later URL: https://github.com/apache/spark/pull/41943 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] eejbyfeldt commented on pull request #41943: [SPARK-44376][BUILD] Fix maven build using scala 2.13 and Java 11 or later

2023-09-15 Thread via GitHub
eejbyfeldt commented on PR #41943: URL: https://github.com/apache/spark/pull/41943#issuecomment-1720762469 Closing this as it is no longer relevant. In 3.5, Scala 2.13 was downgraded and the update will be done in https://github.com/apache/spark/pull/42918 -- This is an automated message

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42929: [SPARK-45167][CONNECT] Python client must call `release_all`

2023-09-15 Thread via GitHub
HyukjinKwon commented on code in PR #42929: URL: https://github.com/apache/spark/pull/42929#discussion_r1326864836 ## python/pyspark/sql/tests/connect/client/test_client.py: ## @@ -147,15 +150,33 @@ def _stub_with(self, execute=None, attach=None):

[GitHub] [spark] viirya commented on pull request #42936: [SPARK-45174][CORE] Support `spark.deploy.maxDrivers`

2023-09-15 Thread via GitHub
viirya commented on PR #42936: URL: https://github.com/apache/spark/pull/42936#issuecomment-1720755579 Yea, the verbose logging messages could be an issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] viirya commented on a diff in pull request #42936: [SPARK-45174][CORE] Support `spark.deploy.maxDrivers`

2023-09-15 Thread via GitHub
viirya commented on code in PR #42936: URL: https://github.com/apache/spark/pull/42936#discussion_r1326863283 ## core/src/main/scala/org/apache/spark/deploy/master/Master.scala: ## @@ -844,8 +845,8 @@ private[deploy] class Master( // We assign workers to each waiting
