[GitHub] [spark] zhengruifeng opened a new pull request, #42944: [SPARK-45179][PYTHON] Increase Numpy minimum version to 1.21

2023-09-14 Thread via GitHub
zhengruifeng opened a new pull request, #42944: URL: https://github.com/apache/spark/pull/42944 ### What changes were proposed in this pull request? Increase Numpy minimum version to 1.21 ### Why are the changes needed? - according to the [release history](https://pypi.o

[GitHub] [spark] dcoliversun opened a new pull request, #42943: [WIP][SPARK-45175][K8S] download krb5.conf from remote storage in spark-sumbit on k8s

2023-09-14 Thread via GitHub
dcoliversun opened a new pull request, #42943: URL: https://github.com/apache/spark/pull/42943 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] eejbyfeldt closed pull request #41943: [SPARK-44376][BUILD] Fix maven build using scala 2.13 and Java 11 or later

2023-09-14 Thread via GitHub
eejbyfeldt closed pull request #41943: [SPARK-44376][BUILD] Fix maven build using scala 2.13 and Java 11 or later URL: https://github.com/apache/spark/pull/41943 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

[GitHub] [spark] eejbyfeldt commented on pull request #41943: [SPARK-44376][BUILD] Fix maven build using scala 2.13 and Java 11 or later

2023-09-14 Thread via GitHub
eejbyfeldt commented on PR #41943: URL: https://github.com/apache/spark/pull/41943#issuecomment-1720762469 Closing this as it is no longer relevant. In 3.5, Scala 2.13 was downgraded, and the update will be done in https://github.com/apache/spark/pull/42918

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42929: [SPARK-45167][CONNECT] Python client must call `release_all`

2023-09-14 Thread via GitHub
HyukjinKwon commented on code in PR #42929: URL: https://github.com/apache/spark/pull/42929#discussion_r1326864836 ## python/pyspark/sql/tests/connect/client/test_client.py: ## @@ -147,15 +150,33 @@ def _stub_with(self, execute=None, attach=None): attach_ops=Respons

[GitHub] [spark] viirya commented on pull request #42936: [SPARK-45174][CORE] Support `spark.deploy.maxDrivers`

2023-09-14 Thread via GitHub
viirya commented on PR #42936: URL: https://github.com/apache/spark/pull/42936#issuecomment-1720755579 Yea, the verbose logging messages could be an issue.

[GitHub] [spark] viirya commented on a diff in pull request #42936: [SPARK-45174][CORE] Support `spark.deploy.maxDrivers`

2023-09-14 Thread via GitHub
viirya commented on code in PR #42936: URL: https://github.com/apache/spark/pull/42936#discussion_r1326863283 ## core/src/main/scala/org/apache/spark/deploy/master/Master.scala: ## @@ -844,8 +845,8 @@ private[deploy] class Master( // We assign workers to each waiting driv

[GitHub] [spark] zwangsheng closed pull request #40118: [SPARK-26365][K8S] In kuberentes cluster mode, spark submit should pass driver exit code

2023-09-14 Thread via GitHub
zwangsheng closed pull request #40118: [SPARK-26365][K8S] In kuberentes cluster mode, spark submit should pass driver exit code URL: https://github.com/apache/spark/pull/40118

[GitHub] [spark] zhengruifeng commented on pull request #42942: [SPARK-45168][PYTHON][FOLLOWUP] `test_missing_data.py` Code Cleanup

2023-09-14 Thread via GitHub
zhengruifeng commented on PR #42942: URL: https://github.com/apache/spark/pull/42942#issuecomment-1720747786 thanks @dongjoon-hyun the CI link is https://github.com/zhengruifeng/spark/actions/runs/6194672684

[GitHub] [spark] zhengruifeng commented on pull request #42942: [SPARK-45168][PYTHON][FOLLOWUP] `test_missing_data.py` Code Cleanup

2023-09-14 Thread via GitHub
zhengruifeng commented on PR #42942: URL: https://github.com/apache/spark/pull/42942#issuecomment-1720745593 after this PR, ``` (spark_dev_310) ➜ spark git:(inc_pd_clean_up) ag --py 'pandas\.__version' python python/pyspark/sql/pandas/utils.py 37:if LooseVersion(pandas._

[GitHub] [spark] zhengruifeng opened a new pull request, #42942: [SPARK-45168][PYTHON][FOLLOWUP] Code Cleanup

2023-09-14 Thread via GitHub
zhengruifeng opened a new pull request, #42942: URL: https://github.com/apache/spark/pull/42942 ### What changes were proposed in this pull request? remove unreachable code path ### Why are the changes needed? code cleanup ### Does this PR introduce _any_ user-facing c

[GitHub] [spark] dongjoon-hyun commented on pull request #42930: [SPARK-45168][PYTHON] Increase Pandas minimum version to 1.4.4

2023-09-14 Thread via GitHub
dongjoon-hyun commented on PR #42930: URL: https://github.com/apache/spark/pull/42930#issuecomment-1720743852 No problem at all~

[GitHub] [spark] itholic commented on pull request #42941: [WIP][SPARK-43874][FOLLOWUP][TESTS] Enable `GroupbyIndexTests.test_groupby_multiindex_columns`

2023-09-14 Thread via GitHub
itholic commented on PR #42941: URL: https://github.com/apache/spark/pull/42941#issuecomment-1720740081 Let me find some more tests that could be enabled while I'm here.

[GitHub] [spark] zhengruifeng commented on pull request #42930: [SPARK-45168][PYTHON] Increase Pandas minimum version to 1.4.4

2023-09-14 Thread via GitHub
zhengruifeng commented on PR #42930: URL: https://github.com/apache/spark/pull/42930#issuecomment-1720737302 oh, sorry, I find there are still some similar places to clean up, let me create a follow-up PR

[GitHub] [spark] itholic opened a new pull request, #42941: [SPARK-43874][FOLLOWUP][TESTS] Enable `GroupbyIndexTests.test_groupby_multiindex_columns`

2023-09-14 Thread via GitHub
itholic opened a new pull request, #42941: URL: https://github.com/apache/spark/pull/42941 ### What changes were proposed in this pull request? Follow-up for https://github.com/apache/spark/pull/42533. ### Why are the changes needed? To enable test.

[GitHub] [spark] dongjoon-hyun commented on pull request #42936: [SPARK-45174][CORE] Support `spark.deploy.maxDrivers`

2023-09-14 Thread via GitHub
dongjoon-hyun commented on PR #42936: URL: https://github.com/apache/spark/pull/42936#issuecomment-1720730875 The log message could be very verbose because the `schedule` method is invoked at every submission. For example, if we submit 500 jobs with the max limit 10, 4

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #42936: [SPARK-45174][CORE] Support `spark.deploy.maxDrivers`

2023-09-14 Thread via GitHub
dongjoon-hyun commented on code in PR #42936: URL: https://github.com/apache/spark/pull/42936#discussion_r1326839780 ## core/src/main/scala/org/apache/spark/deploy/master/Master.scala: ## @@ -844,8 +845,8 @@ private[deploy] class Master( // We assign workers to each waiti

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #42904: [SPARK-45151][CORE][UI] Task Level Thread Dump Support

2023-09-14 Thread via GitHub
dongjoon-hyun commented on code in PR #42904: URL: https://github.com/apache/spark/pull/42904#discussion_r1326838058 ## core/src/main/scala/org/apache/spark/status/api/v1/OneApplicationResource.scala: ## @@ -172,6 +180,18 @@ private[v1] class AbstractApplicationResource extends

[GitHub] [spark] dongjoon-hyun commented on pull request #42929: [SPARK-45167][CONNECT] Python client must call `release_all`

2023-09-14 Thread via GitHub
dongjoon-hyun commented on PR #42929: URL: https://github.com/apache/spark/pull/42929#issuecomment-1720724824 Got it. Thank you for updating.

[GitHub] [spark] grundprinzip commented on pull request #42929: [SPARK-45167][CONNECT] Python client must call `release_all`

2023-09-14 Thread via GitHub
grundprinzip commented on PR #42929: URL: https://github.com/apache/spark/pull/42929#issuecomment-1720723378 @HyukjinKwon @juliuszsompolski I fixed the issue that stemmed from the test shutting down the channel without cleaning up the threadpool of the generator used for the release request

[GitHub] [spark] viirya commented on a diff in pull request #42936: [SPARK-45174][CORE] Support `spark.deploy.maxDrivers`

2023-09-14 Thread via GitHub
viirya commented on code in PR #42936: URL: https://github.com/apache/spark/pull/42936#discussion_r1326828633 ## core/src/main/scala/org/apache/spark/deploy/master/Master.scala: ## @@ -844,8 +845,8 @@ private[deploy] class Master( // We assign workers to each waiting driv

[GitHub] [spark] LuciferYang commented on pull request #42908: [SPARK-44872][CONNECT][FOLLOWUP] Deflake ReattachableExecuteSuite and increase retry buffer

2023-09-14 Thread via GitHub
LuciferYang commented on PR #42908: URL: https://github.com/apache/spark/pull/42908#issuecomment-1720719483 Just to confirm, will the case mentioned by https://github.com/apache/spark/pull/42560#issuecomment-1718968002 also be fixed in this PR?

[GitHub] [spark] LuciferYang commented on pull request #42918: [SPARK-40497][BUILD] Re-upgrade Scala to 2.13.11

2023-09-14 Thread via GitHub
LuciferYang commented on PR #42918: URL: https://github.com/apache/spark/pull/42918#issuecomment-1720715681 cc @dongjoon-hyun FYI also cc @eejbyfeldt for double check

[GitHub] [spark] LuciferYang commented on pull request #42918: [SPARK-40497][BUILD] Re-upgrade Scala to 2.13.11

2023-09-14 Thread via GitHub
LuciferYang commented on PR #42918: URL: https://github.com/apache/spark/pull/42918#issuecomment-1720714437 Only the `org.apache.spark.sql.connect.execution.ReattachableExecuteSuite` test failed, which is a known issue.

[GitHub] [spark] zhengruifeng commented on pull request #42928: [SPARK-45166][PYTHON] Clean up unused code paths for pyarrow<4

2023-09-14 Thread via GitHub
zhengruifeng commented on PR #42928: URL: https://github.com/apache/spark/pull/42928#issuecomment-1720702469 > Could you re-trigger the failed PySpark pipeline? sure

[GitHub] [spark] grundprinzip commented on pull request #42929: [SPARK-45167][CONNECT] Python client must call `release_all`

2023-09-14 Thread via GitHub
grundprinzip commented on PR #42929: URL: https://github.com/apache/spark/pull/42929#issuecomment-1720700370 This is a warning, but not an error. The actual error is a bit further down: ``` AssertionError: 'java.nio.channels.ClosedChannelException[11325 chars]1)\n' != None ```

[GitHub] [spark] zhengruifeng commented on pull request #42930: [SPARK-45168][PYTHON] Increase Pandas minimum version to 1.4.4

2023-09-14 Thread via GitHub
zhengruifeng commented on PR #42930: URL: https://github.com/apache/spark/pull/42930#issuecomment-1720695226 thanks @dongjoon-hyun and @HyukjinKwon merged to master

[GitHub] [spark] HeartSaVioR commented on pull request #42940: [SPARK-45178][SS] Fallback to execute a single batch for Trigger.AvailableNow with unsupported sources rather than using wrapper

2023-09-14 Thread via GitHub
HeartSaVioR commented on PR #42940: URL: https://github.com/apache/spark/pull/42940#issuecomment-1720694800 cc. @zsxwing @brkyvz @viirya @anishshri-db Mind taking a look? Thanks!

[GitHub] [spark] zhengruifeng closed pull request #42930: [SPARK-45168][PYTHON] Increase Pandas minimum version to 1.4.4

2023-09-14 Thread via GitHub
zhengruifeng closed pull request #42930: [SPARK-45168][PYTHON] Increase Pandas minimum version to 1.4.4 URL: https://github.com/apache/spark/pull/42930

[GitHub] [spark] HeartSaVioR opened a new pull request, #42940: [SPARK-45178][SS] Fallback to execute a single batch for Trigger.AvailableNow with unsupported sources rather than using wrapper

2023-09-14 Thread via GitHub
HeartSaVioR opened a new pull request, #42940: URL: https://github.com/apache/spark/pull/42940 ### What changes were proposed in this pull request? This PR proposes to change the behavior when user runs streaming query with Trigger.AvailableNow, which query has any source which does n

[GitHub] [spark] zhengruifeng commented on pull request #42920: [SPARK-45143][PYTHON][CONNECT] Make PySpark compatible with PyArrow 13.0.0

2023-09-14 Thread via GitHub
zhengruifeng commented on PR #42920: URL: https://github.com/apache/spark/pull/42920#issuecomment-1720690499 thank you @dongjoon-hyun and @HyukjinKwon

[GitHub] [spark] dengziming opened a new pull request, #42939: SPARK-43254: Assign a name to the error _LEGACY_ERROR_TEMP_2018

2023-09-14 Thread via GitHub
dengziming opened a new pull request, #42939: URL: https://github.com/apache/spark/pull/42939 ### What changes were proposed in this pull request? Assign the name `CLASS_UNSUPPORTED_BY_MAP_OBJECTS` to the legacy error class `_LEGACY_ERROR_TEMP_2018`. ### Why are the changes nee

[GitHub] [spark] sandip-db opened a new pull request, #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-14 Thread via GitHub
sandip-db opened a new pull request, #42938: URL: https://github.com/apache/spark/pull/42938 ### What changes were proposed in this pull request? Add from_xml and schema_of_xml to pyspark, spark connect and sql function ### Why are the changes needed? from_xml parses XML data nes

[GitHub] [spark] itholic opened a new pull request, #42937: [SPARK-45177][PS] Remove `col_space` parameter from `to_latex`

2023-09-14 Thread via GitHub
itholic opened a new pull request, #42937: URL: https://github.com/apache/spark/pull/42937 ### What changes were proposed in this pull request? This PR proposes to remove `col_space` parameter from `DataFrame.to_latex` and `Series.to_latex` ### Why are the changes neede

[GitHub] [spark] dongjoon-hyun commented on pull request #42935: [SPARK-45173][UI] Remove some unnecessary sourceMapping files in UI

2023-09-14 Thread via GitHub
dongjoon-hyun commented on PR #42935: URL: https://github.com/apache/spark/pull/42935#issuecomment-1720620926 Ah, I missed that it's one line change. Got it. Thanks. ![Screenshot 2023-09-14 at 9 39 40  PM](https://github.com/apache/spark/assets/9700541/554dd6c0-23ce-4399-b413-e70486108

[GitHub] [spark] yaooqinn commented on pull request #42935: [SPARK-45173][UI] Remove some unnecessary sourceMapping files in UI

2023-09-14 Thread via GitHub
yaooqinn commented on PR #42935: URL: https://github.com/apache/spark/pull/42935#issuecomment-1720613868 Hi @dongjoon-hyun the sheet file `bootstrap.min.css` is not removed, but `bootstrap.min.css.map`. :)

[GitHub] [spark] yaooqinn commented on a diff in pull request #42904: [SPARK-45151][CORE][UI] Task Level Thread Dump Support

2023-09-14 Thread via GitHub
yaooqinn commented on code in PR #42904: URL: https://github.com/apache/spark/pull/42904#discussion_r1326778557 ## core/src/main/scala/org/apache/spark/status/api/v1/OneApplicationResource.scala: ## @@ -52,18 +52,24 @@ private[v1] class AbstractApplicationResource extends BaseA

[GitHub] [spark] dongjoon-hyun commented on pull request #42793: [SPARK-45065][PYTHON][PS] Support Pandas 2.1.0

2023-09-14 Thread via GitHub
dongjoon-hyun commented on PR #42793: URL: https://github.com/apache/spark/pull/42793#issuecomment-1720583572 Could you resolve the conflict, @itholic ?

[GitHub] [spark] dongjoon-hyun commented on pull request #42920: [SPARK-45143][PYTHON][CONNECT] Make PySpark compatible with PyArrow 13.0.0

2023-09-14 Thread via GitHub
dongjoon-hyun commented on PR #42920: URL: https://github.com/apache/spark/pull/42920#issuecomment-1720572501 Merged to master for Apache Spark 4.0.0. Thank you, @zhengruifeng and @HyukjinKwon .

[GitHub] [spark] dongjoon-hyun closed pull request #42920: [SPARK-45143][PYTHON][CONNECT] Make PySpark compatible with PyArrow 13.0.0

2023-09-14 Thread via GitHub
dongjoon-hyun closed pull request #42920: [SPARK-45143][PYTHON][CONNECT] Make PySpark compatible with PyArrow 13.0.0 URL: https://github.com/apache/spark/pull/42920

[GitHub] [spark] dongjoon-hyun commented on pull request #42928: [SPARK-45166][PYTHON] Clean up unused code paths for pyarrow<4

2023-09-14 Thread via GitHub
dongjoon-hyun commented on PR #42928: URL: https://github.com/apache/spark/pull/42928#issuecomment-1720566853 Could you re-trigger the failed PySpark pipeline?

[GitHub] [spark] dongjoon-hyun commented on pull request #42934: [SPARK-45172][BUILD] Upgrade `commons-compress` to 1.24.0

2023-09-14 Thread via GitHub
dongjoon-hyun commented on PR #42934: URL: https://github.com/apache/spark/pull/42934#issuecomment-1720560104 Merged to master for Apache Spark 4.0. Thank you!

[GitHub] [spark] dongjoon-hyun closed pull request #42934: [SPARK-45172][BUILD] Upgrade `commons-compress` to 1.24.0

2023-09-14 Thread via GitHub
dongjoon-hyun closed pull request #42934: [SPARK-45172][BUILD] Upgrade `commons-compress` to 1.24.0 URL: https://github.com/apache/spark/pull/42934

[GitHub] [spark] HyukjinKwon commented on pull request #42933: [SPARK-45171][SQL] Initialize non-deterministic expressions in `GenerateExec`

2023-09-14 Thread via GitHub
HyukjinKwon commented on PR #42933: URL: https://github.com/apache/spark/pull/42933#issuecomment-1720554821 Merged to master and branch-3.5.

[GitHub] [spark] HyukjinKwon closed pull request #42933: [SPARK-45171][SQL] Initialize non-deterministic expressions in `GenerateExec`

2023-09-14 Thread via GitHub
HyukjinKwon closed pull request #42933: [SPARK-45171][SQL] Initialize non-deterministic expressions in `GenerateExec` URL: https://github.com/apache/spark/pull/42933

[GitHub] [spark] dongjoon-hyun closed pull request #42936: [SPARK-45174][CORE] Support `spark.deploy.maxDrivers`

2023-09-14 Thread via GitHub
dongjoon-hyun closed pull request #42936: [SPARK-45174][CORE] Support `spark.deploy.maxDrivers` URL: https://github.com/apache/spark/pull/42936

[GitHub] [spark] dongjoon-hyun commented on pull request #42936: [SPARK-45174][CORE] Support `spark.deploy.maxDrivers`

2023-09-14 Thread via GitHub
dongjoon-hyun commented on PR #42936: URL: https://github.com/apache/spark/pull/42936#issuecomment-1720544303 `core` module test passed and the other Python Connector failures are known ones. Let me merge this.

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #42904: [SPARK-45151][CORE][UI] Task Level Thread Dump Support

2023-09-14 Thread via GitHub
dongjoon-hyun commented on code in PR #42904: URL: https://github.com/apache/spark/pull/42904#discussion_r1326772213 ## core/src/main/scala/org/apache/spark/status/api/v1/OneApplicationResource.scala: ## @@ -52,18 +52,24 @@ private[v1] class AbstractApplicationResource extends

[GitHub] [spark] dongjoon-hyun commented on pull request #42935: [SPARK-45173][UI] Remove some unnecessary sourceMapping files in UI

2023-09-14 Thread via GitHub
dongjoon-hyun commented on PR #42935: URL: https://github.com/apache/spark/pull/42935#issuecomment-1720520890 If this is really unused, I believe - We need to remove `UIUtils.commonHeaderNodes` first. - Also need to remove from RAT file https://github.com/apache/spark/blob/91ccc0f

[GitHub] [spark] dongjoon-hyun closed pull request #42927: [SPARK-45165][PS] Remove `inplace` parameter from `CategoricalIndex` APIs

2023-09-14 Thread via GitHub
dongjoon-hyun closed pull request #42927: [SPARK-45165][PS] Remove `inplace` parameter from `CategoricalIndex` APIs URL: https://github.com/apache/spark/pull/42927

[GitHub] [spark] yaooqinn commented on a diff in pull request #42904: [SPARK-45151][CORE][UI] Task Level Thread Dump Support

2023-09-14 Thread via GitHub
yaooqinn commented on code in PR #42904: URL: https://github.com/apache/spark/pull/42904#discussion_r1326735084 ## core/src/main/scala/org/apache/spark/ui/jobs/TaskThreadDumpPage.scala: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mo

[GitHub] [spark] itholic commented on pull request #42884: [SPARK-42304][FOLLOWUP][SQL] Add test for `GET_TABLES_BY_TYPE_UNSUPPORTED_BY_HIVE_VERSION`

2023-09-14 Thread via GitHub
itholic commented on PR #42884: URL: https://github.com/apache/spark/pull/42884#issuecomment-1720429390 Seems like the CI failure is not related to this change. Could you rebase to master?

[GitHub] [spark] dongjoon-hyun commented on pull request #42936: [SPARK-45174][CORE] Support `spark.deploy.maxDrivers`

2023-09-14 Thread via GitHub
dongjoon-hyun commented on PR #42936: URL: https://github.com/apache/spark/pull/42936#issuecomment-1720416057 Thank you so much, @yaooqinn !

[GitHub] [spark] dongjoon-hyun commented on pull request #42920: [SPARK-45143][PYTHON][CONNECT] Make PySpark compatible with PyArrow 13.0.0

2023-09-14 Thread via GitHub
dongjoon-hyun commented on PR #42920: URL: https://github.com/apache/spark/pull/42920#issuecomment-1720412021 Oh, it's great to have mlflow 2.7.0 on time! Thank you.

[GitHub] [spark] dongjoon-hyun commented on pull request #42936: [SPARK-45174][CORE] Support `spark.deploy.maxDrivers`

2023-09-14 Thread via GitHub
dongjoon-hyun commented on PR #42936: URL: https://github.com/apache/spark/pull/42936#issuecomment-1720410633 Could you review this PR when you have some time, @viirya ?

[GitHub] [spark] dongjoon-hyun opened a new pull request, #42936: [SPARK-45174][CORE] Support `spark.deploy.maxDrivers`

2023-09-14 Thread via GitHub
dongjoon-hyun opened a new pull request, #42936: URL: https://github.com/apache/spark/pull/42936 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

[GitHub] [spark] yaooqinn commented on pull request #42904: [SPARK-45151][CORE][UI] Task Level Thread Dump Support

2023-09-14 Thread via GitHub
yaooqinn commented on PR #42904: URL: https://github.com/apache/spark/pull/42904#issuecomment-1720406726 Thank you @mridulm, pending CI

[GitHub] [spark] yaooqinn opened a new pull request, #42935: [SPARK-45173][UI] Remove some unnecessary sourceMapping files in UI

2023-09-14 Thread via GitHub
yaooqinn opened a new pull request, #42935: URL: https://github.com/apache/spark/pull/42935 ### What changes were proposed in this pull request? This PR deletes *.[css|jss].map files in URL for sourceMapping ### Why are the changes needed? unnecessary file

[GitHub] [spark] chenyu-opensource commented on a diff in pull request #42919: [SPARK-45146][DOCS]Update the default value of 'spark.executor.logs.rolling.strategy'

2023-09-14 Thread via GitHub
chenyu-opensource commented on code in PR #42919: URL: https://github.com/apache/spark/pull/42919#discussion_r1326706268 ## docs/configuration.md: ## @@ -694,10 +694,10 @@ Apart from these, the following properties are also available, and may be useful spark.executor.log

[GitHub] [spark] HyukjinKwon closed pull request #42915: [SPARK-45159][PYTHON] Handle named arguments only when necessary

2023-09-14 Thread via GitHub
HyukjinKwon closed pull request #42915: [SPARK-45159][PYTHON] Handle named arguments only when necessary URL: https://github.com/apache/spark/pull/42915

[GitHub] [spark] HyukjinKwon commented on pull request #42915: [SPARK-45159][PYTHON] Handle named arguments only when necessary

2023-09-14 Thread via GitHub
HyukjinKwon commented on PR #42915: URL: https://github.com/apache/spark/pull/42915#issuecomment-172031 Merged to master.

[GitHub] [spark] HyukjinKwon commented on pull request #42886: [SPARK-45129] Add pyspark "ml-connect" extras dependencies

2023-09-14 Thread via GitHub
HyukjinKwon commented on PR #42886: URL: https://github.com/apache/spark/pull/42886#issuecomment-1720386537 yeah but let's hold off. I am working on restructuring the packages.

[GitHub] [spark] LuciferYang commented on pull request #42921: [SPARK-45161][INFRA] Bump previousSparkVersion to 3.5.0

2023-09-14 Thread via GitHub
LuciferYang commented on PR #42921: URL: https://github.com/apache/spark/pull/42921#issuecomment-1720386531 Thanks @dongjoon-hyun

[GitHub] [spark] HyukjinKwon commented on pull request #42793: [SPARK-45065][PYTHON][PS] Support Pandas 2.1.0

2023-09-14 Thread via GitHub
HyukjinKwon commented on PR #42793: URL: https://github.com/apache/spark/pull/42793#issuecomment-1720385772 Let's probably upgrade them since we're going ahead with the 4.0.0 major version bump

[GitHub] [spark] itholic commented on pull request #42793: [SPARK-45065][PYTHON][PS] Support Pandas 2.1.0

2023-09-14 Thread via GitHub
itholic commented on PR #42793: URL: https://github.com/apache/spark/pull/42793#issuecomment-1720382320 @zhengruifeng AFAIK, there is no separate policy for minimum versions. We may change the minimum version of a particular package when an older version no longer works properly with Spar

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #42895: [SPARK-45138][SS] Define a new error class and apply it when checkpointing state to DFS fails

2023-09-14 Thread via GitHub
HeartSaVioR commented on code in PR #42895: URL: https://github.com/apache/spark/pull/42895#discussion_r1326687236 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -314,6 +314,18 @@ "" ] }, + "CANNOT_WRITE_STATE_FILE" : { Review Comment: How

[GitHub] [spark] itholic commented on pull request #42926: [SPARK-45164][PS] Remove deprecated `Index` APIs

2023-09-14 Thread via GitHub
itholic commented on PR #42926: URL: https://github.com/apache/spark/pull/42926#issuecomment-1720370230 Oh, I missed it. Just cleared the leftovers, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] srowen commented on a diff in pull request #42919: [SPARK-45146][DOCS]Update the default value of 'spark.executor.logs.rolling.strategy'

2023-09-14 Thread via GitHub
srowen commented on code in PR #42919: URL: https://github.com/apache/spark/pull/42919#discussion_r1326687425 ## docs/configuration.md: ## @@ -694,10 +694,10 @@ Apart from these, the following properties are also available, and may be useful spark.executor.logs.rolling.s

[GitHub] [spark] panbingkun commented on a diff in pull request #42917: [SPARK-45163][SQL] Merge TABLE_OPERATION & _LEGACY_ERROR_TEMP_1113 into UNSUPPORTED_TABLE_OPERATION and refactor some logic

2023-09-14 Thread via GitHub
panbingkun commented on code in PR #42917: URL: https://github.com/apache/spark/pull/42917#discussion_r1326684936 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -3215,11 +3225,6 @@ " is a VARIABLE and cannot be updated using the SET statement. Use

[GitHub] [spark] HeartSaVioR closed pull request #42822: [SPARK-45084][SS] StateOperatorProgress to use accurate effective shuffle partition number

2023-09-14 Thread via GitHub
HeartSaVioR closed pull request #42822: [SPARK-45084][SS] StateOperatorProgress to use accurate effective shuffle partition number URL: https://github.com/apache/spark/pull/42822

[GitHub] [spark] HeartSaVioR commented on pull request #42822: [SPARK-45084][SS] StateOperatorProgress to use accurate effective shuffle partition number

2023-09-14 Thread via GitHub
HeartSaVioR commented on PR #42822: URL: https://github.com/apache/spark/pull/42822#issuecomment-1720365462 Looks like the failure from org.apache.spark.sql.connect.execution.ReattachableExecuteSuite is fairly consistent, but I don't see any relevance to the change. Thanks! Merging

[GitHub] [spark] chenyu-opensource commented on a diff in pull request #42919: [SPARK-45146][DOCS]Update the default value of 'spark.executor.logs.rolling.strategy'

2023-09-14 Thread via GitHub
chenyu-opensource commented on code in PR #42919: URL: https://github.com/apache/spark/pull/42919#discussion_r1326682581 ## docs/configuration.md: ## @@ -694,13 +694,13 @@ Apart from these, the following properties are also available, and may be useful spark.executor.log

[GitHub] [spark] zhengruifeng commented on pull request #42920: [SPARK-45143][PYTHON][CONNECT] Make PySpark compatible with PyArrow 13.0.0

2023-09-14 Thread via GitHub
zhengruifeng commented on PR #42920: URL: https://github.com/apache/spark/pull/42920#issuecomment-1720329365 @dongjoon-hyun I don't see failure in [docker build](https://github.com/zhengruifeng/spark/actions/runs/6184148073/job/16787288022) ``` #21 [15/19] RUN python3.9 -m pip inst

[GitHub] [spark] github-actions[bot] closed pull request #41423: [SPARK-43523][CORE] Fix Spark UI LiveTask memory leak

2023-09-14 Thread via GitHub
github-actions[bot] closed pull request #41423: [SPARK-43523][CORE] Fix Spark UI LiveTask memory leak URL: https://github.com/apache/spark/pull/41423

[GitHub] [spark] github-actions[bot] commented on pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates

2023-09-14 Thread via GitHub
github-actions[bot] commented on PR #40128: URL: https://github.com/apache/spark/pull/40128#issuecomment-1720315092 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #41203: [SPARK-16484][SQL] Update hll function type checks to also check for non-foldable inputs

2023-09-14 Thread via GitHub
github-actions[bot] commented on PR #41203: URL: https://github.com/apache/spark/pull/41203#issuecomment-1720315070 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #41459: [SPARK-40037][BUILD][3.3] Upgrade `Tink` to 1.7.0

2023-09-14 Thread via GitHub
github-actions[bot] closed pull request #41459: [SPARK-40037][BUILD][3.3] Upgrade `Tink` to 1.7.0 URL: https://github.com/apache/spark/pull/41459

[GitHub] [spark-docker] HyukjinKwon commented on pull request #55: [SPARK-45169] Add official image Dockerfile for Apache Spark 3.5.0

2023-09-14 Thread via GitHub
HyukjinKwon commented on PR #55: URL: https://github.com/apache/spark-docker/pull/55#issuecomment-1720305442 Awesome!!

[GitHub] [spark] HyukjinKwon commented on pull request #42779: [SPARK-45056][PYTHON][SS][CONNECT] Termination tests for streamingQueryListener and foreachBatch

2023-09-14 Thread via GitHub
HyukjinKwon commented on PR #42779: URL: https://github.com/apache/spark/pull/42779#issuecomment-1720305034 @WweiL mind creating a followup to fix them?

[GitHub] [spark] zhengruifeng commented on pull request #42930: [SPARK-45168][PYTHON] Increase Pandas minimum version to 1.4.4

2023-09-14 Thread via GitHub
zhengruifeng commented on PR #42930: URL: https://github.com/apache/spark/pull/42930#issuecomment-1720301980 > Could you check the PySpark test failures? > > In addition to the style issue, > > ``` > flake8 checks failed: > ./python/pyspark/pandas/resample.py:35:1: F401 '

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42933: [SPARK-45171][SQL] Initialize non-deterministic expressions in `GenerateExec`

2023-09-14 Thread via GitHub
HyukjinKwon commented on code in PR #42933: URL: https://github.com/apache/spark/pull/42933#discussion_r1326624012 ## sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala: ## @@ -327,3 +331,4 @@ case class GenerateExec( override protected def withNewChild

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42920: [SPARK-45143][PYTHON][CONNECT] Make PySpark compatible with PyArrow 13.0.0

2023-09-14 Thread via GitHub
HyukjinKwon commented on code in PR #42920: URL: https://github.com/apache/spark/pull/42920#discussion_r1326623660 ## dev/infra/Dockerfile: ## @@ -85,7 +85,7 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht ENV R_LIBS_SITE "/usr/local/lib/R/

[GitHub] [spark] HyukjinKwon opened a new pull request, #42934: [SPARK-45172][BUILD] Upgrade commons-compress.version from 1.23.0 to 1.24.0

2023-09-14 Thread via GitHub
HyukjinKwon opened a new pull request, #42934: URL: https://github.com/apache/spark/pull/42934 ### What changes were proposed in this pull request? This PR proposes to upgrade commons-compress.version from 1.23.0 to 1.24.0. ### Why are the changes needed? There are a bit

[GitHub] [spark] dependabot[bot] commented on pull request #42932: Bump org.apache.commons:commons-compress from 1.23.0 to 1.24.0

2023-09-14 Thread via GitHub
dependabot[bot] commented on PR #42932: URL: https://github.com/apache/spark/pull/42932#issuecomment-1720284464 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let

[GitHub] [spark] HyukjinKwon closed pull request #42932: Bump org.apache.commons:commons-compress from 1.23.0 to 1.24.0

2023-09-14 Thread via GitHub
HyukjinKwon closed pull request #42932: Bump org.apache.commons:commons-compress from 1.23.0 to 1.24.0 URL: https://github.com/apache/spark/pull/42932

[GitHub] [spark] bersprockets opened a new pull request, #42933: [SPARK-45171][SQL] Initialize non-deterministic expressions in `GenerateExec`

2023-09-14 Thread via GitHub
bersprockets opened a new pull request, #42933: URL: https://github.com/apache/spark/pull/42933 ### What changes were proposed in this pull request? Before evaluating the generator function in `GenerateExec`, initialize non-deterministic expressions. ### Why are the changes nee

[GitHub] [spark] valentinp17 commented on a diff in pull request #42884: [SPARK-42304][FOLLOWUP][SQL] Add test for `GET_TABLES_BY_TYPE_UNSUPPORTED_BY_HIVE_VERSION`

2023-09-14 Thread via GitHub
valentinp17 commented on code in PR #42884: URL: https://github.com/apache/spark/pull/42884#discussion_r1326582226 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveShimSuite.scala: ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [spark] valentinp17 commented on a diff in pull request #42884: [SPARK-42304][FOLLOWUP][SQL] Add test for `GET_TABLES_BY_TYPE_UNSUPPORTED_BY_HIVE_VERSION`

2023-09-14 Thread via GitHub
valentinp17 commented on code in PR #42884: URL: https://github.com/apache/spark/pull/42884#discussion_r1326576833 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveShimSuite.scala: ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [spark] valentinp17 commented on a diff in pull request #42884: [SPARK-42304][FOLLOWUP][SQL] Add test for `GET_TABLES_BY_TYPE_UNSUPPORTED_BY_HIVE_VERSION`

2023-09-14 Thread via GitHub
valentinp17 commented on code in PR #42884: URL: https://github.com/apache/spark/pull/42884#discussion_r1326575472 ## .github/PULL_REQUEST_TEMPLATE: ## @@ -9,7 +9,7 @@ Thanks for sending a pull request! Here are some tips for you: 7. If you want to add a new configuration, p

[GitHub] [spark] dongjoon-hyun commented on pull request #42908: [SPARK-44872][CONNECT][FOLLOWUP] Deflake ReattachableExecuteSuite and increase retry buffer

2023-09-14 Thread via GitHub
dongjoon-hyun commented on PR #42908: URL: https://github.com/apache/spark/pull/42908#issuecomment-1720219199 It seems that another suite starts to fail. Is it related? ![Screenshot 2023-09-14 at 3 08 25  PM](https://github.com/apache/spark/assets/9700541/68bf3070-9f35-40db-95cb-c1171b

[GitHub] [spark] dongjoon-hyun closed pull request #42921: [SPARK-45161][INFRA] Bump previousSparkVersion to 3.5.0

2023-09-14 Thread via GitHub
dongjoon-hyun closed pull request #42921: [SPARK-45161][INFRA] Bump previousSparkVersion to 3.5.0 URL: https://github.com/apache/spark/pull/42921

[GitHub] [spark] dongjoon-hyun commented on pull request #42921: [SPARK-45161][INFRA] Bump previousSparkVersion to 3.5.0

2023-09-14 Thread via GitHub
dongjoon-hyun commented on PR #42921: URL: https://github.com/apache/spark/pull/42921#issuecomment-1720214490 Merged to master for Apache Spark 4.0.0.

[GitHub] [spark] rangadi commented on a diff in pull request #42779: [SPARK-45056][PYTHON][SS][CONNECT] Termination tests for streamingQueryListener and foreachBatch

2023-09-14 Thread via GitHub
rangadi commented on code in PR #42779: URL: https://github.com/apache/spark/pull/42779#discussion_r1326514203 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/StreamingQueryListenerHelper.scala: ## @@ -26,15 +26,13 @@ import org.apache.spark.sql.s

[GitHub] [spark] dependabot[bot] opened a new pull request, #42932: Bump org.apache.commons:commons-compress from 1.23.0 to 1.24.0

2023-09-14 Thread via GitHub
dependabot[bot] opened a new pull request, #42932: URL: https://github.com/apache/spark/pull/42932 Bumps org.apache.commons:commons-compress from 1.23.0 to 1.24.0. [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name

[GitHub] [spark] ueshin commented on pull request #42915: [SPARK-45159][PYTHON] Handle named arguments only when necessary

2023-09-14 Thread via GitHub
ueshin commented on PR #42915: URL: https://github.com/apache/spark/pull/42915#issuecomment-1720022792 cc @HyukjinKwon @zhengruifeng @allisonwang-db

[GitHub] [spark] siying commented on pull request #42822: [SPARK-45084][SS] StateOperatorProgress to use accurate effective shuffle partition number

2023-09-14 Thread via GitHub
siying commented on PR #42822: URL: https://github.com/apache/spark/pull/42822#issuecomment-1719945455 It's odd. Let me rebase.

[GitHub] [spark] MaxGekk opened a new pull request, #42931: [WIP][SPARK-45137][CONNECT] Support map/array parameters in parameterized `sql()`

2023-09-14 Thread via GitHub
MaxGekk opened a new pull request, #42931: URL: https://github.com/apache/spark/pull/42931 ### What changes were proposed in this pull request? ### Why are the changes needed? To fix the issue demonstrated by the example: ```scala spark.sql("select element_at(?, 1)", A

[GitHub] [spark] ueshin closed pull request #42874: [SPARK-45118][PYTHON] Refactor converters for complex types to short cut when the element types don't need converters

2023-09-14 Thread via GitHub
ueshin closed pull request #42874: [SPARK-45118][PYTHON] Refactor converters for complex types to short cut when the element types don't need converters URL: https://github.com/apache/spark/pull/42874

[GitHub] [spark] ueshin commented on pull request #42874: [SPARK-45118][PYTHON] Refactor converters for complex types to short cut when the element types don't need converters

2023-09-14 Thread via GitHub
ueshin commented on PR #42874: URL: https://github.com/apache/spark/pull/42874#issuecomment-1719881732 Thanks! merging to master.
