[GitHub] [spark] Hisoka-X commented on a diff in pull request #41855: [SPARK-44262][SQL] Add `dropTable` and `getInsertStatement` to JdbcDialect

2023-09-03 Thread via GitHub
Hisoka-X commented on code in PR #41855: URL: https://github.com/apache/spark/pull/41855#discussion_r1314466006 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -180,6 +180,37 @@ abstract class JdbcDialect extends Serializable with Logging { s

[GitHub] [spark] cloud-fan commented on a diff in pull request #41855: [SPARK-44262][SQL] Add `dropTable` and `getInsertStatement` to JdbcDialect

2023-09-03 Thread via GitHub
cloud-fan commented on code in PR #41855: URL: https://github.com/apache/spark/pull/41855#discussion_r1314508461 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -180,6 +180,38 @@ abstract class JdbcDialect extends Serializable with Logging {

[GitHub] [spark] Hisoka-X commented on a diff in pull request #41855: [SPARK-44262][SQL] Add `dropTable` and `getInsertStatement` to JdbcDialect

2023-09-03 Thread via GitHub
Hisoka-X commented on code in PR #41855: URL: https://github.com/apache/spark/pull/41855#discussion_r1314507599 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -180,6 +180,37 @@ abstract class JdbcDialect extends Serializable with Logging { s

[GitHub] [spark] panbingkun opened a new pull request, #42797: [SPARK-45068][SQL] Make function output column name consistent in case

2023-09-03 Thread via GitHub
panbingkun opened a new pull request, #42797: URL: https://github.com/apache/spark/pull/42797 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] Hisoka-X commented on a diff in pull request #41855: [SPARK-44262][SQL] Add `dropTable` and `getInsertStatement` to JdbcDialect

2023-09-03 Thread via GitHub
Hisoka-X commented on code in PR #41855: URL: https://github.com/apache/spark/pull/41855#discussion_r1314497081 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -180,6 +180,37 @@ abstract class JdbcDialect extends Serializable with Logging { s

[GitHub] [spark] LuciferYang commented on pull request #42796: [SPARK-45067][BUILD] Upgrade slf4j to 2.0.9

2023-09-03 Thread via GitHub
LuciferYang commented on PR #42796: URL: https://github.com/apache/spark/pull/42796#issuecomment-1704693024 test first -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

[GitHub] [spark] LuciferYang opened a new pull request, #42796: [SPARK-45067][BUILD] Upgrade slf4j to 2.0.9

2023-09-03 Thread via GitHub
LuciferYang opened a new pull request, #42796: URL: https://github.com/apache/spark/pull/42796 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] cloud-fan commented on a diff in pull request #41855: [SPARK-44262][SQL] Add `dropTable` and `getInsertStatement` to JdbcDialect

2023-09-03 Thread via GitHub
cloud-fan commented on code in PR #41855: URL: https://github.com/apache/spark/pull/41855#discussion_r1314481185 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -180,6 +180,37 @@ abstract class JdbcDialect extends Serializable with Logging {

[GitHub] [spark] LuciferYang commented on pull request #42795: [SPARK-45042][BUILD][3.5] Upgrade jetty to 9.4.52.v20230823

2023-09-03 Thread via GitHub
LuciferYang commented on PR #42795: URL: https://github.com/apache/spark/pull/42795#issuecomment-1704662375 cc @srowen

[GitHub] [spark] panbingkun commented on pull request #42795: [SPARK-45042][BUILD][3.5] Upgrade jetty to 9.4.52.v20230823

2023-09-03 Thread via GitHub
panbingkun commented on PR #42795: URL: https://github.com/apache/spark/pull/42795#issuecomment-1704661116 cc @LuciferYang

[GitHub] [spark] panbingkun opened a new pull request, #42795: [SPARK-45042][BUILD][3.5] Upgrade jetty to 9.4.52.v20230823

2023-09-03 Thread via GitHub
panbingkun opened a new pull request, #42795: URL: https://github.com/apache/spark/pull/42795 ### What changes were proposed in this pull request? This PR aims to upgrade jetty from 9.4.51.v20230217 to 9.4.52.v20230823. (Backport to Spark 3.5.0) ### Why are the changes needed? -

[GitHub] [spark] Hisoka-X commented on a diff in pull request #41855: [SPARK-44262][SQL] Add `dropTable` and `getInsertStatement` to JdbcDialect

2023-09-03 Thread via GitHub
Hisoka-X commented on code in PR #41855: URL: https://github.com/apache/spark/pull/41855#discussion_r1314466479 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -529,6 +560,16 @@ abstract class JdbcDialect extends Serializable with Logging { }

[GitHub] [spark] cloud-fan commented on a diff in pull request #41855: [SPARK-44262][SQL] Add `dropTable` and `getInsertStatement` to JdbcDialect

2023-09-03 Thread via GitHub
cloud-fan commented on code in PR #41855: URL: https://github.com/apache/spark/pull/41855#discussion_r1314462378 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -180,6 +180,37 @@ abstract class JdbcDialect extends Serializable with Logging {

[GitHub] [spark] cloud-fan commented on a diff in pull request #41855: [SPARK-44262][SQL] Add `dropTable` and `getInsertStatement` to JdbcDialect

2023-09-03 Thread via GitHub
cloud-fan commented on code in PR #41855: URL: https://github.com/apache/spark/pull/41855#discussion_r1314462193 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -529,6 +560,16 @@ abstract class JdbcDialect extends Serializable with Logging {

[GitHub] [spark] cloud-fan commented on a diff in pull request #41855: [SPARK-44262][SQL] Add `dropTable` and `getInsertStatement` to JdbcDialect

2023-09-03 Thread via GitHub
cloud-fan commented on code in PR #41855: URL: https://github.com/apache/spark/pull/41855#discussion_r1314461546 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -180,6 +180,37 @@ abstract class JdbcDialect extends Serializable with Logging {

[GitHub] [spark] cloud-fan commented on a diff in pull request #42752: [SPARK-45033][SQL] Support maps by parameterized `sql()`

2023-09-03 Thread via GitHub
cloud-fan commented on code in PR #42752: URL: https://github.com/apache/spark/pull/42752#discussion_r1314459460 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala: ## @@ -96,7 +96,11 @@ case class PosParameterizedQuery(child: LogicalPlan, arg

[GitHub] [spark] panbingkun commented on pull request #42761: [SPARK-45042][BUILD] Upgrade jetty to 9.4.52.v20230823

2023-09-03 Thread via GitHub
panbingkun commented on PR #42761: URL: https://github.com/apache/spark/pull/42761#issuecomment-1704645884 > Merged into master. There are conflicts with 3.5, could you please open a separate PR? @panbingkun Sure, let me do it now.

[GitHub] [spark] MaxGekk commented on a diff in pull request #42752: [SPARK-45033][SQL] Support maps by parameterized `sql()`

2023-09-03 Thread via GitHub
MaxGekk commented on code in PR #42752: URL: https://github.com/apache/spark/pull/42752#discussion_r1314453024 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala: ## @@ -96,7 +96,11 @@ case class PosParameterizedQuery(child: LogicalPlan, args:

[GitHub] [spark] cloud-fan commented on a diff in pull request #42752: [SPARK-45033][SQL] Support maps by parameterized `sql()`

2023-09-03 Thread via GitHub
cloud-fan commented on code in PR #42752: URL: https://github.com/apache/spark/pull/42752#discussion_r1314446890 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala: ## @@ -96,7 +96,11 @@ case class PosParameterizedQuery(child: LogicalPlan, arg

[GitHub] [spark] cloud-fan commented on a diff in pull request #42752: [SPARK-45033][SQL] Support maps by parameterized `sql()`

2023-09-03 Thread via GitHub
cloud-fan commented on code in PR #42752: URL: https://github.com/apache/spark/pull/42752#discussion_r1314446317 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala: ## @@ -96,7 +96,11 @@ case class PosParameterizedQuery(child: LogicalPlan, arg

[GitHub] [spark] MaxGekk commented on a diff in pull request #42752: [SPARK-45033][SQL] Support maps by parameterized `sql()`

2023-09-03 Thread via GitHub
MaxGekk commented on code in PR #42752: URL: https://github.com/apache/spark/pull/42752#discussion_r1314446198 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala: ## @@ -96,7 +96,11 @@ case class PosParameterizedQuery(child: LogicalPlan, args:

[GitHub] [spark] MaxGekk commented on a diff in pull request #42752: [SPARK-45033][SQL] Support maps by parameterized `sql()`

2023-09-03 Thread via GitHub
MaxGekk commented on code in PR #42752: URL: https://github.com/apache/spark/pull/42752#discussion_r131739 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala: ## @@ -96,7 +96,11 @@ case class PosParameterizedQuery(child: LogicalPlan, args:

[GitHub] [spark] zhengruifeng opened a new pull request, #42794: [SPARK-45066][PYTHON][CONNECT] Make function `repeat` accept column-type `n`

2023-09-03 Thread via GitHub
zhengruifeng opened a new pull request, #42794: URL: https://github.com/apache/spark/pull/42794 ### What changes were proposed in this pull request? Make function `repeat` accept column-type `n` ### Why are the changes needed? 1. to follow this guide: https://github.com/

[GitHub] [spark] itholic commented on pull request #42793: [WIP][SPARK-45065][PYTHON][PS] Support Pandas 2.1.0

2023-09-03 Thread via GitHub
itholic commented on PR #42793: URL: https://github.com/apache/spark/pull/42793#issuecomment-1704604204 Since many features are [deprecated from Pandas 2.1.0](https://pandas.pydata.org/docs/whatsnew/v2.1.0.html#deprecations), let me investigate if there is any corresponding featur

[GitHub] [spark] itholic opened a new pull request, #42793: [SPARK-45065][PYTHON][PS] Support Pandas 2.1.0

2023-09-03 Thread via GitHub
itholic opened a new pull request, #42793: URL: https://github.com/apache/spark/pull/42793 ### What changes were proposed in this pull request? This PR proposes to support pandas 2.1.0 for PySpark. See [What's new in 2.1.0](https://pandas.pydata.org/docs/dev/whatsnew/v2.1.0.html)

[GitHub] [spark] cloud-fan commented on pull request #42777: [SPARK-45054][SQL] HiveExternalCatalog.listPartitions should restore partition statistics

2023-09-03 Thread via GitHub
cloud-fan commented on PR #42777: URL: https://github.com/apache/spark/pull/42777#issuecomment-1704598173 late LGTM

[GitHub] [spark] cloud-fan commented on a diff in pull request #42752: [SPARK-45033][SQL] Support maps by parameterized `sql()`

2023-09-03 Thread via GitHub
cloud-fan commented on code in PR #42752: URL: https://github.com/apache/spark/pull/42752#discussion_r1314426545 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala: ## @@ -96,7 +96,11 @@ case class PosParameterizedQuery(child: LogicalPlan, arg

[GitHub] [spark] cloud-fan commented on a diff in pull request #42778: [SPARK-45055] [SQL] Do not transpose windows if they conflict on ORDER BY / PROJECT clauses

2023-09-03 Thread via GitHub
cloud-fan commented on code in PR #42778: URL: https://github.com/apache/spark/pull/42778#discussion_r1314425064 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1305,11 +1305,20 @@ object TransposeWindow extends Rule[LogicalPlan] {

[GitHub] [spark] sadikovi commented on pull request #42792: [SPARK-44940][SQL][3.4] Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-09-03 Thread via GitHub
sadikovi commented on PR #42792: URL: https://github.com/apache/spark/pull/42792#issuecomment-1704587062 cc @cloud-fan @HyukjinKwon

[GitHub] [spark] sadikovi commented on pull request #42790: [SPARK-44940][SQL][3.5] Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-09-03 Thread via GitHub
sadikovi commented on PR #42790: URL: https://github.com/apache/spark/pull/42790#issuecomment-1704587030 cc @cloud-fan @HyukjinKwon

[GitHub] [spark] sadikovi commented on pull request #42667: [SPARK-44940][SQL] Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-09-03 Thread via GitHub
sadikovi commented on PR #42667: URL: https://github.com/apache/spark/pull/42667#issuecomment-1704586475 I have opened backport PRs (linked in this PR).

[GitHub] [spark] sadikovi opened a new pull request, #42792: [SPARK-44940][SQL][3.4] Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-09-03 Thread via GitHub
sadikovi opened a new pull request, #42792: URL: https://github.com/apache/spark/pull/42792 ### What changes were proposed in this pull request? Backport of https://github.com/apache/spark/pull/42667 to branch-3.4. The PR improves JSON parsing when `spark.sql.json.enablePartialR

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42791: [SPARK-45064][PYTHON][CONNECT] Add the missing `scale` parameter in `ceil/ceiling`

2023-09-03 Thread via GitHub
zhengruifeng commented on code in PR #42791: URL: https://github.com/apache/spark/pull/42791#discussion_r1314419285 ## python/pyspark/sql/connect/functions.py: ## @@ -552,15 +552,23 @@ def cbrt(col: "ColumnOrName") -> Column: cbrt.__doc__ = pysparkfuncs.cbrt.__doc__ -def ce

[GitHub] [spark] zhengruifeng opened a new pull request, #42791: [SPARK-45064][PYTHON][CONNECT] Add the missing `scale` parameter in `ceil/ceiling`

2023-09-03 Thread via GitHub
zhengruifeng opened a new pull request, #42791: URL: https://github.com/apache/spark/pull/42791 ### What changes were proposed in this pull request? Add the missing `scale` parameter in `ceil/ceiling` ### Why are the changes needed? for parity, this parameter existed in both

[GitHub] [spark] sadikovi opened a new pull request, #42790: [SPARK-44940][SQL] Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-09-03 Thread via GitHub
sadikovi opened a new pull request, #42790: URL: https://github.com/apache/spark/pull/42790 ### What changes were proposed in this pull request? Backport of https://github.com/apache/spark/pull/42667 to branch-3.5. The PR improves JSON parsing when `spark.sql.json.enablePartialR

[GitHub] [spark] LuciferYang commented on pull request #42761: [SPARK-45042][BUILD] Upgrade jetty to 9.4.52.v20230823

2023-09-03 Thread via GitHub
LuciferYang commented on PR #42761: URL: https://github.com/apache/spark/pull/42761#issuecomment-1704572375 Merged into master. There are conflicts with 3.5, could you please open a separate PR? @panbingkun

[GitHub] [spark] LuciferYang closed pull request #42761: [SPARK-45042][BUILD] Upgrade jetty to 9.4.52.v20230823

2023-09-03 Thread via GitHub
LuciferYang closed pull request #42761: [SPARK-45042][BUILD] Upgrade jetty to 9.4.52.v20230823 URL: https://github.com/apache/spark/pull/42761

[GitHub] [spark] LuciferYang commented on pull request #42753: [SPARK-45032][CONNECT] Fix compilation warnings related to `Top-level wildcard is not allowed and will error under -Xsource:3`

2023-09-03 Thread via GitHub
LuciferYang commented on PR #42753: URL: https://github.com/apache/spark/pull/42753#issuecomment-1704568162 Thanks @HyukjinKwon @dongjoon-hyun ~

[GitHub] [spark] LuciferYang commented on pull request #42766: [SPARK-45046][BUILD] Set `shadeTestJar` of `core` module to `false`

2023-09-03 Thread via GitHub
LuciferYang commented on PR #42766: URL: https://github.com/apache/spark/pull/42766#issuecomment-1704566844 @gengliangwang Do you still remember why `shadeTestJar` was changed to false?

[GitHub] [spark] LuciferYang commented on pull request #42598: [SPARK-44890][BUILD]Update miswritten remarks

2023-09-03 Thread via GitHub
LuciferYang commented on PR #42598: URL: https://github.com/apache/spark/pull/42598#issuecomment-1704566124 https://github.com/apache/spark/assets/1475305/913cfb25-6bab-4a33-ba73-60a2f4f4f43a Is there a problem with your GitHub Action configuration? Why does the GA page look like a

[GitHub] [spark] LuciferYang opened a new pull request, #42789: [SPARK-45063][PYTHON][DOCS] Refine docstring of `max_by/min_by`

2023-09-03 Thread via GitHub
LuciferYang opened a new pull request, #42789: URL: https://github.com/apache/spark/pull/42789 ### What changes were proposed in this pull request? This PR refines the docstring of `max_by/min_by` and adds some new examples. ### Why are the changes needed? To improve PySpark documentat

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42770: [SPARK-45049][CONNECT][DOCS][TESTS] Refine docstrings of `coalesce/repartition/repartitionByRange`

2023-09-03 Thread via GitHub
HyukjinKwon commented on code in PR #42770: URL: https://github.com/apache/spark/pull/42770#discussion_r1314400083 ## python/pyspark/sql/dataframe.py: ## @@ -1809,18 +1810,27 @@ def repartition( # type: ignore[misc] Repartition the data into 10 partitions. -

[GitHub] [spark] HyukjinKwon commented on pull request #42667: [SPARK-44940][SQL] Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-09-03 Thread via GitHub
HyukjinKwon commented on PR #42667: URL: https://github.com/apache/spark/pull/42667#issuecomment-1704548723 Merged to master. It has some conflicts in branch-3.5 and 3.4.

[GitHub] [spark] HyukjinKwon closed pull request #42667: [SPARK-44940][SQL] Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-09-03 Thread via GitHub
HyukjinKwon closed pull request #42667: [SPARK-44940][SQL] Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled URL: https://github.com/apache/spark/pull/42667

[GitHub] [spark] itholic opened a new pull request, #42788: [SPARK-43291][PS] Generate proper warning on different behavior with `numeric_only`

2023-09-03 Thread via GitHub
itholic opened a new pull request, #42788: URL: https://github.com/apache/spark/pull/42788 ### What changes were proposed in this pull request? This PR adds warning messages throughout the Pandas API on Spark wherever the `numeric_only` parameter is used with a different default value.

[GitHub] [spark] sadikovi commented on pull request #42667: [SPARK-44940][SQL] Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-09-03 Thread via GitHub
sadikovi commented on PR #42667: URL: https://github.com/apache/spark/pull/42667#issuecomment-1704544738 Shall we merge this or do you have any concerns or questions? I will be more than happy to answer them or follow up on the suggestions. We may also need to backport to Spark 3.5/3.

[GitHub] [spark] HeartSaVioR commented on pull request #42774: [SPARK-45045][SS][3.5] Revert back the behavior of idle progress for StreamingQuery API from SPARK-43183

2023-09-03 Thread via GitHub
HeartSaVioR commented on PR #42774: URL: https://github.com/apache/spark/pull/42774#issuecomment-1704534176 Merged via [44ab0fc](https://github.com/apache/spark/commit/44ab0fc0068f815c7eddcd34ae4343bbfd97b64d)

[GitHub] [spark] HeartSaVioR closed pull request #42774: [SPARK-45045][SS][3.5] Revert back the behavior of idle progress for StreamingQuery API from SPARK-43183

2023-09-03 Thread via GitHub
HeartSaVioR closed pull request #42774: [SPARK-45045][SS][3.5] Revert back the behavior of idle progress for StreamingQuery API from SPARK-43183 URL: https://github.com/apache/spark/pull/42774

[GitHub] [spark] HeartSaVioR closed pull request #42773: [SPARK-45045][SS] Revert back the behavior of idle progress for StreamingQuery API from SPARK-43183

2023-09-03 Thread via GitHub
HeartSaVioR closed pull request #42773: [SPARK-45045][SS] Revert back the behavior of idle progress for StreamingQuery API from SPARK-43183 URL: https://github.com/apache/spark/pull/42773

[GitHub] [spark] HeartSaVioR commented on pull request #42774: [SPARK-45045][SS][3.5] Revert back the behavior of idle progress for StreamingQuery API from SPARK-43183

2023-09-03 Thread via GitHub
HeartSaVioR commented on PR #42774: URL: https://github.com/apache/spark/pull/42774#issuecomment-1704533200 Thanks for reviewing! Merging to 3.5.

[GitHub] [spark] HeartSaVioR commented on pull request #42773: [SPARK-45045][SS] Revert back the behavior of idle progress for StreamingQuery API from SPARK-43183

2023-09-03 Thread via GitHub
HeartSaVioR commented on PR #42773: URL: https://github.com/apache/spark/pull/42773#issuecomment-1704533152 Thanks for reviewing! Merging to master.

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42770: [SPARK-45049][CONNECT][DOCS][TESTS] Refine docstrings of `coalesce/repartition/repartitionByRange`

2023-09-03 Thread via GitHub
zhengruifeng commented on code in PR #42770: URL: https://github.com/apache/spark/pull/42770#discussion_r1314388163 ## python/pyspark/sql/dataframe.py: ## @@ -1809,18 +1810,27 @@ def repartition( # type: ignore[misc] Repartition the data into 10 partitions. -

[GitHub] [spark] itholic opened a new pull request, #42787: [SPARK-43241][PS] `MultiIndex.append` not checking names for equality

2023-09-03 Thread via GitHub
itholic opened a new pull request, #42787: URL: https://github.com/apache/spark/pull/42787 ### What changes were proposed in this pull request? This PR proposes to fix the behavior of `MultiIndex.append` so that it does not check names. ### Why are the changes needed? To match

[GitHub] [spark] Hisoka-X commented on a diff in pull request #42783: [SPARK-45059][CONNECT][PYTHON] Add `try_reflect` functions to Scala and Python

2023-09-03 Thread via GitHub
Hisoka-X commented on code in PR #42783: URL: https://github.com/apache/spark/pull/42783#discussion_r1314377696 ## python/pyspark/sql/functions.py: ## @@ -15748,6 +15749,33 @@ def java_method(*cols: "ColumnOrName") -> Column: return _invoke_function_over_seq_of_columns("jav

[GitHub] [spark] zhengruifeng opened a new pull request, #42786: [SPARK-45052][SQL][PYTHON][CONNECT][3.5] Make function aliases output column name consistent with SQL

2023-09-03 Thread via GitHub
zhengruifeng opened a new pull request, #42786: URL: https://github.com/apache/spark/pull/42786 ### What changes were proposed in this pull request? backport https://github.com/apache/spark/pull/42775 to 3.5 ### Why are the changes needed? to make `func(col)` consistent with

[GitHub] [spark] ueshin opened a new pull request, #42785: [SPARK-44876][PYTHON][FOLLOWUP][3.5] Fix Arrow-optimized Python UDF to delay wrapping the function with fail_on_stopiteration

2023-09-03 Thread via GitHub
ueshin opened a new pull request, #42785: URL: https://github.com/apache/spark/pull/42785 ### What changes were proposed in this pull request? This is a backport of https://github.com/apache/spark/pull/42784. Fixes Arrow-optimized Python UDF to delay wrapping the function with

[GitHub] [spark] zhengruifeng commented on pull request #42775: [SPARK-45052][SQL][PYTHON][CONNECT] Make function aliases output column name consistent with SQL

2023-09-03 Thread via GitHub
zhengruifeng commented on PR #42775: URL: https://github.com/apache/spark/pull/42775#issuecomment-1704506729 merged to master, will send a separate PR for 3.5

[GitHub] [spark] zhengruifeng closed pull request #42775: [SPARK-45052][SQL][PYTHON][CONNECT] Make function aliases output column name consistent with SQL

2023-09-03 Thread via GitHub
zhengruifeng closed pull request #42775: [SPARK-45052][SQL][PYTHON][CONNECT] Make function aliases output column name consistent with SQL URL: https://github.com/apache/spark/pull/42775

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42775: [SPARK-45052][SQL][PYTHON][CONNECT] Make function aliases output column name consistent with SQL

2023-09-03 Thread via GitHub
zhengruifeng commented on code in PR #42775: URL: https://github.com/apache/spark/pull/42775#discussion_r1314370284 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -1052,15 +1049,15 @@ object functions { * @group agg_funcs * @since 3.5.0 */ -

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42783: [SPARK-45059][CONNECT][PYTHON] Add `try_reflect` functions to Scala and Python

2023-09-03 Thread via GitHub
zhengruifeng commented on code in PR #42783: URL: https://github.com/apache/spark/pull/42783#discussion_r1314365390 ## python/pyspark/sql/functions.py: ## @@ -15748,6 +15749,33 @@ def java_method(*cols: "ColumnOrName") -> Column: return _invoke_function_over_seq_of_columns(

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42775: [SPARK-45052][SQL][PYTHON][CONNECT] Make function aliases output column name consistent with SQL

2023-09-03 Thread via GitHub
HyukjinKwon commented on code in PR #42775: URL: https://github.com/apache/spark/pull/42775#discussion_r1314366796 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -1052,15 +1049,15 @@ object functions { * @group agg_funcs * @since 3.5.0 */ - d

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42775: [SPARK-45052][SQL][PYTHON][CONNECT] Make function aliases output column name consistent with SQL

2023-09-03 Thread via GitHub
zhengruifeng commented on code in PR #42775: URL: https://github.com/apache/spark/pull/42775#discussion_r1314358804 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -1052,15 +1049,15 @@ object functions { * @group agg_funcs * @since 3.5.0 */ -

[GitHub] [spark] ueshin opened a new pull request, #42784: [SPARK-44876][PYTHON][FOLLOWUP] Fix Arrow-optimized Python UDF to delay wrapping the function with fail_on_stopiteration

2023-09-03 Thread via GitHub
ueshin opened a new pull request, #42784: URL: https://github.com/apache/spark/pull/42784 ### What changes were proposed in this pull request? Fixes Arrow-optimized Python UDF to delay wrapping the function with `fail_on_stopiteration`. Also removed unnecessary verification `ve

[GitHub] [spark] zekai-li commented on pull request #42529: [SPARK-44845][YARN][DEPLOY] Fix file system uri comparison function

2023-09-03 Thread via GitHub
zekai-li commented on PR #42529: URL: https://github.com/apache/spark/pull/42529#issuecomment-1704480296 @tgravescs take a check -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

[GitHub] [spark] HyukjinKwon closed pull request #42782: [SPARK-45058][PYTHON][DOCS] Refine docstring of DataFrame.distinct

2023-09-03 Thread via GitHub
HyukjinKwon closed pull request #42782: [SPARK-45058][PYTHON][DOCS] Refine docstring of DataFrame.distinct URL: https://github.com/apache/spark/pull/42782

[GitHub] [spark] HyukjinKwon commented on pull request #42782: [SPARK-45058][PYTHON][DOCS] Refine docstring of DataFrame.distinct

2023-09-03 Thread via GitHub
HyukjinKwon commented on PR #42782: URL: https://github.com/apache/spark/pull/42782#issuecomment-1704462452 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #42776: [SPARK-45053][PYTHON][MINOR] Log improvement in python version mismatch

2023-09-03 Thread via GitHub
HyukjinKwon closed pull request #42776: [SPARK-45053][PYTHON][MINOR] Log improvement in python version mismatch URL: https://github.com/apache/spark/pull/42776

[GitHub] [spark] HyukjinKwon commented on pull request #42776: [SPARK-45053][PYTHON][MINOR] Log improvement in python version mismatch

2023-09-03 Thread via GitHub
HyukjinKwon commented on PR #42776: URL: https://github.com/apache/spark/pull/42776#issuecomment-1704461253 Merged to master.

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42775: [SPARK-45052][SQL][PYTHON][CONNECT] Make function aliases output column name consistent with SQL

2023-09-03 Thread via GitHub
zhengruifeng commented on code in PR #42775: URL: https://github.com/apache/spark/pull/42775#discussion_r1314347624 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -1052,15 +1049,15 @@ object functions { * @group agg_funcs * @since 3.5.0 */ -

[GitHub] [spark] Hisoka-X commented on pull request #41855: [SPARK-44262][SQL] Add `dropTable` and `getInsertStatement` to JdbcDialect

2023-09-03 Thread via GitHub
Hisoka-X commented on PR #41855: URL: https://github.com/apache/spark/pull/41855#issuecomment-1704461031 > Nah, let's don't add a new API to 3.5.0 at this moment. Got it, let me change since to 4.0.0

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42771: [SPARK-45050][SQL][CONNECT] Improve error message for UNKNOWN io.grpc.StatusRuntimeException

2023-09-03 Thread via GitHub
HyukjinKwon commented on code in PR #42771: URL: https://github.com/apache/spark/pull/42771#discussion_r1314347699 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala: ## @@ -107,7 +107,7 @@ private[client] object GrpcExcep

[GitHub] [spark] HyukjinKwon closed pull request #42768: [SPARK-44667][INFRA][FOLLOWUP] Uninstall `deepspeed` libraries for non-ML jobs

2023-09-03 Thread via GitHub
HyukjinKwon closed pull request #42768: [SPARK-44667][INFRA][FOLLOWUP] Uninstall `deepspeed` libraries for non-ML jobs URL: https://github.com/apache/spark/pull/42768

[GitHub] [spark] HyukjinKwon commented on pull request #42768: [SPARK-44667][INFRA][FOLLOWUP] Uninstall `deepspeed` libraries for non-ML jobs

2023-09-03 Thread via GitHub
HyukjinKwon commented on PR #42768: URL: https://github.com/apache/spark/pull/42768#issuecomment-1704458951 Merged to master.

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42775: [SPARK-45052][SQL][PYTHON][CONNECT] Make function aliases output column name consistent with SQL

2023-09-03 Thread via GitHub
zhengruifeng commented on code in PR #42775: URL: https://github.com/apache/spark/pull/42775#discussion_r1314346571 ## python/pyspark/sql/functions.py: ## @@ -2385,25 +2416,54 @@ def signum(col: "ColumnOrName") -> Column: Examples ->>> df = spark.range(1

[GitHub] [spark] HyukjinKwon commented on pull request #42767: [SPARK-45047][PYTHON][CONNECT] `DataFrame.groupBy` support ordinals

2023-09-03 Thread via GitHub
HyukjinKwon commented on PR #42767: URL: https://github.com/apache/spark/pull/42767#issuecomment-1704458536 Merged to master.

[GitHub] [spark] HyukjinKwon commented on pull request #42766: [SPARK-45046][BUILD] Set `shadeTestJar` of `core` module to `false`

2023-09-03 Thread via GitHub
HyukjinKwon commented on PR #42766: URL: https://github.com/apache/spark/pull/42766#issuecomment-1704458471 cc @gengliangwang FYI

[GitHub] [spark] HyukjinKwon closed pull request #42758: [SPARK-45038][PYTHON][DOCS] Refine docstring of `max`

2023-09-03 Thread via GitHub
HyukjinKwon closed pull request #42758: [SPARK-45038][PYTHON][DOCS] Refine docstring of `max` URL: https://github.com/apache/spark/pull/42758

[GitHub] [spark] HyukjinKwon commented on pull request #42758: [SPARK-45038][PYTHON][DOCS] Refine docstring of `max`

2023-09-03 Thread via GitHub
HyukjinKwon commented on PR #42758: URL: https://github.com/apache/spark/pull/42758#issuecomment-1704457917 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #42753: [SPARK-45032][CONNECT] Fix compilation warnings related to `Top-level wildcard is not allowed and will error under -Xsource:3`

2023-09-03 Thread via GitHub
HyukjinKwon closed pull request #42753: [SPARK-45032][CONNECT] Fix compilation warnings related to `Top-level wildcard is not allowed and will error under -Xsource:3` URL: https://github.com/apache/spark/pull/42753

[GitHub] [spark] HyukjinKwon commented on pull request #42753: [SPARK-45032][CONNECT] Fix compilation warnings related to `Top-level wildcard is not allowed and will error under -Xsource:3`

2023-09-03 Thread via GitHub
HyukjinKwon commented on PR #42753: URL: https://github.com/apache/spark/pull/42753#issuecomment-1704457593 Merged to master.

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42770: [SPARK-45049][CONNECT][DOCS][TESTS] Refine docstrings of `coalesce/repartition/repartitionByRange`

2023-09-03 Thread via GitHub
zhengruifeng commented on code in PR #42770: URL: https://github.com/apache/spark/pull/42770#discussion_r1314345886 ## python/pyspark/sql/dataframe.py: ## @@ -1809,18 +1810,27 @@ def repartition( # type: ignore[misc] Repartition the data into 10 partitions. -

[GitHub] [spark] HyukjinKwon commented on pull request #41855: [SPARK-44262][SQL] Add `dropTable` and `getInsertStatement` to JdbcDialect

2023-09-03 Thread via GitHub
HyukjinKwon commented on PR #41855: URL: https://github.com/apache/spark/pull/41855#issuecomment-1704455930 Nah, let's don't add a new API to 3.5.0 at this moment.

[GitHub] [spark] HyukjinKwon closed pull request #42687: [SPARK-45061][SS][CONNECT] Clean up Running python StreamingQueryLIstener processes when session expires

2023-09-03 Thread via GitHub
HyukjinKwon closed pull request #42687: [SPARK-45061][SS][CONNECT] Clean up Running python StreamingQueryLIstener processes when session expires URL: https://github.com/apache/spark/pull/42687

[GitHub] [spark] HyukjinKwon commented on pull request #42687: [SPARK-45061][SS][CONNECT] Clean up Running python StreamingQueryLIstener processes when session expires

2023-09-03 Thread via GitHub
HyukjinKwon commented on PR #42687: URL: https://github.com/apache/spark/pull/42687#issuecomment-1704455336 Merged to master and branch-3.5.

[GitHub] [spark] WweiL commented on pull request #42687: [SPARK-45061][SS][CONNECT] Clean up Running python StreamingQueryLIstener processes when session expires

2023-09-03 Thread via GitHub
WweiL commented on PR #42687: URL: https://github.com/apache/spark/pull/42687#issuecomment-1704454592 @HyukjinKwon Done!

[GitHub] [spark] HyukjinKwon commented on pull request #42687: [SPARK-44433][FOLLOWUP] Clean up Running python StreamingQueryLIstener processes when session expires

2023-09-03 Thread via GitHub
HyukjinKwon commented on PR #42687: URL: https://github.com/apache/spark/pull/42687#issuecomment-1704453122 @WweiL mind creating a separate JIRA? SPARK-44433 has been landed to 3.5.0 already, and this won't be available in the same version as a followup.

[GitHub] [spark] HyukjinKwon commented on pull request #42687: [SPARK-44433][FOLLOWUP] Clean up Running python StreamingQueryLIstener processes when session expires

2023-09-03 Thread via GitHub
HyukjinKwon commented on PR #42687: URL: https://github.com/apache/spark/pull/42687#issuecomment-1704452754 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42770: [SPARK-45049][CONNECT][DOCS][TESTS] Refine docstrings of `coalesce/repartition/repartitionByRange`

2023-09-03 Thread via GitHub
HyukjinKwon commented on code in PR #42770: URL: https://github.com/apache/spark/pull/42770#discussion_r1314343527 ## python/pyspark/sql/dataframe.py: ## @@ -1809,18 +1810,27 @@ def repartition( # type: ignore[misc] Repartition the data into 10 partitions. -

[GitHub] [spark] github-actions[bot] closed pull request #40312: [SPARK-42695][SQL] Skew join handling in stream side of broadcast hash join

2023-09-03 Thread via GitHub
github-actions[bot] closed pull request #40312: [SPARK-42695][SQL] Skew join handling in stream side of broadcast hash join URL: https://github.com/apache/spark/pull/40312