[GitHub] [spark] beliefer commented on pull request #41436: [SPARK-43956][SQL] Fix the bug doesn't display column's sql for Percentile[Cont|Disc]

2023-06-03 Thread via GitHub
beliefer commented on PR #41436: URL: https://github.com/apache/spark/pull/41436#issuecomment-1574693752 @wangyum Do we need to merge this to 3.3 and 3.4? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] venkateshbalaji99 commented on pull request #41199: [SPARK-43536][CORE] Fixing statsd sink reporter

2023-06-03 Thread via GitHub
venkateshbalaji99 commented on PR #41199: URL: https://github.com/apache/spark/pull/41199#issuecomment-1574926819 > Hi, @venkateshbalaji99 and @abmodi . This is a very old behavior since Apache Spark 2.3.0 (6 years). Could you elaborate which `count` metric did you have an issue

[GitHub] [spark] wangyum commented on pull request #41436: [SPARK-43956][SQL] Fix the bug doesn't display column's sql for Percentile[Cont|Disc]

2023-06-03 Thread via GitHub
wangyum commented on PR #41436: URL: https://github.com/apache/spark/pull/41436#issuecomment-1574698893 I'm +1 for backporting this to 3.4 and 3.3 if you want.

[GitHub] [spark] beliefer commented on pull request #41421: [SPARK-43881][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listDatabases

2023-06-03 Thread via GitHub
beliefer commented on PR #41421: URL: https://github.com/apache/spark/pull/41421#issuecomment-1574767408 ping @cloud-fan @MaxGekk @HyukjinKwon @zhengruifeng cc @amaliujia

[GitHub] [spark] itholic commented on pull request #41437: [SPARK-43917][PS][INFRA] Upgrade `pandas` to 2.0.2

2023-06-03 Thread via GitHub
itholic commented on PR #41437: URL: https://github.com/apache/spark/pull/41437#issuecomment-1574785801 It makes sense to me, since it doesn't introduce any extra test skipping or behavior changes.

[GitHub] [spark] beliefer opened a new pull request, #41444: [SPARK-43916][SQL][PYTHON] Add percentile like functions to Scala and Python API

2023-06-03 Thread via GitHub
beliefer opened a new pull request, #41444: URL: https://github.com/apache/spark/pull/41444 ### What changes were proposed in this pull request? Based on @HyukjinKwon's suggestion, this PR adds percentile-like functions to the Scala and Python APIs. These functions are shown below.
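For readers unfamiliar with the two percentile flavours referenced in this thread (PERCENTILE_CONT and PERCENTILE_DISC), here is a rough, Spark-independent sketch in plain Python of the SQL-standard semantics. It is not the Spark API, and the function names are hypothetical. PERCENTILE_CONT interpolates linearly between the two nearest ranks. PERCENTILE_DISC returns the smallest input value whose cumulative distribution is at least `p`. Spark's own implementation may differ in edge cases.

```python
import math


def percentile_cont(values, p):
    """SQL-standard PERCENTILE_CONT: linear interpolation between neighbouring ranks."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    xs = sorted(values)
    if not xs:
        return None  # aggregate over an empty group yields NULL
    pos = p * (len(xs) - 1)  # fractional 0-based position in the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def percentile_disc(values, p):
    """SQL-standard PERCENTILE_DISC: first value whose cumulative distribution >= p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    xs = sorted(values)
    if not xs:
        return None
    n = len(xs)
    for i, x in enumerate(xs, start=1):
        if i / n >= p:  # i / n is the cumulative fraction up to this row
            return x
    return xs[-1]
```

On `[1, 2, 3, 4]` with `p = 0.5`, the continuous variant interpolates to `2.5`. The discrete variant always returns an actual input value, `2`.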

[GitHub] [spark] beliefer opened a new pull request, #41446: [SPARK-43956][SQL][3.3] Fix the bug doesn't display column's sql for Percentile[Cont|Disc]

2023-06-03 Thread via GitHub
beliefer opened a new pull request, #41446: URL: https://github.com/apache/spark/pull/41446 ### What changes were proposed in this pull request? This PR backports https://github.com/apache/spark/pull/41436 to 3.3. ### Why are the changes needed? Fix the bug doesn't

[GitHub] [spark] panbingkun opened a new pull request, #41447: [SPARK-43957][SQL][TESTS] Use `checkError()` to check `Exception` in `*Insert*Suite`

2023-06-03 Thread via GitHub
panbingkun opened a new pull request, #41447: URL: https://github.com/apache/spark/pull/41447 ### What changes were proposed in this pull request? This PR aims to use `checkError()` to check `Exception` in `*Insert*Suite`, including: -
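The idea behind switching tests to `checkError()` is to assert on an exception's structured fields (error class, message parameters, optionally SQLSTATE) rather than on its rendered message text. That way, rewording a message does not break tests. The Python sketch below illustrates only that pattern. `StructuredError`, `check_error`, `insert_row` and the error class name are all hypothetical stand-ins, not Spark's Scala `checkError` or `SparkThrowable`.

```python
class StructuredError(Exception):
    """Hypothetical exception that carries an error class plus message parameters."""

    def __init__(self, error_class, message_parameters=None, sql_state=None):
        self.error_class = error_class
        self.message_parameters = dict(message_parameters or {})
        self.sql_state = sql_state
        # The rendered text is derived from the structured fields and may be reworded freely.
        super().__init__(f"[{error_class}] {self.message_parameters}")


def check_error(exc, error_class, parameters=None, sql_state=None):
    """Assert on the structured fields of `exc` rather than on its message text."""
    assert exc.error_class == error_class, f"{exc.error_class!r} != {error_class!r}"
    if sql_state is not None:
        assert exc.sql_state == sql_state, f"{exc.sql_state!r} != {sql_state!r}"
    expected = dict(parameters or {})
    assert exc.message_parameters == expected, f"{exc.message_parameters} != {expected}"


def insert_row(table, columns, values):
    """Hypothetical INSERT that fails with a structured error on a column-count mismatch."""
    if len(columns) != len(values):
        raise StructuredError(
            "HYPOTHETICAL_ARITY_MISMATCH",
            {"tableName": table, "dataColumns": str(len(values))},
        )
    return dict(zip(columns, values))


# Usage: the test pins the error class and parameters, not the wording of the message.
try:
    insert_row("t", ["a", "b"], [1])
except StructuredError as e:
    check_error(e, "HYPOTHETICAL_ARITY_MISMATCH", {"tableName": "t", "dataColumns": "1"})
else:
    raise AssertionError("expected StructuredError")
```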

[GitHub] [spark] panbingkun commented on pull request #41447: [SPARK-43957][SQL][TESTS] Use `checkError()` to check `Exception` in `*Insert*Suite`

2023-06-03 Thread via GitHub
panbingkun commented on PR #41447: URL: https://github.com/apache/spark/pull/41447#issuecomment-1574908226 cc @MaxGekk

[GitHub] [spark] itholic commented on pull request #41437: [SPARK-43917][PS][INFRA] Upgrade `pandas` to 2.0.2

2023-06-03 Thread via GitHub
itholic commented on PR #41437: URL: https://github.com/apache/spark/pull/41437#issuecomment-1574781900 It makes sense to me, although we still need to discuss the potential behavior changes for future updates.

[GitHub] [spark] beliefer opened a new pull request, #41445: [SPARK-43956][SQL][3.4] Fix the bug doesn't display column's sql for Percentile[Cont|Disc]

2023-06-03 Thread via GitHub
beliefer opened a new pull request, #41445: URL: https://github.com/apache/spark/pull/41445 ### What changes were proposed in this pull request? This PR backports https://github.com/apache/spark/pull/41436 to 3.4. ### Why are the changes needed? Fix the bug doesn't

[GitHub] [spark] szehon-ho commented on pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-03 Thread via GitHub
szehon-ho commented on PR #41398: URL: https://github.com/apache/spark/pull/41398#issuecomment-1574901969 Thanks everyone for the warm welcome to Spark, and really fast reviews! As I'm out of town, I will look at any follow up improvements when I'm back.
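The title of PR #41398 is about letting a shuffled hash join build its hash table on the *preserved* side of an outer join (left side for a left outer join). What makes this case tricky: unmatched build rows can only be identified after the streamed side has been fully consumed. Below is a minimal single-partition sketch in Python under that assumption. The function name is hypothetical, and this is not Spark's `ShuffledHashJoinExec` code.

```python
from collections import defaultdict


def left_outer_join_build_left(left, right, left_key, right_key):
    """Left outer join that builds the hash table on the LEFT (preserved) side.

    Rows are plain tuples; a key of None models SQL NULL and never matches.
    """
    left_rows = list(left)
    # Build phase: key -> positions of left rows with that key.
    table = defaultdict(list)
    for idx, row in enumerate(left_rows):
        k = left_key(row)
        if k is not None:
            table[k].append(idx)
    matched = [False] * len(left_rows)

    out = []
    # Probe phase: stream the right side once, emitting joined pairs.
    for r in right:
        k = right_key(r)
        if k is None:
            continue  # NULL keys never match; unmatched right rows are dropped anyway
        for idx in table.get(k, ()):
            matched[idx] = True
            out.append((left_rows[idx], r))

    # Only after the stream is exhausted do we know which build rows never matched;
    # emit them padded with None (SQL NULL) for the missing right side.
    for idx, row in enumerate(left_rows):
        if not matched[idx]:
            out.append((row, None))
    return out
```

The extra `matched` bookkeeping is the cost of building on the outer side. In exchange, the smaller relation can be chosen as the build side even when it is the one being preserved.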

[GitHub] [spark] dtenedor commented on pull request #41191: [SPARK-43529][SQL] Support general constant expressions as CREATE/REPLACE TABLE OPTIONS values

2023-06-03 Thread via GitHub
dtenedor commented on PR #41191: URL: https://github.com/apache/spark/pull/41191#issuecomment-1575087553 Note: the CI is actually passing; the PySpark failure is spurious and unrelated. (screenshot: https://github.com/apache/spark/assets/99207096/02176772-3e7c-4cbd-8308-297c6fd85066)

[GitHub] [spark] beliefer commented on pull request #41445: [SPARK-43956][SQL][3.4] Fix the bug doesn't display column's sql for Percentile[Cont|Disc]

2023-06-03 Thread via GitHub
beliefer commented on PR #41445: URL: https://github.com/apache/spark/pull/41445#issuecomment-1574967814 cc @wangyum @MaxGekk

[GitHub] [spark] vinodkc commented on a diff in pull request #41144: [SPARK-43470][CORE] Add operating system ,Java, Python version information to application log

2023-06-03 Thread via GitHub
vinodkc commented on code in PR #41144: URL: https://github.com/apache/spark/pull/41144#discussion_r1215599942 ## core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala: ## @@ -106,7 +106,7 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( protected val

[GitHub] [spark] aokolnychyi opened a new pull request, #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-03 Thread via GitHub
aokolnychyi opened a new pull request, #41448: URL: https://github.com/apache/spark/pull/41448 ### What changes were proposed in this pull request? This PR adds `RewriteMergeIntoTable`, similar to `RewriteUpdateTable` and `RewriteDeleteFromTable`, to handle MERGE commands
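For a delta-based source, handling MERGE means producing only row-level changes (deletes, updates, inserts) for the rows that are affected, instead of rewriting whole groups of data. The Python sketch below illustrates those semantics for a hypothetical three-clause MERGE: WHEN MATCHED AND cond THEN DELETE, WHEN MATCHED THEN UPDATE, WHEN NOT MATCHED THEN INSERT. The function and its parameters are assumptions for illustration. This is not how `RewriteMergeIntoTable` builds its logical plan.

```python
def plan_merge_deltas(target, source, key, delete_if, update, insert):
    """Translate a simple MERGE INTO target USING source into row-level delta operations.

    target:    dict mapping key -> existing row
    source:    iterable of source rows
    key:       extracts the join key from a source row
    delete_if: (target_row, source_row) -> bool, the WHEN MATCHED AND <cond> THEN DELETE clause
    update:    (target_row, source_row) -> new row, the WHEN MATCHED THEN UPDATE clause
    insert:    source_row -> new row, the WHEN NOT MATCHED THEN INSERT clause
    """
    deltas = []
    seen = set()
    for s in source:
        k = key(s)
        if k in target:
            if k in seen:
                # Standard MERGE semantics: a target row may be matched by at most one source row.
                raise ValueError(f"target row {k!r} matched by multiple source rows")
            seen.add(k)
            t = target[k]
            if delete_if(t, s):
                deltas.append(("delete", k, None))
            else:
                deltas.append(("update", k, update(t, s)))
        else:
            deltas.append(("insert", k, insert(s)))
    # Target rows never matched by the source produce no delta at all.
    return deltas
```

Untouched target rows produce no output. This is the property that makes a delta-based write cheaper than rewriting every data file that contains a matched row.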

[GitHub] [spark] aokolnychyi commented on a diff in pull request #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-03 Thread via GitHub
aokolnychyi commented on code in PR #41448: URL: https://github.com/apache/spark/pull/41448#discussion_r1216251079 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteMergeIntoTable.scala: ## @@ -0,0 +1,347 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] aokolnychyi commented on a diff in pull request #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-03 Thread via GitHub
aokolnychyi commented on code in PR #41448: URL: https://github.com/apache/spark/pull/41448#discussion_r1216250498 ## core/src/main/resources/error/error-classes.json: ## @@ -1513,6 +1513,13 @@ "Parse Mode: . To process malformed records as null result, try setting the

[GitHub] [spark] aokolnychyi opened a new pull request, #41449: [SPARK-43959][SQL] Make RowLevelOperationSuiteBase and AlignAssignmentsSuite abstract

2023-06-03 Thread via GitHub
aokolnychyi opened a new pull request, #41449: URL: https://github.com/apache/spark/pull/41449 ### What changes were proposed in this pull request? This PR makes `RowLevelOperationSuiteBase` and `AlignAssignmentsSuite` abstract. ### Why are the changes needed?

[GitHub] [spark] aokolnychyi commented on pull request #41449: [SPARK-43959][SQL] Make RowLevelOperationSuiteBase and AlignAssignmentsSuite abstract

2023-06-03 Thread via GitHub
aokolnychyi commented on PR #41449: URL: https://github.com/apache/spark/pull/41449#issuecomment-1575408872 cc @cloud-fan @dongjoon-hyun

[GitHub] [spark] aokolnychyi commented on a diff in pull request #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-03 Thread via GitHub
aokolnychyi commented on code in PR #41448: URL: https://github.com/apache/spark/pull/41448#discussion_r1216251279 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteMergeIntoTable.scala: ## @@ -0,0 +1,347 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] aokolnychyi commented on a diff in pull request #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-03 Thread via GitHub
aokolnychyi commented on code in PR #41448: URL: https://github.com/apache/spark/pull/41448#discussion_r1216252175 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteMergeIntoTable.scala: ## @@ -0,0 +1,347 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] aokolnychyi commented on a diff in pull request #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-03 Thread via GitHub
aokolnychyi commented on code in PR #41448: URL: https://github.com/apache/spark/pull/41448#discussion_r1216251890 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteMergeIntoTable.scala: ## @@ -0,0 +1,347 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] aokolnychyi commented on a diff in pull request #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-03 Thread via GitHub
aokolnychyi commented on code in PR #41448: URL: https://github.com/apache/spark/pull/41448#discussion_r1216252708 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteRowLevelCommand.scala: ## @@ -167,4 +183,36 @@ trait RewriteRowLevelCommand extends

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41144: [SPARK-43470][CORE] Add OS, Java, Python version information to application log

2023-06-03 Thread via GitHub
dongjoon-hyun commented on code in PR #41144: URL: https://github.com/apache/spark/pull/41144#discussion_r1216028837 ## core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala: ## @@ -106,6 +106,7 @@ private[spark] abstract class BasePythonRunner[IN, OUT](

[GitHub] [spark] mcdull-zhang commented on a diff in pull request #41419: [SPARK-43911] [SQL] Use toSet to deduplicate the iterator data to prevent the creation of large Array

2023-06-03 Thread via GitHub
mcdull-zhang commented on code in PR #41419: URL: https://github.com/apache/spark/pull/41419#discussion_r1216118752 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SubqueryBroadcastExec.scala: ## @@ -93,7 +94,9 @@ case class SubqueryBroadcastExec( val rows =
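The PR title describes deduplicating iterator data with `toSet` so that a large array of all rows is never built. Since the `SubqueryBroadcastExec` change itself is truncated here, this is only a language-agnostic illustration in Python of the memory trade-off. Both function names are hypothetical. The first version holds every projected key at once. The second only ever holds the distinct keys, which is the effect the title describes.

```python
def distinct_keys_materialized(rows, key):
    """Collects every projected key into a list first, then deduplicates.

    Peak memory grows with the total number of rows.
    """
    all_keys = [key(r) for r in rows]
    return set(all_keys)


def distinct_keys_streaming(rows, key):
    """Deduplicates while consuming the iterator.

    Peak memory grows with the number of distinct keys only.
    """
    seen = set()
    for r in rows:
        seen.add(key(r))
    return seen
```

Both return the same set. Note that a set does not preserve input order, so this approach only fits when the consumer does not depend on ordering.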

[GitHub] [spark] aokolnychyi commented on a diff in pull request #41028: [SPARK-43324][SQL] Handle UPDATE commands for delta-based sources

2023-06-03 Thread via GitHub
aokolnychyi commented on code in PR #41028: URL: https://github.com/apache/spark/pull/41028#discussion_r1216227877 ## sql/core/src/test/scala/org/apache/spark/sql/connector/RowLevelOperationSuiteBase.scala: ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] MaxGekk commented on pull request #41445: [SPARK-43956][SQL][3.4] Fix the bug doesn't display column's sql for Percentile[Cont|Disc]

2023-06-03 Thread via GitHub
MaxGekk commented on PR #41445: URL: https://github.com/apache/spark/pull/41445#issuecomment-1575139174 +1, LGTM. Merging to 3.4. Thank you, @beliefer.

[GitHub] [spark] MaxGekk closed pull request #41445: [SPARK-43956][SQL][3.4] Fix the bug doesn't display column's sql for Percentile[Cont|Disc]

2023-06-03 Thread via GitHub
MaxGekk closed pull request #41445: [SPARK-43956][SQL][3.4] Fix the bug doesn't display column's sql for Percentile[Cont|Disc] URL: https://github.com/apache/spark/pull/41445

[GitHub] [spark] MaxGekk commented on pull request #41447: [SPARK-43957][SQL][TESTS] Use `checkError()` to check `Exception` in `*Insert*Suite`

2023-06-03 Thread via GitHub
MaxGekk commented on PR #41447: URL: https://github.com/apache/spark/pull/41447#issuecomment-1575142470 +1, LGTM. Merging to master. Thank you, @panbingkun.

[GitHub] [spark] MaxGekk closed pull request #41447: [SPARK-43957][SQL][TESTS] Use `checkError()` to check `Exception` in `*Insert*Suite`

2023-06-03 Thread via GitHub
MaxGekk closed pull request #41447: [SPARK-43957][SQL][TESTS] Use `checkError()` to check `Exception` in `*Insert*Suite` URL: https://github.com/apache/spark/pull/41447

[GitHub] [spark] MaxGekk commented on a diff in pull request #41424: [SPARK-43913][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2426-2432]

2023-06-03 Thread via GitHub
MaxGekk commented on code in PR #41424: URL: https://github.com/apache/spark/pull/41424#discussion_r1215752166 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -624,8 +626,8 @@ trait CheckAnalysis extends PredicateHelper with

[GitHub] [spark] dongjoon-hyun closed pull request #41441: [SPARK-43954][BUILD] Upgrade sbt from 1.8.3 to 1.9.0

2023-06-03 Thread via GitHub
dongjoon-hyun closed pull request #41441: [SPARK-43954][BUILD] Upgrade sbt from 1.8.3 to 1.9.0 URL: https://github.com/apache/spark/pull/41441

[GitHub] [spark] github-actions[bot] closed pull request #40098: [SPARK-42504][SQL] NestedColumnAliasing support pruning adjacent projects

2023-06-03 Thread via GitHub
github-actions[bot] closed pull request #40098: [SPARK-42504][SQL] NestedColumnAliasing support pruning adjacent projects URL: https://github.com/apache/spark/pull/40098

[GitHub] [spark] github-actions[bot] closed pull request #39796: [SPARK-39800][SQL][WIP] DataSourceV2: View Support

2023-06-03 Thread via GitHub
github-actions[bot] closed pull request #39796: [SPARK-39800][SQL][WIP] DataSourceV2: View Support URL: https://github.com/apache/spark/pull/39796

[GitHub] [spark] dongjoon-hyun commented on pull request #41446: [SPARK-43956][SQL][3.3] Fix the bug doesn't display column's sql for Percentile[Cont|Disc]

2023-06-03 Thread via GitHub
dongjoon-hyun commented on PR #41446: URL: https://github.com/apache/spark/pull/41446#issuecomment-1575287357 All tests passed. Merged to branch-3.3.

[GitHub] [spark] dongjoon-hyun commented on pull request #41446: [SPARK-43956][SQL][3.3] Fix the bug doesn't display column's sql for Percentile[Cont|Disc]

2023-06-03 Thread via GitHub
dongjoon-hyun commented on PR #41446: URL: https://github.com/apache/spark/pull/41446#issuecomment-1575287810 Thank you, @beliefer and @MaxGekk.

[GitHub] [spark] dongjoon-hyun closed pull request #41446: [SPARK-43956][SQL][3.3] Fix the bug doesn't display column's sql for Percentile[Cont|Disc]

2023-06-03 Thread via GitHub
dongjoon-hyun closed pull request #41446: [SPARK-43956][SQL][3.3] Fix the bug doesn't display column's sql for Percentile[Cont|Disc] URL: https://github.com/apache/spark/pull/41446

[GitHub] [spark] MaxGekk commented on pull request #41445: [SPARK-43956][SQL][3.4] Fix the bug doesn't display column's sql for Percentile[Cont|Disc]

2023-06-03 Thread via GitHub
MaxGekk commented on PR #41445: URL: https://github.com/apache/spark/pull/41445#issuecomment-1575140577 I have already found this https://github.com/apache/spark/pull/41446

[GitHub] [spark] MaxGekk commented on pull request #41445: [SPARK-43956][SQL][3.4] Fix the bug doesn't display column's sql for Percentile[Cont|Disc]

2023-06-03 Thread via GitHub
MaxGekk commented on PR #41445: URL: https://github.com/apache/spark/pull/41445#issuecomment-1575140042 @beliefer Could you backport this to branch-3.3, since it is affected according to your ticket SPARK-43956 and Spark 3.3 is still officially supported?

[GitHub] [spark-connect-go] grundprinzip opened a new pull request, #8: [SPARK-43958] Adding support for Channel Builder

2023-06-03 Thread via GitHub
grundprinzip opened a new pull request, #8: URL: https://github.com/apache/spark-connect-go/pull/8 ### What changes were proposed in this pull request? Add support for parsing the connection string of Spark Connect in the same way as it's done for the other Spark Connect clients.
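Spark Connect clients conventionally accept a connection string of the form `sc://host:port/;param=value;param=value`, with 15002 as the default server port. The following minimal Python sketch parses that shape under those assumptions. It is written in Python only for consistency with the other examples here and is not the Go `ChannelBuilder` from this PR. The function name is hypothetical, the parameter names in the usage (`token`, `use_ssl`) are examples only, and bracketed IPv6 hosts are not handled.

```python
DEFAULT_PORT = 15002  # assumed default Spark Connect server port


def parse_connection_string(conn):
    """Split an sc://host[:port][/;k=v;k=v] string into (host, port, params)."""
    prefix = "sc://"
    if not conn.startswith(prefix):
        raise ValueError("connection string must start with 'sc://'")
    rest = conn[len(prefix):]
    # Everything before the first '/' is host[:port]; everything after it is the parameter list.
    authority, _, tail = rest.partition("/")
    host, sep, port_str = authority.partition(":")
    if not host:
        raise ValueError("missing host")
    port = int(port_str) if sep else DEFAULT_PORT
    params = {}
    for item in tail.split(";"):
        if not item:
            continue  # skip the empty piece produced by the leading ';'
        name, eq, value = item.partition("=")
        if not eq:
            raise ValueError(f"malformed parameter {item!r}")
        params[name] = value
    return host, port, params
```

For example, `parse_connection_string("sc://localhost:15002/;token=abc;use_ssl=true")` yields the host `"localhost"`, port `15002` and a parameter dict with both keys. A bare `"sc://example.com"` falls back to the default port.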

[GitHub] [spark] sarutak commented on pull request #41423: [SPARK-43523][CORE] Fix Spark UI LiveTask memory leak

2023-06-03 Thread via GitHub
sarutak commented on PR #41423: URL: https://github.com/apache/spark/pull/41423#issuecomment-1575207847 @aminebag This change seems ad hoc and doesn't really address the issue; it needs more consideration. Instead, how about setting a larger value to