[GitHub] [spark] HyukjinKwon opened a new pull request, #37582: [SPARK-40147][PYTHON][SQL] Make pyspark.sql.session examples self-contained

2022-08-19 Thread GitBox
HyukjinKwon opened a new pull request, #37582: URL: https://github.com/apache/spark/pull/37582 ### What changes were proposed in this pull request? This PR proposes to improve the examples in `pyspark.sql.session` by making each example self-contained with a brief explanation and a

[GitHub] [spark] wangyum commented on a diff in pull request #37565: [SPARK-40137][SQL] Combines limits after projection

2022-08-19 Thread GitBox
wangyum commented on code in PR #37565: URL: https://github.com/apache/spark/pull/37565#discussion_r950142433 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1994,6 +1994,13 @@ object EliminateLimits extends Rule[LogicalPlan] {

[GitHub] [spark] aray commented on a diff in pull request #37303: [SPARK-39883][SQL][TESTS] Add DataFrame function parity check

2022-08-19 Thread GitBox
aray commented on code in PR #37303: URL: https://github.com/apache/spark/pull/37303#discussion_r950158722 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala: ## @@ -41,6 +42,98 @@ import org.apache.spark.sql.types._ class DataFrameFunctionsSuite

[GitHub] [spark] Yikun opened a new pull request, #37584: [SPARK-39150][PS] Enable doctest which was disabled when pandas 1.4 upgrade

2022-08-19 Thread GitBox
Yikun opened a new pull request, #37584: URL: https://github.com/apache/spark/pull/37584 ### What changes were proposed in this pull request? Enable the doctests that were disabled during the pandas 1.4 upgrade ### Why are the changes needed? Remove `# doctest: +SKIP` of

[GitHub] [spark] wangyum commented on pull request #37519: [SPARK-40050][SQL] Enhance `EliminateSorts` to support removing sorts via `LocalLimit`

2022-08-19 Thread GitBox
wangyum commented on PR #37519: URL: https://github.com/apache/spark/pull/37519#issuecomment-1220637606 Thank you all. Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] cloud-fan commented on pull request #37573: [SPARK-40141][CORE] Remove unnecessary TaskContext addTaskXxxListener overloads

2022-08-19 Thread GitBox
cloud-fan commented on PR #37573: URL: https://github.com/apache/spark/pull/37573#issuecomment-1220725627 This is a developer API, so I'm fine with this cleanup. Can you push an empty commit to retrigger the GitHub Actions tests?

[GitHub] [spark] MaxGekk closed pull request #37452: [SPARK-40018][SQL][TESTS] Output `SparkThrowable` to SQL golden files in JSON format

2022-08-19 Thread GitBox
MaxGekk closed pull request #37452: [SPARK-40018][SQL][TESTS] Output `SparkThrowable` to SQL golden files in JSON format URL: https://github.com/apache/spark/pull/37452

[GitHub] [spark] wangyum closed pull request #37562: [SPARK-40133][SQL][TESTS] Update excluded TPC-DS queries golden files

2022-08-19 Thread GitBox
wangyum closed pull request #37562: [SPARK-40133][SQL][TESTS] Update excluded TPC-DS queries golden files URL: https://github.com/apache/spark/pull/37562

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #37573: [SPARK-40141][CORE] Remove unnecessary TaskContext addTaskXxxListener overloads

2022-08-19 Thread GitBox
ryan-johnson-databricks commented on code in PR #37573: URL: https://github.com/apache/spark/pull/37573#discussion_r950160112 ## connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala: ## @@ -115,11 +115,7 @@ private[sql] class AvroFileFormat extends

[GitHub] [spark] Yikun opened a new pull request, #37583: [SPARK-38961][PS] Raise ImportError if pandas version mismatch when creating PS document "Supported APIs"

2022-08-19 Thread GitBox
Yikun opened a new pull request, #37583: URL: https://github.com/apache/spark/pull/37583 ### What changes were proposed in this pull request? Raise ImportError when creating the PS document "Supported APIs" if the pandas version mismatches ### Why are the changes needed? The

[GitHub] [spark] Yikun commented on pull request #37579: [SPARK-40145][INFRA] Create infra image when cutting down branches

2022-08-19 Thread GitBox
Yikun commented on PR #37579: URL: https://github.com/apache/spark/pull/37579#issuecomment-1220612171 @HyukjinKwon Thanks! Ready to go.

[GitHub] [spark] wangyum commented on pull request #37562: [SPARK-40133][SQL][TESTS] Update excluded TPC-DS queries golden files

2022-08-19 Thread GitBox
wangyum commented on PR #37562: URL: https://github.com/apache/spark/pull/37562#issuecomment-1220633046 Thank you all. Merged to master.

[GitHub] [spark] Yikun opened a new pull request, #37581: [SPARK-40142][PYTHON][SQL][FOLLOWUP] Fix version of asin/mean and add alias note for mean

2022-08-19 Thread GitBox
Yikun opened a new pull request, #37581: URL: https://github.com/apache/spark/pull/37581 ### What changes were proposed in this pull request? Fix version of asin/mean and add alias note for mean ### Why are the changes needed? According to:

[GitHub] [spark] Yikun commented on a diff in pull request #37575: [SPARK-40142][PYTHON][SQL] Make pyspark.sql.functions examples self-contained (part 1, 25 functions)

2022-08-19 Thread GitBox
Yikun commented on code in PR #37575: URL: https://github.com/apache/spark/pull/37575#discussion_r950133007 ## python/pyspark/sql/functions.py: ## @@ -307,34 +485,120 @@ def min_by(col: "ColumnOrName", ord: "ColumnOrName") -> Column: return

[GitHub] [spark] Yikun commented on pull request #37583: [SPARK-38961][PS] Raise ImportError if pandas version mismatch when creating PS document "Supported APIs"

2022-08-19 Thread GitBox
Yikun commented on PR #37583: URL: https://github.com/apache/spark/pull/37583#issuecomment-1220720125 As I recall, what we need is strict pandas version matching, and otherwise to raise the import error, right? Mind taking a look? @beobest2 @HyukjinKwon

[GitHub] [spark] lvshaokang commented on pull request #36336: [SPARK-38692][SQL] Use error classes in the compilation errors of function args

2022-08-19 Thread GitBox
lvshaokang commented on PR #36336: URL: https://github.com/apache/spark/pull/36336#issuecomment-1220872836 @MaxGekk Sorry, I only recently had time to complete this issue. Please take a look at this PR. I unified the error class message and moved the `INVALID_PARAMETER_VALUE` to

[GitHub] [spark] wangyum closed pull request #37519: [SPARK-40050][SQL] Enhance `EliminateSorts` to support removing sorts via `LocalLimit`

2022-08-19 Thread GitBox
wangyum closed pull request #37519: [SPARK-40050][SQL] Enhance `EliminateSorts` to support removing sorts via `LocalLimit` URL: https://github.com/apache/spark/pull/37519

[GitHub] [spark] steveloughran commented on a diff in pull request #37468: [SPARK-40034][SQL] PathOutputCommitters to support dynamic partitions

2022-08-19 Thread GitBox
steveloughran commented on code in PR #37468: URL: https://github.com/apache/spark/pull/37468#discussion_r950165502 ## hadoop-cloud/src/hadoop-3/main/scala/org/apache/spark/internal/io/cloud/PathOutputCommitProtocol.scala: ## @@ -51,14 +51,7 @@ class PathOutputCommitProtocol(

[GitHub] [spark] ryan-johnson-databricks commented on pull request #37573: [SPARK-40141][CORE] Remove unnecessary TaskContext addTaskXxxListener overloads

2022-08-19 Thread GitBox
ryan-johnson-databricks commented on PR #37573: URL: https://github.com/apache/spark/pull/37573#issuecomment-1220705114 > Hi, @ryan-johnson-databricks . Apache Spark uses the PR contributor's GitHub Action resources instead of Apache Spark GitHub Action resources. Please enable GitHub

[GitHub] [spark] MaxGekk commented on pull request #37452: [SPARK-40018][SQL][TESTS] Output `SparkThrowable` to SQL golden files in JSON format

2022-08-19 Thread GitBox
MaxGekk commented on PR #37452: URL: https://github.com/apache/spark/pull/37452#issuecomment-1220791289 Merging to master. Thank you, @srielau @cloud-fan @entong @HyukjinKwon for review.

[GitHub] [spark] Yikun commented on pull request #37585: [SPARK-39310][PS] Change `requires_same_anchor` to `check_same_anchor`

2022-08-19 Thread GitBox
Yikun commented on PR #37585: URL: https://github.com/apache/spark/pull/37585#issuecomment-1220814630 cc @itholic

[GitHub] [spark] Yikun opened a new pull request, #37585: [SPARK-39310][PS] Change `requires_same_anchor` to `check_same_anchor`

2022-08-19 Thread GitBox
Yikun opened a new pull request, #37585: URL: https://github.com/apache/spark/pull/37585 ### What changes were proposed in this pull request? Change `requires_same_anchor` to `check_same_anchor` for `_update_internal_frame` func ### Why are the changes needed? There were some

[GitHub] [spark] lvshaokang commented on a diff in pull request #36336: [SPARK-38692][SQL] Use error classes in the compilation errors of function args

2022-08-19 Thread GitBox
lvshaokang commented on code in PR #36336: URL: https://github.com/apache/spark/pull/36336#discussion_r949303635 ## core/src/main/resources/error/error-classes.json: ## @@ -119,6 +119,33 @@ "message" : [ "The fraction of sec must be zero. Valid range is [0, 60]. If

[GitHub] [spark] JoshRosen commented on a diff in pull request #37573: [SPARK-40141][CORE] Remove unnecessary TaskContext addTaskXxxListener overloads

2022-08-19 Thread GitBox
JoshRosen commented on code in PR #37573: URL: https://github.com/apache/spark/pull/37573#discussion_r950405113 ## core/src/main/scala/org/apache/spark/TaskContext.scala: ## @@ -113,48 +115,21 @@ abstract class TaskContext extends Serializable { * * Exceptions thrown by

[GitHub] [spark] revans2 commented on pull request #37540: [SPARK-40089][SQL] Fix sorting for some Decimal types

2022-08-19 Thread GitBox
revans2 commented on PR #37540: URL: https://github.com/apache/spark/pull/37540#issuecomment-1220971184 @cloud-fan could you please take another look?

[GitHub] [spark] dongjoon-hyun closed pull request #37504: [SPARK-40065][K8S] Mount ConfigMap on executors with non-default profile as well

2022-08-19 Thread GitBox
dongjoon-hyun closed pull request #37504: [SPARK-40065][K8S] Mount ConfigMap on executors with non-default profile as well URL: https://github.com/apache/spark/pull/37504

[GitHub] [spark] pan3793 commented on a diff in pull request #36995: [SPARK-39607][SQL][DSV2] Distribution and ordering support V2 function in writing

2022-08-19 Thread GitBox
pan3793 commented on code in PR #36995: URL: https://github.com/apache/spark/pull/36995#discussion_r950634605 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DistributionAndOrderingUtils.scala: ## @@ -17,22 +17,33 @@ package

[GitHub] [spark] dongjoon-hyun commented on pull request #35663: [SPARK-37809][K8S] Add `YuniKorn` Feature Step

2022-08-19 Thread GitBox
dongjoon-hyun commented on PR #35663: URL: https://github.com/apache/spark/pull/35663#issuecomment-1220961683 +1 for your plan, @yangwwei . - Apache Spark community maintains `Feature Release Branches like branch-3.3` with bug fix releases for a period of 18 months. -

[GitHub] [spark] vitaliili-db commented on a diff in pull request #37483: [SPARK-40112][SQL] Improve the TO_BINARY() function

2022-08-19 Thread GitBox
vitaliili-db commented on code in PR #37483: URL: https://github.com/apache/spark/pull/37483#discussion_r950463020 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -169,6 +169,39 @@ private[sql] object QueryExecutionErrors extends

[GitHub] [spark] vitaliili-db commented on a diff in pull request #37483: [SPARK-40112][SQL] Improve the TO_BINARY() function

2022-08-19 Thread GitBox
vitaliili-db commented on code in PR #37483: URL: https://github.com/apache/spark/pull/37483#discussion_r950463225 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2469,14 +2469,65 @@ case class Encode(value: Expression,

[GitHub] [spark] dongjoon-hyun closed pull request #37499: [SPARK-40060][CORE] Add `numberDecommissioningExecutors` metric

2022-08-19 Thread GitBox
dongjoon-hyun closed pull request #37499: [SPARK-40060][CORE] Add `numberDecommissioningExecutors` metric URL: https://github.com/apache/spark/pull/37499

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37203: [SPARK-39755][K8S] Randomization in Spark local directory for K8 resource managers

2022-08-19 Thread GitBox
dongjoon-hyun commented on code in PR #37203: URL: https://github.com/apache/spark/pull/37203#discussion_r950519087 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala: ## @@ -24,6 +24,7 @@ import

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37203: [SPARK-39755][K8S] Randomization in Spark local directory for K8 resource managers

2022-08-19 Thread GitBox
dongjoon-hyun commented on code in PR #37203: URL: https://github.com/apache/spark/pull/37203#discussion_r950519708 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala: ## @@ -35,9 +36,10 @@ private[spark] class

[GitHub] [spark] aokolnychyi commented on a diff in pull request #36995: [SPARK-39607][SQL][DSV2] Distribution and ordering support V2 function in writing

2022-08-19 Thread GitBox
aokolnychyi commented on code in PR #36995: URL: https://github.com/apache/spark/pull/36995#discussion_r950515089 ## sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala: ## @@ -48,6 +48,11 @@ case class DataSourceV2Relation(

[GitHub] [spark] beobest2 commented on pull request #37583: [SPARK-39170][PS] Raise ImportError if pandas version mismatch when creating PS document "Supported APIs"

2022-08-19 Thread GitBox
beobest2 commented on PR #37583: URL: https://github.com/apache/spark/pull/37583#issuecomment-1221076128 LGTM!

[GitHub] [spark] dcoliversun commented on pull request #35886: [SPARK-38582][K8S] Add `KubernetesUtils.buildEnvVars(WithFieldRef)?` utility functions

2022-08-19 Thread GitBox
dcoliversun commented on PR #35886: URL: https://github.com/apache/spark/pull/35886#issuecomment-1221092067 Thanks for your help @dongjoon-hyun

[GitHub] [spark] attilapiros commented on pull request #37474: [SPARK-40039][SS] Introducing a streaming checkpoint file manager based on Hadoop's Abortable interface

2022-08-19 Thread GitBox
attilapiros commented on PR #37474: URL: https://github.com/apache/spark/pull/37474#issuecomment-1221197541 @HeartSaVioR sure, please find the "Performance test" at the description.

[GitHub] [spark] yangwwei commented on pull request #35663: [SPARK-37809][K8S] Add `YuniKorn` Feature Step

2022-08-19 Thread GitBox
yangwwei commented on PR #35663: URL: https://github.com/apache/spark/pull/35663#issuecomment-1220913772 Hi @dongjoon-hyun , @Yikun Gotcha. I think we can do the following: 1. I will remove the code changes in this PR. Maybe I will just close this one and create a new PR. That PR

[GitHub] [spark] HyukjinKwon commented on pull request #37583: [SPARK-39170][PS] Raise ImportError if pandas version mismatch when creating PS document "Supported APIs"

2022-08-19 Thread GitBox
HyukjinKwon commented on PR #37583: URL: https://github.com/apache/spark/pull/37583#issuecomment-1221190871 Merged to master.

[GitHub] [spark] HyukjinKwon commented on pull request #37579: [SPARK-40145][INFRA] Create infra image when cutting down branches

2022-08-19 Thread GitBox
HyukjinKwon commented on PR #37579: URL: https://github.com/apache/spark/pull/37579#issuecomment-1221191066 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #37579: [SPARK-40145][INFRA] Create infra image when cutting down branches

2022-08-19 Thread GitBox
HyukjinKwon closed pull request #37579: [SPARK-40145][INFRA] Create infra image when cutting down branches URL: https://github.com/apache/spark/pull/37579

[GitHub] [spark] HyukjinKwon closed pull request #37583: [SPARK-39170][PS] Raise ImportError if pandas version mismatch when creating PS document "Supported APIs"

2022-08-19 Thread GitBox
HyukjinKwon closed pull request #37583: [SPARK-39170][PS] Raise ImportError if pandas version mismatch when creating PS document "Supported APIs" URL: https://github.com/apache/spark/pull/37583

[GitHub] [spark] HyukjinKwon commented on pull request #37581: [SPARK-40142][PYTHON][SQL][FOLLOWUP] Fix version of asin/mean and add alias note for mean

2022-08-19 Thread GitBox
HyukjinKwon commented on PR #37581: URL: https://github.com/apache/spark/pull/37581#issuecomment-122119 Merged to master.

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37203: [SPARK-39755][K8S] Randomization in Spark local directory for K8 resource managers

2022-08-19 Thread GitBox
dongjoon-hyun commented on code in PR #37203: URL: https://github.com/apache/spark/pull/37203#discussion_r950521862 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala: ## @@ -35,9 +36,10 @@ private[spark] class

[GitHub] [spark] dongjoon-hyun commented on pull request #35663: [SPARK-37809][K8S] Add `YuniKorn` Feature Step

2022-08-19 Thread GitBox
dongjoon-hyun commented on PR #35663: URL: https://github.com/apache/spark/pull/35663#issuecomment-1220963645 BTW, cc @sunchao since he is interested in the release manager for Apache Spark 3.3.1.

[GitHub] [spark] HyukjinKwon commented on pull request #37584: [SPARK-39150][PS] Enable doctest which was disabled when pandas 1.4 upgrade

2022-08-19 Thread GitBox
HyukjinKwon commented on PR #37584: URL: https://github.com/apache/spark/pull/37584#issuecomment-1221191560 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #37581: [SPARK-40142][PYTHON][SQL][FOLLOWUP] Fix version of asin/mean and add alias note for mean

2022-08-19 Thread GitBox
HyukjinKwon closed pull request #37581: [SPARK-40142][PYTHON][SQL][FOLLOWUP] Fix version of asin/mean and add alias note for mean URL: https://github.com/apache/spark/pull/37581

[GitHub] [spark] HyukjinKwon closed pull request #37584: [SPARK-39150][PS] Enable doctest which was disabled when pandas 1.4 upgrade

2022-08-19 Thread GitBox
HyukjinKwon closed pull request #37584: [SPARK-39150][PS] Enable doctest which was disabled when pandas 1.4 upgrade URL: https://github.com/apache/spark/pull/37584

[GitHub] [spark] MaxGekk commented on a diff in pull request #28499: [SPARK-31678][SQL] Print error stack trace for Spark SQL CLI when error occurs

2022-08-19 Thread GitBox
MaxGekk commented on code in PR #28499: URL: https://github.com/apache/spark/pull/28499#discussion_r950513186 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala: ## @@ -380,10 +380,18 @@ private[hive] class SparkSQLCLIDriver

[GitHub] [spark] dongjoon-hyun commented on pull request #37504: [SPARK-40065][K8S] Mount ConfigMap on executors with non-default profile as well

2022-08-19 Thread GitBox
dongjoon-hyun commented on PR #37504: URL: https://github.com/apache/spark/pull/37504#issuecomment-1221038168 Merged to master/3.3/3.2.

[GitHub] [spark] dongjoon-hyun commented on pull request #37504: [SPARK-40065][K8S] Mount ConfigMap on executors with non-default profile as well

2022-08-19 Thread GitBox
dongjoon-hyun commented on PR #37504: URL: https://github.com/apache/spark/pull/37504#issuecomment-1221039466 Welcome to the Apache Spark community, @nsuke . I added you to the Apache Spark contributor group and assigned SPARK-40065 to you.

[GitHub] [spark] sunchao commented on pull request #37419: [SPARK-39833][SQL] Disable Parquet column index in DSv1 to fix a correctness issue in the case of overlapping partition and data columns

2022-08-19 Thread GitBox
sunchao commented on PR #37419: URL: https://github.com/apache/spark/pull/37419#issuecomment-1221083165 @sadikovi yes, I can also take a look at this next week. I'm fine either way: what do you think @cloud-fan @HyukjinKwon , should we merge this PR as it is (via disabling column index)

[GitHub] [spark] nsuke commented on pull request #37504: [SPARK-40065][K8S] Mount ConfigMap on executors with non-default profile as well

2022-08-19 Thread GitBox
nsuke commented on PR #37504: URL: https://github.com/apache/spark/pull/37504#issuecomment-1221200166 thank you!

[GitHub] [spark] dongjoon-hyun commented on pull request #35886: [SPARK-38582][K8S] Add `KubernetesUtils.buildEnvVars(WithFieldRef)?` utility functions

2022-08-19 Thread GitBox
dongjoon-hyun commented on PR #35886: URL: https://github.com/apache/spark/pull/35886#issuecomment-1220952004 Also, thank you, @Yikun and @martin-g , too.

[GitHub] [spark] dongjoon-hyun closed pull request #35886: [SPARK-38582][K8S] Add `KubernetesUtils.buildEnvVars(WithFieldRef)?` utility functions

2022-08-19 Thread GitBox
dongjoon-hyun closed pull request #35886: [SPARK-38582][K8S] Add `KubernetesUtils.buildEnvVars(WithFieldRef)?` utility functions URL: https://github.com/apache/spark/pull/35886

[GitHub] [spark] allisonwang-db opened a new pull request, #37586: [SPARK-40153][SQL] Unify the logic of resolve functions and table-valued functions

2022-08-19 Thread GitBox
allisonwang-db opened a new pull request, #37586: URL: https://github.com/apache/spark/pull/37586 ### What changes were proposed in this pull request? This PR refactors the analyzer rule `ResolveTableValuedFunctions` to first look for built-in or temp table functions, and

[GitHub] [spark] physinet commented on pull request #37329: [SPARK-39832][PYTHON] Support column arguments in regexp_replace

2022-08-19 Thread GitBox
physinet commented on PR #37329: URL: https://github.com/apache/spark/pull/37329#issuecomment-1221136223 All comments are addressed; this should be ready for merge.

[GitHub] [spark] HyukjinKwon closed pull request #37585: [SPARK-39310][PS] Change `requires_same_anchor` to `check_same_anchor`

2022-08-19 Thread GitBox
HyukjinKwon closed pull request #37585: [SPARK-39310][PS] Change `requires_same_anchor` to `check_same_anchor` URL: https://github.com/apache/spark/pull/37585

[GitHub] [spark] HyukjinKwon commented on pull request #37585: [SPARK-39310][PS] Change `requires_same_anchor` to `check_same_anchor`

2022-08-19 Thread GitBox
HyukjinKwon commented on PR #37585: URL: https://github.com/apache/spark/pull/37585#issuecomment-1221191787 Merged to master.

[GitHub] [spark] vitaliili-db commented on pull request #37483: [SPARK-40112][SQL] Improve the TO_BINARY() function

2022-08-19 Thread GitBox
vitaliili-db commented on PR #37483: URL: https://github.com/apache/spark/pull/37483#issuecomment-1221215906 @Yikun Hi Yikun, for some reason `Base image build` failed several times until I added `write` permission to GITHUB_TOKEN, is it expected?

[GitHub] [spark] HyukjinKwon commented on pull request #37582: [SPARK-40147][PYTHON][SQL] Make pyspark.sql.session examples self-contained

2022-08-19 Thread GitBox
HyukjinKwon commented on PR #37582: URL: https://github.com/apache/spark/pull/37582#issuecomment-1221232624 cc @Yikun @itholic @viirya @xinrong-meng @ueshin @zhengruifeng PTAL when you find some time.

[GitHub] [spark] viirya commented on a diff in pull request #37582: [SPARK-40147][PYTHON][SQL] Make pyspark.sql.session examples self-contained

2022-08-19 Thread GitBox
viirya commented on code in PR #37582: URL: https://github.com/apache/spark/pull/37582#discussion_r950650869 ## python/pyspark/sql/session.py: ## @@ -99,8 +99,15 @@ def toDF(self, schema=None, sampleRatio=None): Examples ->>>

[GitHub] [spark] Yikun commented on pull request #37483: [SPARK-40112][SQL] Improve the TO_BINARY() function

2022-08-19 Thread GitBox
Yikun commented on PR #37483: URL: https://github.com/apache/spark/pull/37483#issuecomment-1221229222 @vitaliili-db **By default**, write permission is already included in your `GITHUB_TOKEN`, but if you set it manually (see also [1], [2]), it will fail; the current CI will first build infra

[GitHub] [spark] viirya commented on a diff in pull request #37582: [SPARK-40147][PYTHON][SQL] Make pyspark.sql.session examples self-contained

2022-08-19 Thread GitBox
viirya commented on code in PR #37582: URL: https://github.com/apache/spark/pull/37582#discussion_r950651293 ## python/pyspark/sql/session.py: ## @@ -1185,6 +1363,22 @@ def readStream(self) -> DataStreamReader: Returns ---

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37203: [SPARK-39755][K8S] Randomization in Spark local directory for K8 resource managers

2022-08-19 Thread GitBox
dongjoon-hyun commented on code in PR #37203: URL: https://github.com/apache/spark/pull/37203#discussion_r950653293 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala: ## @@ -33,12 +34,11 @@ private[spark] class

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37203: [SPARK-39755][K8S] Randomization in Spark local directory for K8 resource managers

2022-08-19 Thread GitBox
dongjoon-hyun commented on code in PR #37203: URL: https://github.com/apache/spark/pull/37203#discussion_r950653318 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala: ## @@ -68,9 +69,8 @@ private[spark] class

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37203: [SPARK-39755][K8S] Randomization in Spark local directory for K8 resource managers

2022-08-19 Thread GitBox
dongjoon-hyun commented on code in PR #37203: URL: https://github.com/apache/spark/pull/37203#discussion_r950653274 ## core/src/test/scala/org/apache/spark/storage/LocalDirsSuite.scala: ## @@ -85,4 +85,5 @@ class LocalDirsSuite extends SparkFunSuite with LocalRootDirsTest {

[GitHub] [spark] viirya commented on pull request #37582: [SPARK-40147][PYTHON][SQL] Make pyspark.sql.session examples self-contained

2022-08-19 Thread GitBox
viirya commented on PR #37582: URL: https://github.com/apache/spark/pull/37582#issuecomment-1221235536 Two minor comments

[GitHub] [spark] cloud-fan closed pull request #37539: [SPARK-40107][SQL] Pull out empty2null conversion from FileFormatWriter

2022-08-19 Thread GitBox
cloud-fan closed pull request #37539: [SPARK-40107][SQL] Pull out empty2null conversion from FileFormatWriter URL: https://github.com/apache/spark/pull/37539

[GitHub] [spark] yangwwei commented on pull request #35663: [SPARK-37809][K8S] Add `YuniKorn` Feature Step

2022-08-19 Thread GitBox
yangwwei commented on PR #35663: URL: https://github.com/apache/spark/pull/35663#issuecomment-1220293839 Hi @dongjoon-hyun Wait a sec. I think I missed one point. Does this mean we can simply set {{APP_ID}}, and this {{APP_ID}} will be substituted with the actual job ID

[GitHub] [spark] Yikun commented on pull request #35663: [SPARK-37809][K8S] Add `YuniKorn` Feature Step

2022-08-19 Thread GitBox
Yikun commented on PR #35663: URL: https://github.com/apache/spark/pull/35663#issuecomment-1220302142 @yangwwei IIRC, we already had the discussion before [1]. If all info like `queue`, `miniRes` can be passed well by annotation/label, I think current Spark already meets the requirement of

[GitHub] [spark] cloud-fan commented on pull request #37539: [SPARK-40107][SQL] Pull out empty2null conversion from FileFormatWriter

2022-08-19 Thread GitBox
cloud-fan commented on PR #37539: URL: https://github.com/apache/spark/pull/37539#issuecomment-1220309697 thanks, merging to master!

[GitHub] [spark] ConeyLiu commented on a diff in pull request #37565: [SPARK-40137][SQL] Combines limits after projection

2022-08-19 Thread GitBox
ConeyLiu commented on code in PR #37565: URL: https://github.com/apache/spark/pull/37565#discussion_r949857547 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1994,6 +1994,13 @@ object EliminateLimits extends Rule[LogicalPlan] {

[GitHub] [spark] Yikun opened a new pull request, #37579: [SPARK-40145][INFRA] Create infra image when cutting down branches

2022-08-19 Thread GitBox
Yikun opened a new pull request, #37579: URL: https://github.com/apache/spark/pull/37579 ### What changes were proposed in this pull request? This PR makes jobs run when creating branches/tags and merging commits in branches. This job will create cache/static image like:

[GitHub] [spark] AmplabJenkins commented on pull request #37573: [SPARK-40141][CORE] Remove unnecessary TaskContext addTaskXxxListener overloads

2022-08-19 Thread GitBox
AmplabJenkins commented on PR #37573: URL: https://github.com/apache/spark/pull/37573#issuecomment-1220343453 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #37571: [SPARK-21487][CORE][WEB UI] Change extension of mustache template files to .mustache

2022-08-19 Thread GitBox
AmplabJenkins commented on PR #37571: URL: https://github.com/apache/spark/pull/37571#issuecomment-1220343494 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #37574: [MINOR][SQL] Specify the column name when the data type is not supported by datasource

2022-08-19 Thread GitBox
AmplabJenkins commented on PR #37574: URL: https://github.com/apache/spark/pull/37574#issuecomment-1220343416 Can one of the admins verify this patch?

[GitHub] [spark] EnricoMi commented on a diff in pull request #37211: [SPARK-39644][SQL] Add RangePartitioning reporting for V2 DataSources

2022-08-19 Thread GitBox
EnricoMi commented on code in PR #37211: URL: https://github.com/apache/spark/pull/37211#discussion_r949983359 ## sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala: ## @@ -119,13 +119,16 @@ case class DataSourceV2Relation( *

[GitHub] [spark] AmplabJenkins commented on pull request #37565: [SPARK-40137][SQL] Combines limits after projection

2022-08-19 Thread GitBox
AmplabJenkins commented on PR #37565: URL: https://github.com/apache/spark/pull/37565#issuecomment-1220442182 Can one of the admins verify this patch?

[GitHub] [spark] steveloughran commented on a diff in pull request #37468: [SPARK-40034][SQL] PathOutputCommitters to support dynamic partitions

2022-08-19 Thread GitBox
steveloughran commented on code in PR #37468: URL: https://github.com/apache/spark/pull/37468#discussion_r949986859 ## hadoop-cloud/src/hadoop-3/main/scala/org/apache/spark/internal/io/cloud/PathOutputCommitProtocol.scala: ## @@ -140,6 +162,28 @@ class PathOutputCommitProtocol(

[GitHub] [spark] dcoliversun commented on a diff in pull request #37521: [SPARK-40078][PYTHON][DOCS] Make pyspark.sql.column examples self-contained

2022-08-19 Thread GitBox
dcoliversun commented on code in PR #37521: URL: https://github.com/apache/spark/pull/37521#discussion_r949995206 ## python/pyspark/sql/column.py: ## @@ -430,6 +454,19 @@ def getField(self, name: Any) -> "Column": .. versionadded:: 1.3.0 +Parameters +

[GitHub] [spark] dcoliversun commented on a diff in pull request #37521: [SPARK-40078][PYTHON][DOCS] Make pyspark.sql.column examples self-contained

2022-08-19 Thread GitBox
dcoliversun commented on code in PR #37521: URL: https://github.com/apache/spark/pull/37521#discussion_r94640 ## python/pyspark/sql/column.py: ## @@ -405,6 +415,20 @@ def getItem(self, key: Any) -> "Column": .. versionadded:: 1.3.0 +Parameters +

[GitHub] [spark] dcoliversun commented on a diff in pull request #37521: [SPARK-40078][PYTHON][DOCS] Make pyspark.sql.column examples self-contained

2022-08-19 Thread GitBox
dcoliversun commented on code in PR #37521: URL: https://github.com/apache/spark/pull/37521#discussion_r95562 ## python/pyspark/sql/column.py: ## @@ -462,6 +499,18 @@ def withField(self, fieldName: str, col: "Column") -> "Column": .. versionadded:: 3.1.0 +

[GitHub] [spark] dcoliversun commented on a diff in pull request #37521: [SPARK-40078][PYTHON][DOCS] Make pyspark.sql.column examples self-contained

2022-08-19 Thread GitBox
dcoliversun commented on code in PR #37521: URL: https://github.com/apache/spark/pull/37521#discussion_r95222 ## python/pyspark/sql/column.py: ## @@ -495,6 +544,17 @@ def dropFields(self, *fieldNames: str) -> "Column": .. versionadded:: 3.1.0 +

[GitHub] [spark] dcoliversun commented on a diff in pull request #37521: [SPARK-40078][PYTHON][DOCS] Make pyspark.sql.column examples self-contained

2022-08-19 Thread GitBox
dcoliversun commented on code in PR #37521: URL: https://github.com/apache/spark/pull/37521#discussion_r950002327 ## python/pyspark/sql/column.py: ## @@ -720,8 +817,20 @@ def isin(self, *cols: Any) -> "Column": .. versionadded:: 1.5.0 +Parameters +

[GitHub] [spark] dongjoon-hyun commented on pull request #35663: [SPARK-37809][K8S] Add `YuniKorn` Feature Step

2022-08-19 Thread GitBox
dongjoon-hyun commented on PR #35663: URL: https://github.com/apache/spark/pull/35663#issuecomment-1220474012 1. Yes, that is a place holder, @yangwwei . We can skip the code unless we need additional function with that class. 2. Yes, of course, we need a doc. However, Apache Spark needs

[GitHub] [spark] zhengruifeng commented on pull request #37569: [SPARK-40138][PS][SQL] Implement DataFrame.mode

2022-08-19 Thread GitBox
zhengruifeng commented on PR #37569: URL: https://github.com/apache/spark/pull/37569#issuecomment-1220478174 @HyukjinKwon thanks for review!

[GitHub] [spark] itholic commented on a diff in pull request #37575: [SPARK-40142][PYTHON][SQL] Make pyspark.sql.functions examples self-contained (part 1, 25 functions)

2022-08-19 Thread GitBox
itholic commented on code in PR #37575: URL: https://github.com/apache/spark/pull/37575#discussion_r949905055 ## python/pyspark/sql/functions.py: ## @@ -155,34 +182,143 @@ def col(col: str) -> Column: column = col -@since(1.3) def asc(col: "ColumnOrName") -> Column:

[GitHub] [spark] gengliangwang opened a new pull request, #37580: [SPARK-40146][SQL] Simply the codegen of getting map value

2022-08-19 Thread GitBox
gengliangwang opened a new pull request, #37580: URL: https://github.com/apache/spark/pull/37580 ### What changes were proposed in this pull request? Simplify the code generation of `GetMapValueUtil` ### Why are the changes needed? Code generation improvement.

[GitHub] [spark] HyukjinKwon commented on pull request #37575: [SPARK-40142][PYTHON][SQL] Make pyspark.sql.functions examples self-contained (part 1, 25 functions)

2022-08-19 Thread GitBox
HyukjinKwon commented on PR #37575: URL: https://github.com/apache/spark/pull/37575#issuecomment-1220381009 Merged to master.

[GitHub] [spark] dcoliversun commented on a diff in pull request #35886: [SPARK-38582][K8S] Add `KubernetesUtils.buildEnvVars(WithFieldRef)?` utility functions

2022-08-19 Thread GitBox
dcoliversun commented on code in PR #35886: URL: https://github.com/apache/spark/pull/35886#discussion_r949966316 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala: ## @@ -381,4 +381,34 @@ object KubernetesUtils extends

[GitHub] [spark] dcoliversun commented on a diff in pull request #35886: [SPARK-38582][K8S] Add `KubernetesUtils.buildEnvVars(WithFieldRef)?` utility functions

2022-08-19 Thread GitBox
dcoliversun commented on code in PR #35886: URL: https://github.com/apache/spark/pull/35886#discussion_r949972477 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala: ## @@ -381,4 +381,34 @@ object KubernetesUtils extends

[GitHub] [spark] dcoliversun commented on a diff in pull request #35886: [SPARK-38582][K8S] Add `KubernetesUtils.buildEnvVars(WithFieldRef)?` utility functions

2022-08-19 Thread GitBox
dcoliversun commented on code in PR #35886: URL: https://github.com/apache/spark/pull/35886#discussion_r949973283 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala: ## @@ -122,68 +122,41 @@ private[spark]

[GitHub] [spark] dcoliversun commented on a diff in pull request #35886: [SPARK-38582][K8S] Add `KubernetesUtils.buildEnvVars(WithFieldRef)?` utility functions

2022-08-19 Thread GitBox
dcoliversun commented on code in PR #35886: URL: https://github.com/apache/spark/pull/35886#discussion_r949974064 ## resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesUtilsSuite.scala: ## @@ -127,4 +127,35 @@ class KubernetesUtilsSuite

[GitHub] [spark] dcoliversun commented on a diff in pull request #35886: [SPARK-38582][K8S] Add `KubernetesUtils.buildEnvVars(WithFieldRef)?` utility functions

2022-08-19 Thread GitBox
dcoliversun commented on code in PR #35886: URL: https://github.com/apache/spark/pull/35886#discussion_r949975155 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala: ## @@ -122,68 +122,41 @@ private[spark]

[GitHub] [spark] MaxGekk commented on pull request #37520: [SPARK-40098][SQL] Format error messages in the Thrift Server

2022-08-19 Thread GitBox
MaxGekk commented on PR #37520: URL: https://github.com/apache/spark/pull/37520#issuecomment-1220433022 ImageFileFormatSuite failed again. I re-ran it locally. Seems like it is just a flaky test. Merging to master.

[GitHub] [spark] MaxGekk closed pull request #37520: [SPARK-40098][SQL] Format error messages in the Thrift Server

2022-08-19 Thread GitBox
MaxGekk closed pull request #37520: [SPARK-40098][SQL] Format error messages in the Thrift Server URL: https://github.com/apache/spark/pull/37520

[GitHub] [spark] HyukjinKwon commented on pull request #37575: [SPARK-40142][PYTHON][SQL] Make pyspark.sql.functions examples self-contained (part 1, 25 functions)

2022-08-19 Thread GitBox
HyukjinKwon commented on PR #37575: URL: https://github.com/apache/spark/pull/37575#issuecomment-1220380851 Thanks @itholic !

[GitHub] [spark] HyukjinKwon closed pull request #37575: [SPARK-40142][PYTHON][SQL] Make pyspark.sql.functions examples self-contained (part 1, 25 functions)

2022-08-19 Thread GitBox
HyukjinKwon closed pull request #37575: [SPARK-40142][PYTHON][SQL] Make pyspark.sql.functions examples self-contained (part 1, 25 functions) URL: https://github.com/apache/spark/pull/37575
