[GitHub] [spark] zhengruifeng opened a new pull request, #37874: [SPARK-40421][PS] Make `spearman` correlation in `DataFrame.corr` support missing values and `min_periods`

2022-09-13 Thread GitBox
zhengruifeng opened a new pull request, #37874: URL: https://github.com/apache/spark/pull/37874 ### What changes were proposed in this pull request? refactor `spearman` correlation in `DataFrame.corr` to: 1. support missing values; 2. add parameter min_periods; 3. enable

[GitHub] [spark] mridulm commented on a diff in pull request #37843: [SPARK-40398][CORE][SQL] Use Loop instead of Arrays.stream api

2022-09-13 Thread GitBox
mridulm commented on code in PR #37843: URL: https://github.com/apache/spark/pull/37843#discussion_r970331132 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java: ## @@ -62,8 +61,7 @@ public String build(Expression expr) { String

[GitHub] [spark] srielau commented on pull request #37811: [SPARK-40360] *_ALREADY_EXISTS and *_NOT_FOUND error

2022-09-13 Thread GitBox
srielau commented on PR #37811: URL: https://github.com/apache/spark/pull/37811#issuecomment-1246251529 OK, let’s see On Sep 13, 2022, 10:11 PM -0700, Gengliang Wang ***@***.***>, wrote: Hi @srielau , I have just merged the bug fix

[GitHub] [spark] HeartSaVioR closed pull request #37864: [SPARK-40414][SQL][PYTHON] More generic type on PythonArrowInput and PythonArrowOutput

2022-09-13 Thread GitBox
HeartSaVioR closed pull request #37864: [SPARK-40414][SQL][PYTHON] More generic type on PythonArrowInput and PythonArrowOutput URL: https://github.com/apache/spark/pull/37864 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] HeartSaVioR commented on pull request #37864: [SPARK-40414][SQL][PYTHON] More generic type on PythonArrowInput and PythonArrowOutput

2022-09-13 Thread GitBox
HeartSaVioR commented on PR #37864: URL: https://github.com/apache/spark/pull/37864#issuecomment-1246245394 Thanks for reviewing! Merging to master.

[GitHub] [spark] gengliangwang commented on pull request #37840: [SPARK-40416][SQL] Move subquery expression CheckAnalysis error messages to use the new error framework

2022-09-13 Thread GitBox
gengliangwang commented on PR #37840: URL: https://github.com/apache/spark/pull/37840#issuecomment-1246243282 Hi @dtenedor, I have just merged the bug fix https://github.com/apache/spark/pull/37861. The query context should be set correctly in the AnalysisException. I believe the test

[GitHub] [spark] gengliangwang commented on pull request #37811: [SPARK-40360] *_ALREADY_EXISTS and *_NOT_FOUND error

2022-09-13 Thread GitBox
gengliangwang commented on PR #37811: URL: https://github.com/apache/spark/pull/37811#issuecomment-1246242092 Hi @srielau , I have just merged the bug fix https://github.com/apache/spark/pull/37861. The query context should be set correctly in the AnalysisException. Please rebase to the

[GitHub] [spark] gengliangwang closed pull request #37861: [SPARK-40324][SQL][FOLLOWUP] Fix a bug in setting query context in Analyzer

2022-09-13 Thread GitBox
gengliangwang closed pull request #37861: [SPARK-40324][SQL][FOLLOWUP] Fix a bug in setting query context in Analyzer URL: https://github.com/apache/spark/pull/37861

[GitHub] [spark] gengliangwang commented on pull request #37861: [SPARK-40324][SQL][FOLLOWUP] Fix a bug in setting query context in Analyzer

2022-09-13 Thread GitBox
gengliangwang commented on PR #37861: URL: https://github.com/apache/spark/pull/37861#issuecomment-1246239530 @cloud-fan @MaxGekk Thanks for the review. Merging this one to master.

[GitHub] [spark] LuciferYang commented on a diff in pull request #37853: [SPARK-40404][DOCS] Fix the wrong description related to `spark.shuffle.service.db.enabled` in the document

2022-09-13 Thread GitBox
LuciferYang commented on code in PR #37853: URL: https://github.com/apache/spark/pull/37853#discussion_r970309147 ## docs/spark-standalone.md: ## @@ -322,9 +322,9 @@ SPARK_WORKER_OPTS supports the following system properties: true Store External Shuffle service

[GitHub] [spark] itholic opened a new pull request, #37873: [SPARK-40419][SQL][TESTS] Integrate Grouped Aggregate Pandas UDFs into *.sql test cases

2022-09-13 Thread GitBox
itholic opened a new pull request, #37873: URL: https://github.com/apache/spark/pull/37873 ### What changes were proposed in this pull request? This PR proposes to integrate Grouped Aggregate Pandas UDF tests into *.sql test cases. ### Why are the changes needed? To

[GitHub] [spark] itholic commented on a diff in pull request #37836: [SPARK-40339][SPARK-40342][SPARK-40345][SPARK-40348][PS] Implement quantile in Rolling/RollingGroupby/Expanding/ExpandingGroupby

2022-09-13 Thread GitBox
itholic commented on code in PR #37836: URL: https://github.com/apache/spark/pull/37836#discussion_r970207129 ## python/pyspark/pandas/window.py: ## @@ -561,6 +573,101 @@ def mean(self) -> FrameLike: """ return super().mean() +def quantile(self,

[GitHub] [spark] LuciferYang commented on pull request #37868: [SPARK-40397][BUILD] Upgrade `org.scalatestplus:selenium` to 3.12.13

2022-09-13 Thread GitBox
LuciferYang commented on PR #37868: URL: https://github.com/apache/spark/pull/37868#issuecomment-1246179903 thanks @HyukjinKwon

[GitHub] [spark] HyukjinKwon commented on pull request #37868: [SPARK-40397][BUILD] Upgrade `org.scalatestplus:selenium` to 3.12.13

2022-09-13 Thread GitBox
HyukjinKwon commented on PR #37868: URL: https://github.com/apache/spark/pull/37868#issuecomment-1246179393 cc @sarutak FYI

[GitHub] [spark] HyukjinKwon closed pull request #37852: [SPARK-40403][SQL] Calculate unsafe array size using longs to avoid negative size in error message

2022-09-13 Thread GitBox
HyukjinKwon closed pull request #37852: [SPARK-40403][SQL] Calculate unsafe array size using longs to avoid negative size in error message URL: https://github.com/apache/spark/pull/37852

[GitHub] [spark] HyukjinKwon commented on pull request #37852: [SPARK-40403][SQL] Calculate unsafe array size using longs to avoid negative size in error message

2022-09-13 Thread GitBox
HyukjinKwon commented on PR #37852: URL: https://github.com/apache/spark/pull/37852#issuecomment-1246162583 Merged to master.

[GitHub] [spark] beliefer commented on a diff in pull request #37830: [SPARK-40387][SQL] Improve the implementation of Spark Decimal

2022-09-13 Thread GitBox
beliefer commented on code in PR #37830: URL: https://github.com/apache/spark/pull/37830#discussion_r970247708 ## sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala: ## @@ -240,9 +240,11 @@ final class Decimal extends Ordered[Decimal] with Serializable {

[GitHub] [spark] LuciferYang commented on a diff in pull request #37843: [SPARK-40398][CORE][SQL] Use Loop instead of Arrays.stream api

2022-09-13 Thread GitBox
LuciferYang commented on code in PR #37843: URL: https://github.com/apache/spark/pull/37843#discussion_r970247612 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/Expression.java: ## @@ -17,7 +17,7 @@ package

[GitHub] [spark] HeartSaVioR commented on pull request #37864: [SPARK-40414][SQL][PYTHON] More generic type on PythonArrowInput and PythonArrowOutput

2022-09-13 Thread GitBox
HeartSaVioR commented on PR #37864: URL: https://github.com/apache/spark/pull/37864#issuecomment-1246141934 I'll merge this PR once build is green.

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37864: [SPARK-40414][SQL][PYTHON] More generic type on PythonArrowInput and PythonArrowOutput

2022-09-13 Thread GitBox
HeartSaVioR commented on code in PR #37864: URL: https://github.com/apache/spark/pull/37864#discussion_r970208295 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonArrowOutput.scala: ## @@ -33,12 +33,14 @@ import

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37864: [SPARK-40414][SQL][PYTHON] More generic type on PythonArrowInput and PythonArrowOutput

2022-09-13 Thread GitBox
HeartSaVioR commented on code in PR #37864: URL: https://github.com/apache/spark/pull/37864#discussion_r970207233 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonArrowInput.scala: ## @@ -26,21 +26,30 @@ import org.apache.spark.{SparkEnv, TaskContext}

[GitHub] [spark] cloud-fan closed pull request #37612: [SPARK-39915][SQL] Ensure the output partitioning is user-specified in AQE

2022-09-13 Thread GitBox
cloud-fan closed pull request #37612: [SPARK-39915][SQL] Ensure the output partitioning is user-specified in AQE URL: https://github.com/apache/spark/pull/37612

[GitHub] [spark] AmplabJenkins commented on pull request #37870: [SPARK-33152] [SQL] New algorithm for ConstraintsPropagation rule to solve the problem of performance & OOM if the query plans have lar

2022-09-13 Thread GitBox
AmplabJenkins commented on PR #37870: URL: https://github.com/apache/spark/pull/37870#issuecomment-1246093493 Can one of the admins verify this patch?

[GitHub] [spark] cloud-fan commented on pull request #37612: [SPARK-39915][SQL] Ensure the output partitioning is user-specified in AQE

2022-09-13 Thread GitBox
cloud-fan commented on PR #37612: URL: https://github.com/apache/spark/pull/37612#issuecomment-1246091278 thanks, merging to master! (the last commit only adds comments)

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-09-13 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r970164323 ## python/pyspark/ml/functions.py: ## @@ -106,6 +111,167 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] srowen commented on a diff in pull request #37843: [SPARK-40398][CORE][SQL] Use Loop instead of Arrays.stream api

2022-09-13 Thread GitBox
srowen commented on code in PR #37843: URL: https://github.com/apache/spark/pull/37843#discussion_r970161558 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/Expression.java: ## @@ -17,7 +17,7 @@ package org.apache.spark.sql.connector.expressions;

[GitHub] [spark] Yikun commented on pull request #37828: [SPARK-40384][INFRA] Only do base image real in time build when infra dockerfile is changed

2022-09-13 Thread GitBox
Yikun commented on PR #37828: URL: https://github.com/apache/spark/pull/37828#issuecomment-1246018196 @HyukjinKwon Thanks!

[GitHub] [spark] HyukjinKwon commented on pull request #37828: [SPARK-40384][INFRA] Only do base image real in time build when infra dockerfile is changed

2022-09-13 Thread GitBox
HyukjinKwon commented on PR #37828: URL: https://github.com/apache/spark/pull/37828#issuecomment-1246014821 Discussed offline with @Yikun. Let's revert it for now (since this isn't a problem if ghcr isn't broken, and to keep our CI implementation simple)

[GitHub] [spark] Yikun closed pull request #37865: [SPARK-40384][INFRA][FOLLOWUP] Also trigger PySpark and SparkR job when changing dockerfile

2022-09-13 Thread GitBox
Yikun closed pull request #37865: [SPARK-40384][INFRA][FOLLOWUP] Also trigger PySpark and SparkR job when changing dockerfile URL: https://github.com/apache/spark/pull/37865

[GitHub] [spark] dtenedor commented on a diff in pull request #37840: [SPARK-40416][SQL] Move subquery expression CheckAnalysis error messages to use the new error framework

2022-09-13 Thread GitBox
dtenedor commented on code in PR #37840: URL: https://github.com/apache/spark/pull/37840#discussion_r970102088 ## sql/core/src/test/resources/sql-tests/results/join-lateral.sql.out: ## @@ -317,11 +317,18 @@ SELECT * FROM t1, LATERAL (SELECT c1 + c2 + rand(0) AS c3) struct<>

[GitHub] [spark] bersprockets commented on a diff in pull request #37852: [SPARK-40403][SQL] Calculate unsafe array size using longs to avoid negative size in error message

2022-09-13 Thread GitBox
bersprockets commented on code in PR #37852: URL: https://github.com/apache/spark/pull/37852#discussion_r970067289 ## sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java: ## @@ -55,10 +55,19 @@ public void initialize(int

[GitHub] [spark] gengliangwang commented on a diff in pull request #37861: [SPARK-40324][SQL][FOLLOWUP] Fix a bug in setting query context in Analyzer

2022-09-13 Thread GitBox
gengliangwang commented on code in PR #37861: URL: https://github.com/apache/spark/pull/37861#discussion_r970051718 ## sql/core/src/test/resources/sql-tests/results/group-by-filter.sql.out: ## @@ -233,14 +233,7 @@ org.apache.spark.sql.AnalysisException "sqlState" : "42000",

[GitHub] [spark] MaxGekk commented on a diff in pull request #37861: [SPARK-40324][SQL][FOLLOWUP] Fix a bug in setting query context in Analyzer

2022-09-13 Thread GitBox
MaxGekk commented on code in PR #37861: URL: https://github.com/apache/spark/pull/37861#discussion_r970049807 ## sql/core/src/test/resources/sql-tests/results/group-by-filter.sql.out: ## @@ -233,14 +233,7 @@ org.apache.spark.sql.AnalysisException "sqlState" : "42000",

[GitHub] [spark] dongjoon-hyun closed pull request #37872: [SPARK-40417][K8S][DOCS] Use YuniKorn v1.1+

2022-09-13 Thread GitBox
dongjoon-hyun closed pull request #37872: [SPARK-40417][K8S][DOCS] Use YuniKorn v1.1+ URL: https://github.com/apache/spark/pull/37872

[GitHub] [spark] dongjoon-hyun commented on pull request #37872: [SPARK-40417][K8S][DOCS] Use YuniKorn v1.1+

2022-09-13 Thread GitBox
dongjoon-hyun commented on PR #37872: URL: https://github.com/apache/spark/pull/37872#issuecomment-1245886435 Thank you, @viirya . Merged to master/3.3.

[GitHub] [spark] viirya commented on pull request #37872: [SPARK-40417][K8S][DOCS] Use YuniKorn v1.1+

2022-09-13 Thread GitBox
viirya commented on PR #37872: URL: https://github.com/apache/spark/pull/37872#issuecomment-1245882405 lgtm

[GitHub] [spark] dongjoon-hyun closed pull request #37866: [SPARK-40362][SQL][3.3] Fix BinaryComparison canonicalization

2022-09-13 Thread GitBox
dongjoon-hyun closed pull request #37866: [SPARK-40362][SQL][3.3] Fix BinaryComparison canonicalization URL: https://github.com/apache/spark/pull/37866

[GitHub] [spark] dongjoon-hyun commented on pull request #37872: [SPARK-40417][K8S][DOCS] Use YuniKorn v1.1+

2022-09-13 Thread GitBox
dongjoon-hyun commented on PR #37872: URL: https://github.com/apache/spark/pull/37872#issuecomment-1245854935 Could you review this, @viirya ?

[GitHub] [spark] ueshin commented on a diff in pull request #37864: [SPARK-40414][SQL][PYTHON] More generic type on PythonArrowInput and PythonArrowOutput

2022-09-13 Thread GitBox
ueshin commented on code in PR #37864: URL: https://github.com/apache/spark/pull/37864#discussion_r969963457 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonArrowInput.scala: ## @@ -107,3 +107,27 @@ private[python] trait PythonArrowInput { self:

[GitHub] [spark] dongjoon-hyun opened a new pull request, #37872: [SPARK-40417][K8S][DOCS] Use YuniKorn v1.1+

2022-09-13 Thread GitBox
dongjoon-hyun opened a new pull request, #37872: URL: https://github.com/apache/spark/pull/37872 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] MaxGekk opened a new pull request, #37871: [WIP][SQL] Return a map from SparkThrowable.getMessageParameters

2022-09-13 Thread GitBox
MaxGekk opened a new pull request, #37871: URL: https://github.com/apache/spark/pull/37871 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] amaliujia commented on pull request #37710: [DRAFT] Spark Connect build as Driver Plugin with Shaded Dependencies

2022-09-13 Thread GitBox
amaliujia commented on PR #37710: URL: https://github.com/apache/spark/pull/37710#issuecomment-1245771985 My high-level concern about this initial PR is that it introduces a lot of code beyond what is required for a minimal valued product, and that code is not well tested.

[GitHub] [spark] allisonwang-db commented on a diff in pull request #37840: [SPARK-40416][SQL] Move subquery expression CheckAnalysis error messages to use the new error framework

2022-09-13 Thread GitBox
allisonwang-db commented on code in PR #37840: URL: https://github.com/apache/spark/pull/37840#discussion_r969893279 ## sql/core/src/test/resources/sql-tests/results/join-lateral.sql.out: ## @@ -317,11 +317,18 @@ SELECT * FROM t1, LATERAL (SELECT c1 + c2 + rand(0) AS c3)

[GitHub] [spark] mengxr commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-09-13 Thread GitBox
mengxr commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r969862754 ## python/pyspark/ml/functions.py: ## @@ -106,6 +111,167 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] dtenedor commented on a diff in pull request #37840: [SPARK-40394][SQL] Move subquery expression CheckAnalysis error messages to use the new error framework

2022-09-13 Thread GitBox
dtenedor commented on code in PR #37840: URL: https://github.com/apache/spark/pull/37840#discussion_r969887731 ## sql/core/src/test/resources/sql-tests/results/join-lateral.sql.out: ## @@ -317,11 +317,18 @@ SELECT * FROM t1, LATERAL (SELECT c1 + c2 + rand(0) AS c3) struct<>

[GitHub] [spark] dtenedor commented on a diff in pull request #37840: [SPARK-40394][SQL] Move subquery expression CheckAnalysis error messages to use the new error framework

2022-09-13 Thread GitBox
dtenedor commented on code in PR #37840: URL: https://github.com/apache/spark/pull/37840#discussion_r969886892 ## core/src/main/resources/error/error-classes.json: ## @@ -695,6 +707,73 @@ } } }, + "UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY" : { +"message" : [

[GitHub] [spark] allisonwang-db commented on a diff in pull request #37840: [SPARK-40394][SQL] Move subquery expression CheckAnalysis error messages to use the new error framework

2022-09-13 Thread GitBox
allisonwang-db commented on code in PR #37840: URL: https://github.com/apache/spark/pull/37840#discussion_r969877155 ## core/src/main/resources/error/error-classes.json: ## @@ -695,6 +707,73 @@ } } }, + "UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY" : { +

[GitHub] [spark] peter-toth commented on pull request #37866: [SPARK-40362][SQL][3.3] Fix BinaryComparison canonicalization

2022-09-13 Thread GitBox
peter-toth commented on PR #37866: URL: https://github.com/apache/spark/pull/37866#issuecomment-1245693260 > The compilation is broken. Please check the code. Thanks @dongjoon-hyun for checking the logs,

[GitHub] [spark] ahshahid opened a new pull request, #37870: [SPARK-33152] [SQL] New algorithm for ConstraintsPropagation rule to solve the problem of performance & OOM if the query plans have large e

2022-09-13 Thread GitBox
ahshahid opened a new pull request, #37870: URL: https://github.com/apache/spark/pull/37870 ### What changes were proposed in this pull request? This PR proposes a new algorithm to create & store the constraints. It tracks aliases in projection which eliminates the need of

[GitHub] [spark] LuciferYang commented on a diff in pull request #37868: [SPARK-40397][BUILD] Upgrade `org.scalatestplus:selenium` to 3.12.13

2022-09-13 Thread GitBox
LuciferYang commented on code in PR #37868: URL: https://github.com/apache/spark/pull/37868#discussion_r969831451 ## dev/deps/spark-deps-hadoop-2-hive-2.3: ## @@ -217,7 +217,7 @@ netty-transport-native-unix-common/4.1.80.Final//netty-transport-native-unix-com

[GitHub] [spark] LuciferYang commented on a diff in pull request #37868: [SPARK-40397][BUILD] Upgrade `org.scalatestplus:selenium` to 3.12.13

2022-09-13 Thread GitBox
LuciferYang commented on code in PR #37868: URL: https://github.com/apache/spark/pull/37868#discussion_r969828951 ## pom.xml: ## @@ -688,14 +689,26 @@ selenium-java ${selenium.version} test + + +org.seleniumhq.selenium +

[GitHub] [spark] LuciferYang commented on pull request #37867: [SPARK-40415][BUILD][K8S] Add explicit Maven dependency for okio

2022-09-13 Thread GitBox
LuciferYang commented on PR #37867: URL: https://github.com/apache/spark/pull/37867#issuecomment-1245631337 > According to K8s IT, it seems that there is no evidence of failures, @LuciferYang . In that case, I also prefer to close this. Got it, thanks @srowen @dongjoon-hyun

[GitHub] [spark] LuciferYang closed pull request #37867: [SPARK-40415][BUILD][K8S] Add explicit Maven dependency for okio

2022-09-13 Thread GitBox
LuciferYang closed pull request #37867: [SPARK-40415][BUILD][K8S] Add explicit Maven dependency for okio URL: https://github.com/apache/spark/pull/37867

[GitHub] [spark] dongjoon-hyun commented on pull request #37867: [SPARK-40415][BUILD][K8S] Add explicit Maven dependency for okio

2022-09-13 Thread GitBox
dongjoon-hyun commented on PR #37867: URL: https://github.com/apache/spark/pull/37867#issuecomment-1245630128 According to K8s IT, it seems that there is no evidence of failures, @LuciferYang . In that case, I also prefer to close this.

[GitHub] [spark] LuciferYang commented on a diff in pull request #37868: [SPARK-40397][BUILD] Upgrade `org.scalatestplus:selenium` to 3.12.13

2022-09-13 Thread GitBox
LuciferYang commented on code in PR #37868: URL: https://github.com/apache/spark/pull/37868#discussion_r969824236 ## dev/deps/spark-deps-hadoop-2-hive-2.3: ## @@ -217,7 +217,7 @@ netty-transport-native-unix-common/4.1.80.Final//netty-transport-native-unix-com

[GitHub] [spark] MaxGekk opened a new pull request, #37869: [WIP][SQL] Migrate type check fails in CAST to error classes

2022-09-13 Thread GitBox
MaxGekk opened a new pull request, #37869: URL: https://github.com/apache/spark/pull/37869 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] LuciferYang opened a new pull request, #37868: [SPARK-40397][BUILD] Upgrade `org.scalatestplus:selenium` to 3.12.13

2022-09-13 Thread GitBox
LuciferYang opened a new pull request, #37868: URL: https://github.com/apache/spark/pull/37868 ### What changes were proposed in this pull request? The main changes of this PR are as follows: - Upgrade `org.scalatestplus:selenium` from `org.scalatestplus:selenium-3-141:3.2.10.0` to

[GitHub] [spark] LuciferYang commented on pull request #37867: [SPARK-40415][BUILD][K8S] Add explicit Maven dependency for okio

2022-09-13 Thread GitBox
LuciferYang commented on PR #37867: URL: https://github.com/apache/spark/pull/37867#issuecomment-1245612332 > That's right, IIRC. sbt tends to favor newer versions, not nearest. Got it, iff there is no actual risk, agree not to make this change

[GitHub] [spark] srowen commented on pull request #37867: [SPARK-40415][BUILD][K8S] Add explicit Maven dependency for okio

2022-09-13 Thread GitBox
srowen commented on PR #37867: URL: https://github.com/apache/spark/pull/37867#issuecomment-1245609120 That's right, IIRC. sbt tends to favor newer versions, not nearest.

[GitHub] [spark] LuciferYang commented on pull request #37867: [SPARK-40415][BUILD][K8S] Add explicit Maven dependency for okio

2022-09-13 Thread GitBox
LuciferYang commented on PR #37867: URL: https://github.com/apache/spark/pull/37867#issuecomment-1245606959 > Maven tends to adopt "nearest first" semantics Do you mean this may be because okio 1.15.0 is a level 4 dependency and okio 1.14.0 is a level 3 dependency, so 1.14.0 is used

[GitHub] [spark] LuciferYang commented on pull request #37867: [SPARK-40415][BUILD][K8S] Add explicit Maven dependency for okio

2022-09-13 Thread GitBox
LuciferYang commented on PR #37867: URL: https://github.com/apache/spark/pull/37867#issuecomment-1245598203 I found this issue when I was resolving another jira(upgrade org.scalatestplus:selenium). I think we need @dongjoon-hyun or @Yikun help to check whether there is a risk in using a

[GitHub] [spark] srowen commented on pull request #37867: [SPARK-40415][BUILD][K8S] Add explicit Maven dependency for okio

2022-09-13 Thread GitBox
srowen commented on PR #37867: URL: https://github.com/apache/spark/pull/37867#issuecomment-1245588571 Hm, does it cause a problem? Maven tends to adopt "nearest first" semantics, which can be right or wrong, but I don't think we'd explicitly manage it if it's working.

[GitHub] [spark] LuciferYang commented on pull request #37867: [SPARK-40415][BUILD] Add explicit Maven dependency for okio

2022-09-13 Thread GitBox
LuciferYang commented on PR #37867: URL: https://github.com/apache/spark/pull/37867#issuecomment-1245583851 Explicit definition of okio 1.15.0 or okhttp3 3.12.12 can both achieve the goal, which is better? cc @dongjoon-hyun @Yikun @srowen for help

[GitHub] [spark] LuciferYang opened a new pull request, #37867: [SPARK-40415][BUILD] Add explicit Maven dependency for okio

2022-09-13 Thread GitBox
LuciferYang opened a new pull request, #37867: URL: https://github.com/apache/spark/pull/37867 ### What changes were proposed in this pull request? There are two places in Spark that depend on okio ``` [INFO] +- org.seleniumhq.selenium:selenium-java:jar:3.141.59:test ...

[GitHub] [spark] peter-toth commented on pull request #37851: [SPARK-40362][SQL] Fix BinaryComparison canonicalization

2022-09-13 Thread GitBox
peter-toth commented on PR #37851: URL: https://github.com/apache/spark/pull/37851#issuecomment-1245547673 > thanks, merging to master! Thanks for the review! > @peter-toth can you open a backport PR for 3.3? it has conflicts. Thanks! Opened here:

[GitHub] [spark] Yikun commented on pull request #37828: [SPARK-40384][INFRA] Only do base image real in time build when infra dockerfile is changed

2022-09-13 Thread GitBox
Yikun commented on PR #37828: URL: https://github.com/apache/spark/pull/37828#issuecomment-1245547356 - Test on only dockerfile changes: base trigger and only lint job triggered based on base image, need to enable pyspark/sparkr in followup https://github.com/apache/spark/pull/37865. -

[GitHub] [spark] peter-toth opened a new pull request, #37866: [SPARK-40362][SQL][3.3] Fix BinaryComparison canonicalization

2022-09-13 Thread GitBox
peter-toth opened a new pull request, #37866: URL: https://github.com/apache/spark/pull/37866 ### What changes were proposed in this pull request? Change canonicalization to a one pass process and move logic from `Canonicalize.reorderCommutativeOperators` to the respective commutative

[GitHub] [spark] cloud-fan commented on pull request #37851: [SPARK-40362][SQL] Fix BinaryComparison canonicalization

2022-09-13 Thread GitBox
cloud-fan commented on PR #37851: URL: https://github.com/apache/spark/pull/37851#issuecomment-1245531922 @peter-toth can you open a backport PR for 3.3? it has conflicts. Thanks!

[GitHub] [spark] cloud-fan commented on pull request #37851: [SPARK-40362][SQL] Fix BinaryComparison canonicalization

2022-09-13 Thread GitBox
cloud-fan commented on PR #37851: URL: https://github.com/apache/spark/pull/37851#issuecomment-1245531493 thanks, merging to master!

[GitHub] [spark] cloud-fan closed pull request #37851: [SPARK-40362][SQL] Fix BinaryComparison canonicalization

2022-09-13 Thread GitBox
cloud-fan closed pull request #37851: [SPARK-40362][SQL] Fix BinaryComparison canonicalization URL: https://github.com/apache/spark/pull/37851

[GitHub] [spark] Yikun closed pull request #37815: [SPARK-40366][INFRA] Add `spark` namespace to spark ci image

2022-09-13 Thread GitBox
Yikun closed pull request #37815: [SPARK-40366][INFRA] Add `spark` namespace to spark ci image URL: https://github.com/apache/spark/pull/37815

[GitHub] [spark] Yikun commented on pull request #37865: [SPARK-40384][INFRA][FOLLOWUP] Also trigger PySpark and SparkR job when changing dockerfile

2022-09-13 Thread GitBox
Yikun commented on PR #37865: URL: https://github.com/apache/spark/pull/37865#issuecomment-1245510679 cc @HyukjinKwon

[GitHub] [spark] Yikun opened a new pull request, #37865: [SPARK-40384][INFRA][FOLLOWUP] Also trigger PySpark and SparkR job when changing dockerfile

2022-09-13 Thread GitBox
Yikun opened a new pull request, #37865: URL: https://github.com/apache/spark/pull/37865 ### What changes were proposed in this pull request? We should also trigger PySpark and SparkR job when changing dockerfile. ### Why are the changes needed? We should also trigger PySpark

[GitHub] [spark] MaxGekk closed pull request #37834: [SPARK-40400][SQL] Pass error message parameters to exceptions as maps

2022-09-13 Thread GitBox
MaxGekk closed pull request #37834: [SPARK-40400][SQL] Pass error message parameters to exceptions as maps URL: https://github.com/apache/spark/pull/37834

[GitHub] [spark] MaxGekk commented on pull request #37834: [SPARK-40400][SQL] Pass error message parameters to exceptions as maps

2022-09-13 Thread GitBox
MaxGekk commented on PR #37834: URL: https://github.com/apache/spark/pull/37834#issuecomment-1245491727 Merging to master. Thank you, @srielau @gengliangwang @cloud-fan for review.

[GitHub] [spark] cloud-fan commented on a diff in pull request #37851: [SPARK-40362][SQL] Fix BinaryComparison canonicalization

2022-09-13 Thread GitBox
cloud-fan commented on code in PR #37851: URL: https://github.com/apache/spark/pull/37851#discussion_r969688531 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala: ## @@ -1176,3 +1161,21 @@ trait ComplexTypeMergingExpression extends

[GitHub] [spark] cloud-fan commented on a diff in pull request #36027: [SPARK-38717][SQL] Handle Hive's bucket spec case preserving behaviour

2022-09-13 Thread GitBox
cloud-fan commented on code in PR #36027: URL: https://github.com/apache/spark/pull/36027#discussion_r969667603 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -1095,7 +1095,11 @@ private[hive] object HiveClientImpl extends Logging {

[GitHub] [spark] cloud-fan commented on a diff in pull request #37830: [SPARK-40387][SQL] Improve the implementation of Spark Decimal

2022-09-13 Thread GitBox
cloud-fan commented on code in PR #37830: URL: https://github.com/apache/spark/pull/37830#discussion_r969665012 ## sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala: ## @@ -240,9 +240,11 @@ final class Decimal extends Ordered[Decimal] with Serializable {

[GitHub] [spark] cloud-fan commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

2022-09-13 Thread GitBox
cloud-fan commented on code in PR #37407: URL: https://github.com/apache/spark/pull/37407#discussion_r969660489 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -1378,28 +1378,84 @@ case class Pivot( * A constructor

[GitHub] [spark] cloud-fan commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

2022-09-13 Thread GitBox
cloud-fan commented on code in PR #37407: URL: https://github.com/apache/spark/pull/37407#discussion_r969659692 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -1378,28 +1378,84 @@ case class Pivot( * A constructor

[GitHub] [spark] cloud-fan commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

2022-09-13 Thread GitBox
cloud-fan commented on code in PR #37407: URL: https://github.com/apache/spark/pull/37407#discussion_r969645973 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -1378,28 +1378,84 @@ case class Pivot( * A constructor

[GitHub] [spark] cloud-fan commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

2022-09-13 Thread GitBox
cloud-fan commented on code in PR #37407: URL: https://github.com/apache/spark/pull/37407#discussion_r969644167 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -1378,28 +1378,84 @@ case class Pivot( * A constructor

[GitHub] [spark] HyukjinKwon commented on pull request #37836: [SPARK-40339][SPARK-40342][SPARK-40345][SPARK-40348][PS] Implement quantile in Rolling/RollingGroupby/Expanding/ExpandingGroupby

2022-09-13 Thread GitBox
HyukjinKwon commented on PR #37836: URL: https://github.com/apache/spark/pull/37836#issuecomment-1245441448 cc @zhengruifeng @xinrong-meng @itholic FYI

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37864: [SPARK-40414][SQL][PYSPARK] More generic type on PythonArrowInput and PythonArrowOutput

2022-09-13 Thread GitBox
HyukjinKwon commented on code in PR #37864: URL: https://github.com/apache/spark/pull/37864#discussion_r969651811 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonArrowInput.scala: ## @@ -26,21 +26,30 @@ import org.apache.spark.{SparkEnv, TaskContext}

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37864: [SPARK-40414][SQL][PYSPARK] More generic type on PythonArrowInput and PythonArrowOutput

2022-09-13 Thread GitBox
HyukjinKwon commented on code in PR #37864: URL: https://github.com/apache/spark/pull/37864#discussion_r969649861 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonArrowOutput.scala: ## @@ -33,12 +33,14 @@ import

[GitHub] [spark] cloud-fan commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

2022-09-13 Thread GitBox
cloud-fan commented on code in PR #37407: URL: https://github.com/apache/spark/pull/37407#discussion_r969636837 ## sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -618,6 +618,46 @@ pivotValue : expression (AS? identifier)? ;

[GitHub] [spark] cloud-fan commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

2022-09-13 Thread GitBox
cloud-fan commented on code in PR #37407: URL: https://github.com/apache/spark/pull/37407#discussion_r969633374 ## python/pyspark/sql/dataframe.py: ## @@ -3082,6 +3082,13 @@ def unpivot( When no "id" columns are given, the unpivoted DataFrame consists of only the

[GitHub] [spark] HyukjinKwon closed pull request #37828: [SPARK-40384][INFRA] Only do base image real in time build when infra dockerfile is changed

2022-09-13 Thread GitBox
HyukjinKwon closed pull request #37828: [SPARK-40384][INFRA] Only do base image real in time build when infra dockerfile is changed URL: https://github.com/apache/spark/pull/37828

[GitHub] [spark] cloud-fan commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

2022-09-13 Thread GitBox
cloud-fan commented on code in PR #37407: URL: https://github.com/apache/spark/pull/37407#discussion_r969630564 ## python/pyspark/sql/dataframe.py: ## @@ -3064,7 +3064,7 @@ def cube(self, *cols: "ColumnOrName") -> "GroupedData": # type: ignore[misc] def unpivot(

[GitHub] [spark] cloud-fan commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

2022-09-13 Thread GitBox
cloud-fan commented on code in PR #37407: URL: https://github.com/apache/spark/pull/37407#discussion_r969627592 ## docs/sql-ref-syntax-qry-select-unpivot.md: ## @@ -0,0 +1,142 @@ +--- +layout: global +title: UNPIVOT Clause +displayTitle: UNPIVOT Clause +license: | + Licensed

[GitHub] [spark] LuciferYang commented on pull request #37609: [SPARK-40175][SQL]Speed up conversion of Tuple2 to Scala Map

2022-09-13 Thread GitBox
LuciferYang commented on PR #37609: URL: https://github.com/apache/spark/pull/37609#issuecomment-1245413079 > Great! Seems we can always do `while loop manually`? Yes, there should be many similar cases, and I think this should be a valuable optimization. However, before I doing

[GitHub] [spark] HeartSaVioR commented on pull request #37864: [SPARK-40414][SQL][PYSPARK] More generic type on PythonArrowInput and PythonArrowOutput

2022-09-13 Thread GitBox
HeartSaVioR commented on PR #37864: URL: https://github.com/apache/spark/pull/37864#issuecomment-1245396507 cc. @HyukjinKwon @ueshin Please take a look, thanks!

[GitHub] [spark] HeartSaVioR commented on pull request #37864: [SPARK-40414][SQL][PYSPARK] More generic type on PythonArrowInput and PythonArrowOutput

2022-09-13 Thread GitBox
HeartSaVioR commented on PR #37864: URL: https://github.com/apache/spark/pull/37864#issuecomment-1245394839 Here is a direct example of further usage:

[GitHub] [spark] ulysses-you commented on pull request #37612: [SPARK-39915][SQL] Ensure the output partitioning is user-specified in AQE

2022-09-13 Thread GitBox
ulysses-you commented on PR #37612: URL: https://github.com/apache/spark/pull/37612#issuecomment-1245393222 thank you @cloud-fan, updated

[GitHub] [spark] HeartSaVioR opened a new pull request, #37864: [SPARK-40414][SQL][PYSPARK] More generic type on PythonArrowInput and PythonArrowOutput

2022-09-13 Thread GitBox
HeartSaVioR opened a new pull request, #37864: URL: https://github.com/apache/spark/pull/37864 ### What changes were proposed in this pull request? This PR proposes to change PythonArrowInput and PythonArrowOutput to be more generic to cover the complex data type on both input and

[GitHub] [spark] HyukjinKwon commented on pull request #37828: [SPARK-40384][INFRA] Only do base image real in time build when infra dockerfile is changed

2022-09-13 Thread GitBox
HyukjinKwon commented on PR #37828: URL: https://github.com/apache/spark/pull/37828#issuecomment-1245390601 Merged to master.
