[GitHub] [spark] zhengruifeng commented on a diff in pull request #37728: [SPARK-40276][CORE] Reduce the result size of RDD.takeOrdered

2022-09-01 Thread GitBox
zhengruifeng commented on code in PR #37728: URL: https://github.com/apache/spark/pull/37728#discussion_r960290810 ## core/src/main/scala/org/apache/spark/rdd/RDD.scala: ## @@ -1523,22 +1523,21 @@ abstract class RDD[T: ClassTag]( * @return an array of top elements */

[GitHub] [spark] dongjoon-hyun commented on pull request #37622: [SPARK-40187][DOCS] Add `Apache YuniKorn` scheduler docs

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37622: URL: https://github.com/apache/spark/pull/37622#issuecomment-1234034066 I created a test suite PR, @yangwwei . - https://github.com/apache/spark/pull/37753 -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] Yikun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
Yikun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234063012 @dongjoon-hyun I just saw that you recreated the spark repo, so maybe the default permission has changed on GitHub Actions? You could first set the permission for your dongjoon-hyun/spark

[GitHub] [spark] Yikun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
Yikun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234097485 The potential issue might be that you removed the old repo but its images were not deleted, so when the new repo was created, write permission for this image was not configured for the new repo.

[GitHub] [spark] cloud-fan commented on a diff in pull request #37741: [SPARK-40283][INFRA] Bump MiMa's previousSparkVersion to 3.3.0 and clean up expired rules

2022-09-01 Thread GitBox
cloud-fan commented on code in PR #37741: URL: https://github.com/apache/spark/pull/37741#discussion_r960247865 ## project/MimaExcludes.scala: ## @@ -118,79 +96,23 @@ object MimaExcludes { ProblemFilters.exclude[Problem]("org.apache.spark.sql.execution.*"),

[GitHub] [spark] LuciferYang commented on a diff in pull request #37741: [SPARK-40283][INFRA] Bump MiMa's previousSparkVersion to 3.3.0 and clean up expired rules

2022-09-01 Thread GitBox
LuciferYang commented on code in PR #37741: URL: https://github.com/apache/spark/pull/37741#discussion_r960272640 ## project/MimaExcludes.scala: ## @@ -118,79 +96,23 @@ object MimaExcludes { ProblemFilters.exclude[Problem]("org.apache.spark.sql.execution.*"),

[GitHub] [spark] MaxGekk commented on a diff in pull request #37746: [SPARK-40293][SQL] Make the V2 table error message more meaningful

2022-09-01 Thread GitBox
MaxGekk commented on code in PR #37746: URL: https://github.com/apache/spark/pull/37746#discussion_r960396868 ## core/src/main/resources/error/error-classes.json: ## @@ -520,6 +520,11 @@ "NATURAL CROSS JOIN." ] }, +

[GitHub] [spark] dongjoon-hyun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234055469 Thank you, but `Base Image Build` phase failed three times already. https://user-images.githubusercontent.com/9700541/187888992-48c0292b-2586-421b-8f9e-9b514ab35cb2.png

[GitHub] [spark] dongjoon-hyun commented on pull request #37753: [SPARK-40302][K8S][TESTS] Add `YuniKornSuite`

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37753: URL: https://github.com/apache/spark/pull/37753#issuecomment-1234062168 Could you review this, @viirya ?

[GitHub] [spark] LuciferYang commented on a diff in pull request #37741: [SPARK-40283][INFRA] Bump MiMa's previousSparkVersion to 3.3.0 and clean up expired rules

2022-09-01 Thread GitBox
LuciferYang commented on code in PR #37741: URL: https://github.com/apache/spark/pull/37741#discussion_r960298499 ## project/MimaExcludes.scala: ## @@ -118,79 +96,23 @@ object MimaExcludes { ProblemFilters.exclude[Problem]("org.apache.spark.sql.execution.*"),

[GitHub] [spark] MaxGekk commented on pull request #37744: [SPARK-40300][SQL] Migrate onto the `DATATYPE_MISMATCH` error class

2022-09-01 Thread GitBox
MaxGekk commented on PR #37744: URL: https://github.com/apache/spark/pull/37744#issuecomment-1233890597 also cc @srielau @anchovYu Could you take a look at the PR which introduces new error classes.

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-09-01 Thread GitBox
WeichenXu123 commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r960529530 ## python/pyspark/ml/executor_globals.py: ## @@ -0,0 +1,24 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] WeichenXu123 commented on pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-09-01 Thread GitBox
WeichenXu123 commented on PR #37734: URL: https://github.com/apache/spark/pull/37734#issuecomment-1234139636 But I think we'd better design and discuss the API first. @mengxr Do you have any suggestions ?

[GitHub] [spark] cloud-fan commented on a diff in pull request #37746: [SPARK-40293][SQL] Make the V2 table error message more meaningful

2022-09-01 Thread GitBox
cloud-fan commented on code in PR #37746: URL: https://github.com/apache/spark/pull/37746#discussion_r960285847 ## core/src/main/resources/error/error-classes.json: ## @@ -520,6 +520,11 @@ "NATURAL CROSS JOIN." ] }, +

[GitHub] [spark] mridulm commented on a diff in pull request #37533: [SPARK-40096]Fix finalize shuffle stage slow due to connection creation slow

2022-09-01 Thread GitBox
mridulm commented on code in PR #37533: URL: https://github.com/apache/spark/pull/37533#discussion_r960285819 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2242,60 +2251,57 @@ private[spark] class DAGScheduler( val numMergers =

[GitHub] [spark] zhengruifeng opened a new pull request, #37752: [SPARK-40301][PYTHON] Add parameter validations in pyspark.rdd

2022-09-01 Thread GitBox
zhengruifeng opened a new pull request, #37752: URL: https://github.com/apache/spark/pull/37752 ### What changes were proposed in this pull request? compared with the scala side, some parameter validations were missing in `pyspark.rdd` ### Why are the changes needed? add

[GitHub] [spark] Yikun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
Yikun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234067056 https://github.com/dongjoon-hyun/spark/settings/actions ![image](https://user-images.githubusercontent.com/1736354/187890839-2f26ce10-2e20-4d7e-ab6e-311c898fc416.png)

[GitHub] [spark] LuciferYang commented on a diff in pull request #37741: [SPARK-40283][INFRA] Bump MiMa's previousSparkVersion to 3.3.0 and clean up expired rules

2022-09-01 Thread GitBox
LuciferYang commented on code in PR #37741: URL: https://github.com/apache/spark/pull/37741#discussion_r960330730 ## project/MimaExcludes.scala: ## @@ -118,79 +96,23 @@ object MimaExcludes { ProblemFilters.exclude[Problem]("org.apache.spark.sql.execution.*"),

[GitHub] [spark] dongjoon-hyun opened a new pull request, #37753: [SPARK-40302][K8S][TESTS] Add YuniKornSuite

2022-09-01 Thread GitBox
dongjoon-hyun opened a new pull request, #37753: URL: https://github.com/apache/spark/pull/37753 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] panbingkun opened a new pull request, #37754: [SPARK-39906][INFRA][FOLLOWUP] Eliminate build warnings - sbt 0.13 shell syntax is deprecated; use slash syntax instead

2022-09-01 Thread GitBox
panbingkun opened a new pull request, #37754: URL: https://github.com/apache/spark/pull/37754 ### What changes were proposed in this pull request? This PR follows https://github.com/apache/spark/pull/37326. When I ran BLASBenchmark on GitHub, I found **The following warnings are

[GitHub] [spark] dongjoon-hyun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234014605 The failure is irrelevant to this PR. It seems that the base image publishing is broken again. cc @Yikun ``` #33 ERROR: failed commit on ref

[GitHub] [spark] Yikun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
Yikun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234041557 @dongjoon-hyun Thanks for pinging me. This is due to GitHub Actions ghcr being unstable; you could retry to make it work.

[GitHub] [spark] MaxGekk commented on pull request #37744: [SPARK-40300][SQL] Migrate onto the `DATATYPE_MISMATCH` error class

2022-09-01 Thread GitBox
MaxGekk commented on PR #37744: URL: https://github.com/apache/spark/pull/37744#issuecomment-1234048716 The test failure is not related to this PR, I believe: ``` YarnClusterSuite.run Spark in yarn-client mode with different configurations, ensuring redaction ``` @cloud-fan

[GitHub] [spark] MaxGekk commented on a diff in pull request #37742: [SPARK-40291][SQL] Improve the message for column not in group by clause error

2022-09-01 Thread GitBox
MaxGekk commented on code in PR #37742: URL: https://github.com/apache/spark/pull/37742#discussion_r960464975 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -2527,4 +2527,11 @@ private[sql] object QueryCompilationErrors extends

[GitHub] [spark] sunchao commented on a diff in pull request #37749: [SPARK-40295][SQL] Allow v2 functions with literal args in write distribution/ordering

2022-09-01 Thread GitBox
sunchao commented on code in PR #37749: URL: https://github.com/apache/spark/pull/37749#discussion_r96023 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala: ## @@ -105,18 +105,27 @@ object V2ExpressionUtils extends

[GitHub] [spark] cloud-fan commented on a diff in pull request #37746: [SPARK-40293][SQL] Make the V2 table error message more meaningful

2022-09-01 Thread GitBox
cloud-fan commented on code in PR #37746: URL: https://github.com/apache/spark/pull/37746#discussion_r960284889 ## core/src/main/resources/error/error-classes.json: ## @@ -520,6 +520,11 @@ "NATURAL CROSS JOIN." ] }, +

[GitHub] [spark] cloud-fan commented on a diff in pull request #37746: [SPARK-40293][SQL] Make the V2 table error message more meaningful

2022-09-01 Thread GitBox
cloud-fan commented on code in PR #37746: URL: https://github.com/apache/spark/pull/37746#discussion_r960285081 ## core/src/main/resources/error/error-classes.json: ## @@ -520,6 +520,11 @@ "NATURAL CROSS JOIN." ] }, +

[GitHub] [spark] dongjoon-hyun opened a new pull request, #37755: [SPARK-40304][K8S][TESTS] Add decomTestTag to K8s Integration Test

2022-09-01 Thread GitBox
dongjoon-hyun opened a new pull request, #37755: URL: https://github.com/apache/spark/pull/37755 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] dongjoon-hyun commented on pull request #37755: [SPARK-40304][K8S][TESTS] Add `decomTestTag` to K8s Integration Test

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37755: URL: https://github.com/apache/spark/pull/37755#issuecomment-1234068985 Could you review this please, @viirya ?

[GitHub] [spark] dongjoon-hyun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234077478 It's weird. IIRC, I didn't change anything from my previous repo either when your PR applied this change.

[GitHub] [spark] mridulm commented on a diff in pull request #37533: [SPARK-40096]Fix finalize shuffle stage slow due to connection creation slow

2022-09-01 Thread GitBox
mridulm commented on code in PR #37533: URL: https://github.com/apache/spark/pull/37533#discussion_r960287381 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2242,60 +2251,57 @@ private[spark] class DAGScheduler( val numMergers =

[GitHub] [spark] dongjoon-hyun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234071272 I'm already allowing all of them. https://user-images.githubusercontent.com/9700541/187891526-5938feb5-d380-4574-a81a-9b621779dead.png

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-09-01 Thread GitBox
WeichenXu123 commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r960524940 ## python/pyspark/ml/functions.py: ## @@ -106,6 +112,170 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] zero323 commented on a diff in pull request #37748: [SPARK-40210][PYTHON][CORE] Fix math atan2, hypot, pow and pmod float argument call

2022-09-01 Thread GitBox
zero323 commented on code in PR #37748: URL: https://github.com/apache/spark/pull/37748#discussion_r960525054 ## python/pyspark/sql/functions.py: ## @@ -108,12 +108,13 @@ def _invoke_binary_math_function(name: str, col1: Any, col2: Any) -> Column: Invokes binary JVM math

[GitHub] [spark] dongjoon-hyun commented on pull request #37753: [SPARK-40302][K8S][TESTS] Add `YuniKornSuite`

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37753: URL: https://github.com/apache/spark/pull/37753#issuecomment-1234510668 Thank you, @viirya . Yes, there was an irrelevant `Base Image Build` job failure and corresponding PySpark UT failures. I re-triggered.

[GitHub] [spark] dongjoon-hyun commented on pull request #37755: [SPARK-40304][K8S][TESTS] Add `decomTestTag` to K8s Integration Test

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37755: URL: https://github.com/apache/spark/pull/37755#issuecomment-1234520875 Merged to master/3.3.

[GitHub] [spark] MaxGekk commented on a diff in pull request #37744: [SPARK-40300][SQL] Migrate onto the `DATATYPE_MISMATCH` error class

2022-09-01 Thread GitBox
MaxGekk commented on code in PR #37744: URL: https://github.com/apache/spark/pull/37744#discussion_r960901497 ## core/src/main/resources/error/error-classes.json: ## @@ -75,6 +75,23 @@ "The value () cannot be converted to because it is malformed. Correct the value as

[GitHub] [spark] santosh-d3vpl3x opened a new pull request, #37761: Add withColumnsRenamed to scala API of spark

2022-09-01 Thread GitBox
santosh-d3vpl3x opened a new pull request, #37761: URL: https://github.com/apache/spark/pull/37761 ### What changes were proposed in this pull request? This change adds an ability for code to rename multiple columns in a single call. ```scala withColumnsRenamed(colsMap:

[GitHub] [spark] dongjoon-hyun closed pull request #37755: [SPARK-40304][K8S][TESTS] Add `decomTestTag` to K8s Integration Test

2022-09-01 Thread GitBox
dongjoon-hyun closed pull request #37755: [SPARK-40304][K8S][TESTS] Add `decomTestTag` to K8s Integration Test URL: https://github.com/apache/spark/pull/37755

[GitHub] [spark] MaxGekk commented on a diff in pull request #37744: [SPARK-40300][SQL] Migrate onto the `DATATYPE_MISMATCH` error class

2022-09-01 Thread GitBox
MaxGekk commented on code in PR #37744: URL: https://github.com/apache/spark/pull/37744#discussion_r960893478 ## core/src/main/resources/error/error-classes.json: ## @@ -75,6 +75,23 @@ "The value () cannot be converted to because it is malformed. Correct the value as

[GitHub] [spark] amaliujia commented on pull request #37750: [SPARK-40296] Error class for DISTINCT function not found

2022-09-01 Thread GitBox
amaliujia commented on PR #37750: URL: https://github.com/apache/spark/pull/37750#issuecomment-1234597166 R @MaxGekk

[GitHub] [spark] srowen commented on pull request #37731: [SPARK-40279][DOC] Document spark.yarn.report.interval

2022-09-01 Thread GitBox
srowen commented on PR #37731: URL: https://github.com/apache/spark/pull/37731#issuecomment-1234285256 Merged to master

[GitHub] [spark] srowen commented on a diff in pull request #37700: [SPARK-40251][BUILD][MLLIB] Upgrade dev.ludovic.netlib from 2.2.1 to 3.0.2 & breeze from 2.0 to 2.1.0

2022-09-01 Thread GitBox
srowen commented on code in PR #37700: URL: https://github.com/apache/spark/pull/37700#discussion_r960657137 ## mllib-local/pom.xml: ## @@ -61,6 +61,11 @@ org.apache.spark spark-tags_${scala.binary.version} + Review Comment: Oh, I don't think we want

[GitHub] [spark] srowen closed pull request #37731: [SPARK-40279][DOC] Document spark.yarn.report.interval

2022-09-01 Thread GitBox
srowen closed pull request #37731: [SPARK-40279][DOC] Document spark.yarn.report.interval URL: https://github.com/apache/spark/pull/37731

[GitHub] [spark] cloud-fan commented on pull request #37751: [SPARK-40297][SQL] CTE outer reference nested in CTE main body cannot be resolved

2022-09-01 Thread GitBox
cloud-fan commented on PR #37751: URL: https://github.com/apache/spark/pull/37751#issuecomment-1234336628 It has conflicts in 3.3, due to missing https://github.com/apache/spark/pull/36146 . @peter-toth can you help to open a backport PR for it? Thanks!

[GitHub] [spark] LuciferYang commented on a diff in pull request #37700: [SPARK-40251][BUILD][MLLIB] Upgrade dev.ludovic.netlib from 2.2.1 to 3.0.2 & breeze from 2.0 to 2.1.0

2022-09-01 Thread GitBox
LuciferYang commented on code in PR #37700: URL: https://github.com/apache/spark/pull/37700#discussion_r960736796 ## mllib-local/pom.xml: ## @@ -61,6 +61,11 @@ org.apache.spark spark-tags_${scala.binary.version} + Review Comment: > We can't do it for

[GitHub] [spark] wankunde opened a new pull request, #37759: [SPARK-40306][SQL]Support more than Integer.MAX_VALUE of the same join key

2022-09-01 Thread GitBox
wankunde opened a new pull request, #37759: URL: https://github.com/apache/spark/pull/37759 ### What changes were proposed in this pull request? Support more than Integer.MAX_VALUE of the same join key. ### Why are the changes needed? For SMJ, the number of

[GitHub] [spark] dongjoon-hyun closed pull request #37753: [SPARK-40302][K8S][TESTS] Add `YuniKornSuite`

2022-09-01 Thread GitBox
dongjoon-hyun closed pull request #37753: [SPARK-40302][K8S][TESTS] Add `YuniKornSuite` URL: https://github.com/apache/spark/pull/37753

[GitHub] [spark] viirya commented on pull request #37755: [SPARK-40304][K8S][TESTS] Add `decomTestTag` to K8s Integration Test

2022-09-01 Thread GitBox
viirya commented on PR #37755: URL: https://github.com/apache/spark/pull/37755#issuecomment-1234517226 Some CI tasks were not finished normally. Seems unrelated, though.

[GitHub] [spark] MaxGekk commented on a diff in pull request #37744: [SPARK-40300][SQL] Migrate onto the `DATATYPE_MISMATCH` error class

2022-09-01 Thread GitBox
MaxGekk commented on code in PR #37744: URL: https://github.com/apache/spark/pull/37744#discussion_r960894747 ## core/src/main/resources/error/error-classes.json: ## @@ -75,6 +75,23 @@ "The value () cannot be converted to because it is malformed. Correct the value as

[GitHub] [spark] yangwwei commented on pull request #37753: [SPARK-40302][K8S][TESTS] Add `YuniKornSuite`

2022-09-01 Thread GitBox
yangwwei commented on PR #37753: URL: https://github.com/apache/spark/pull/37753#issuecomment-1234638214 Very nice, thank you @dongjoon-hyun !

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-09-01 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r960986473 ## python/pyspark/ml/functions.py: ## @@ -106,6 +112,170 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] aokolnychyi commented on a diff in pull request #37749: [SPARK-40295][SQL] Allow v2 functions with literal args in write distribution/ordering

2022-09-01 Thread GitBox
aokolnychyi commented on code in PR #37749: URL: https://github.com/apache/spark/pull/37749#discussion_r961043795 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala: ## @@ -105,18 +105,27 @@ object V2ExpressionUtils extends

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
dongjoon-hyun commented on code in PR #37745: URL: https://github.com/apache/spark/pull/37745#discussion_r960841601 ## hadoop-cloud/pom.xml: ## @@ -135,6 +135,18 @@ + + com.google.cloud.bigdataoss + gcs-connector +

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-09-01 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r960972099 ## python/pyspark/ml/executor_globals.py: ## @@ -0,0 +1,24 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] steveloughran commented on a diff in pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
steveloughran commented on code in PR #37745: URL: https://github.com/apache/spark/pull/37745#discussion_r960971095 ## hadoop-cloud/pom.xml: ## @@ -135,6 +135,18 @@ + + com.google.cloud.bigdataoss + gcs-connector +

[GitHub] [spark] bjornjorgensen opened a new pull request, #37762: SPARK-39996[BUILD] Upgrade 'postgresql' to 42.5.0

2022-09-01 Thread GitBox
bjornjorgensen opened a new pull request, #37762: URL: https://github.com/apache/spark/pull/37762 ### What changes were proposed in this pull request? Upgrade 'postgresql' 42.3.3 to 42.5.0 ### Why are the changes needed? fix:

[GitHub] [spark] yangwwei commented on pull request #37622: [SPARK-40187][DOCS] Add `Apache YuniKorn` scheduler docs

2022-09-01 Thread GitBox
yangwwei commented on PR #37622: URL: https://github.com/apache/spark/pull/37622#issuecomment-1234636614 Hi, @dongjoon-hyun thanks a lot for helping on this. This is a great community collaboration between YuniKorn and Spark, thank you so much!

[GitHub] [spark] dongjoon-hyun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234507890 Thank you so much, @Yikun . Now, it seems to work on my three PRs.

[GitHub] [spark] dongjoon-hyun commented on pull request #37753: [SPARK-40302][K8S][TESTS] Add `YuniKornSuite`

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37753: URL: https://github.com/apache/spark/pull/37753#issuecomment-1234513022 Merged to master/3.3.

[GitHub] [spark] dongjoon-hyun commented on pull request #37755: [SPARK-40304][K8S][TESTS] Add `decomTestTag` to K8s Integration Test

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37755: URL: https://github.com/apache/spark/pull/37755#issuecomment-1234519012 Thank you, @viirya . Yes, it was the same `Base Image Build` failure. After re-triggering, it succeeds and now linter is running.

[GitHub] [spark] sigmod commented on a diff in pull request #37697: [SPARK-40248][SQL] Use larger number of bits to build Bloom filter

2022-09-01 Thread GitBox
sigmod commented on code in PR #37697: URL: https://github.com/apache/spark/pull/37697#discussion_r960939857 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/BloomFilterAggregate.scala: ## @@ -55,6 +55,13 @@ case class BloomFilterAggregate(

[GitHub] [spark] zero323 commented on a diff in pull request #37748: [SPARK-40210][PYTHON][CORE] Fix math atan2, hypot, pow and pmod float argument call

2022-09-01 Thread GitBox
zero323 commented on code in PR #37748: URL: https://github.com/apache/spark/pull/37748#discussion_r960952519 ## python/pyspark/sql/functions.py: ## @@ -108,13 +108,10 @@ def _invoke_binary_math_function(name: str, col1: Any, col2: Any) -> Column: Invokes binary JVM math

[GitHub] [spark] MaxGekk commented on a diff in pull request #37744: [SPARK-40300][SQL] Migrate onto the `DATATYPE_MISMATCH` error class

2022-09-01 Thread GitBox
MaxGekk commented on code in PR #37744: URL: https://github.com/apache/spark/pull/37744#discussion_r960982999 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ExpressionTypeCheckingSuite.scala: ## @@ -47,14 +47,23 @@ class ExpressionTypeCheckingSuite

[GitHub] [spark] wankunde commented on a diff in pull request #37533: [SPARK-40096]Fix finalize shuffle stage slow due to connection creation slow

2022-09-01 Thread GitBox
wankunde commented on code in PR #37533: URL: https://github.com/apache/spark/pull/37533#discussion_r960777861 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2242,60 +2251,57 @@ private[spark] class DAGScheduler( val numMergers =

[GitHub] [spark] cloud-fan closed pull request #37751: [SPARK-40297][SQL] CTE outer reference nested in CTE main body cannot be resolved

2022-09-01 Thread GitBox
cloud-fan closed pull request #37751: [SPARK-40297][SQL] CTE outer reference nested in CTE main body cannot be resolved URL: https://github.com/apache/spark/pull/37751

[GitHub] [spark] peter-toth commented on pull request #37751: [SPARK-40297][SQL] CTE outer reference nested in CTE main body cannot be resolved

2022-09-01 Thread GitBox
peter-toth commented on PR #37751: URL: https://github.com/apache/spark/pull/37751#issuecomment-1234357072 > It has conflicts in 3.3, due to missing #36146 . @peter-toth can you help to open a backport PR for it? Thanks! Sure, I can open it today.

[GitHub] [spark] srowen commented on a diff in pull request #37700: [SPARK-40251][BUILD][MLLIB] Upgrade dev.ludovic.netlib from 2.2.1 to 3.0.2 & breeze from 2.0 to 2.1.0

2022-09-01 Thread GitBox
srowen commented on code in PR #37700: URL: https://github.com/apache/spark/pull/37700#discussion_r960734725 ## mllib-local/pom.xml: ## @@ -61,6 +61,11 @@ org.apache.spark spark-tags_${scala.binary.version} + Review Comment: We can't do it for that.

[GitHub] [spark] peter-toth commented on pull request #37751: [SPARK-40297][SQL] CTE outer reference nested in CTE main body cannot be resolved

2022-09-01 Thread GitBox
peter-toth commented on PR #37751: URL: https://github.com/apache/spark/pull/37751#issuecomment-1234422501 @cloud-fan, here it is: https://github.com/apache/spark/pull/37760

[GitHub] [spark] peter-toth opened a new pull request, #37760: [SPARK-38404][SQL][3.3] Improve CTE resolution when a nested CTE references an outer CTE

2022-09-01 Thread GitBox
peter-toth opened a new pull request, #37760: URL: https://github.com/apache/spark/pull/37760 ### What changes were proposed in this pull request? Please note that the bug in the [SPARK-38404](https://issues.apache.org/jira/browse/SPARK-38404) is fixed already with

[GitHub] [spark] peter-toth commented on pull request #37760: [SPARK-38404][SQL][3.3] Improve CTE resolution when a nested CTE references an outer CTE

2022-09-01 Thread GitBox
peter-toth commented on PR #37760: URL: https://github.com/apache/spark/pull/37760#issuecomment-1234423325 This backport is needed for https://github.com/apache/spark/pull/37751#issuecomment-1234336628 cc @cloud-fan

[GitHub] [spark] dongjoon-hyun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234388728 1. I checked that mine is the same as yours. https://user-images.githubusercontent.com/9700541/187943970-bd5d40bf-8545-4d50-b7eb-16fc4a0440d8.png 2. Let me try to clean

[GitHub] [spark] cloud-fan opened a new pull request, #37758: [SPARK-40149][SQL] Propagate metadata columns through Project

2022-09-01 Thread GitBox
cloud-fan opened a new pull request, #37758: URL: https://github.com/apache/spark/pull/37758 ### What changes were proposed in this pull request? This PR fixes a regression caused by https://github.com/apache/spark/pull/32017 . In

[GitHub] [spark] cloud-fan commented on pull request #37758: [SPARK-40149][SQL] Propagate metadata columns through Project

2022-09-01 Thread GitBox
cloud-fan commented on PR #37758: URL: https://github.com/apache/spark/pull/37758#issuecomment-1234209788 cc @karenfeng @viirya @huaxingao

[GitHub] [spark] zhengruifeng commented on pull request #37739: [SPARK-40265][PS] Fix the inconsistent behavior for Index.intersection.

2022-09-01 Thread GitBox
zhengruifeng commented on PR #37739: URL: https://github.com/apache/spark/pull/37739#issuecomment-1234230281 what if the `psidx` itself is a `MultiIndex`? ``` >>> psidx Int64Index([1, 2, 3, 4], dtype='int64', name='Koalas') >>> psidx.intersection([(1, 2), (3,

[GitHub] [spark] zhengruifeng opened a new pull request, #37756: [SPARK-40305][PS] Implement Groupby.sem

2022-09-01 Thread GitBox
zhengruifeng opened a new pull request, #37756: URL: https://github.com/apache/spark/pull/37756 ### What changes were proposed in this pull request? Implement Groupby.sem ### Why are the changes needed? to increase API coverage ### Does this PR introduce _any_
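
For readers unfamiliar with the statistic, the standard error of the mean is the sample standard deviation divided by the square root of the group size. The snippet below is a minimal pure-Python sketch of that definition, not the PR's implementation; the function name `group_sem` and the `(key, value)` input layout are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def group_sem(rows, ddof=1):
    """Per-group standard error of the mean: std(ddof) / sqrt(n).

    `rows` is an iterable of (key, value) pairs. This is an illustrative
    helper, not part of pyspark.pandas.
    """
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)

    result = {}
    for key, values in groups.items():
        n = len(values)
        if n - ddof <= 0:
            # Too few observations for the requested degrees of freedom.
            result[key] = float("nan")
            continue
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / (n - ddof)
        result[key] = math.sqrt(variance) / math.sqrt(n)
    return result
```

Calling `group_sem([("a", 1.0), ("a", 2.0), ("a", 3.0)])` returns a dict with a single entry for `"a"`; a group with one observation yields NaN under the default `ddof=1`.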

[GitHub] [spark] Ngone51 commented on a diff in pull request #37411: [SPARK-39984][CORE] Check workerLastHeartbeat with master before HeartbeatReceiver expires an executor

2022-09-01 Thread GitBox
Ngone51 commented on code in PR #37411: URL: https://github.com/apache/spark/pull/37411#discussion_r960646158 ## core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala: ## @@ -77,17 +77,61 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)

[GitHub] [spark] cloud-fan commented on pull request #37750: [SPARK-40296] Error class for DISTINCT function not found

2022-09-01 Thread GitBox
cloud-fan commented on PR #37750: URL: https://github.com/apache/spark/pull/37750#issuecomment-1234297015 cc @MaxGekk

[GitHub] [spark] gitlabsam opened a new pull request, #37757: Branch 3.3 sam

2022-09-01 Thread GitBox
gitlabsam opened a new pull request, #37757: URL: https://github.com/apache/spark/pull/37757 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] panbingkun commented on a diff in pull request #37700: [SPARK-40251][BUILD][MLLIB] Upgrade dev.ludovic.netlib from 2.2.1 to 3.0.2 & breeze from 2.0 to 2.1.0

2022-09-01 Thread GitBox
panbingkun commented on code in PR #37700: URL: https://github.com/apache/spark/pull/37700#discussion_r960672738 ## mllib-local/pom.xml: ## @@ -61,6 +61,11 @@ org.apache.spark spark-tags_${scala.binary.version} + Review Comment: for

[GitHub] [spark] cloud-fan commented on pull request #37751: [SPARK-40297][SQL] CTE outer reference nested in CTE main body cannot be resolved

2022-09-01 Thread GitBox
cloud-fan commented on PR #37751: URL: https://github.com/apache/spark/pull/37751#issuecomment-1234333082 thanks, merging to master!

[GitHub] [spark] srielau commented on a diff in pull request #37744: [SPARK-40300][SQL] Migrate onto the `DATATYPE_MISMATCH` error class

2022-09-01 Thread GitBox
srielau commented on code in PR #37744: URL: https://github.com/apache/spark/pull/37744#discussion_r960719687 ## core/src/main/resources/error/error-classes.json: ## @@ -75,6 +75,23 @@ "The value () cannot be converted to because it is malformed. Correct the value as

[GitHub] [spark] wangyum commented on a diff in pull request #37697: [SPARK-40248][SQL] Use larger number of bits to build Bloom filter

2022-09-01 Thread GitBox
wangyum commented on code in PR #37697: URL: https://github.com/apache/spark/pull/37697#discussion_r960736515 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/BloomFilterAggregate.scala: ## @@ -55,6 +55,13 @@ case class BloomFilterAggregate(

[GitHub] [spark] Yikun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
Yikun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234426066 https://github.com/users/dongjoon-hyun/packages/container/apache-spark-ci-image/settings You can also remove it on the page linked above.

[GitHub] [spark] wangyum commented on a diff in pull request #37759: [SPARK-40306][SQL]Support more than Integer.MAX_VALUE of the same join key

2022-09-01 Thread GitBox
wangyum commented on code in PR #37759: URL: https://github.com/apache/spark/pull/37759#discussion_r960788909 ## sql/core/src/main/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArray.scala: ## @@ -76,15 +76,15 @@ private[sql] class

[GitHub] [spark] sunchao commented on a diff in pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
sunchao commented on code in PR #37745: URL: https://github.com/apache/spark/pull/37745#discussion_r960782293 ## hadoop-cloud/pom.xml: ## @@ -135,6 +135,18 @@ + + com.google.cloud.bigdataoss + gcs-connector + ${gcs-connector.version} +

[GitHub] [spark] HyukjinKwon commented on pull request #37748: [SPARK-40210][PYTHON][CORE] Fix math atan2, hypot, pow and pmod float argument call

2022-09-01 Thread GitBox
HyukjinKwon commented on PR #37748: URL: https://github.com/apache/spark/pull/37748#issuecomment-1233897381 cc @zero323 fyi

[GitHub] [spark] Yikun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
Yikun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234081059 Could you check this link? https://github.com/users/dongjoon-hyun/packages/container/package/apache-spark-ci-image/settings

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-09-01 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r960972099 ## python/pyspark/ml/executor_globals.py: ## @@ -0,0 +1,24 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] lyssg commented on pull request #35667: [SPARK-38425][K8S] Avoid possible errors due to incorrect file size or type supplied in hadoop conf

2022-09-01 Thread GitBox
lyssg commented on PR #35667: URL: https://github.com/apache/spark/pull/35667#issuecomment-1234955160 @dongjoon-hyun, @martin-g, @ScrapCodes, I have rebased my PR again. Could you take another look?

[GitHub] [spark] xinrong-meng commented on a diff in pull request #37756: [SPARK-40305][PS] Implement Groupby.sem

2022-09-01 Thread GitBox
xinrong-meng commented on code in PR #37756: URL: https://github.com/apache/spark/pull/37756#discussion_r961275185 ## python/pyspark/pandas/generic.py: ## @@ -2189,7 +2189,7 @@ def std(psser: "Series") -> Column: return F.stddev_samp(spark_column)

[GitHub] [spark] xinrong-meng commented on a diff in pull request #37756: [SPARK-40305][PS] Implement Groupby.sem

2022-09-01 Thread GitBox
xinrong-meng commented on code in PR #37756: URL: https://github.com/apache/spark/pull/37756#discussion_r961275433 ## python/pyspark/pandas/tests/test_groupby.py: ## @@ -3055,6 +3065,17 @@ def test_ddof(self): psdf.groupby("a")["b"].var(ddof=ddof).sort_index(),

[GitHub] [spark] hgs19921112 closed pull request #37765: [SPARK-40288][SQL]After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should applied to avoid attribute missing when use complex e

2022-09-01 Thread GitBox
hgs19921112 closed pull request #37765: [SPARK-40288][SQL]After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should applied to avoid attribute missing when use complex expression. URL: https://github.com/apache/spark/pull/37765

[GitHub] [spark] hgs19921112 opened a new pull request, #37766: [SPARK-40288][SQL]After RemoveRedundantAggregates, PullOutGroupingExpressions should applied to avoid attribute missing when use complex

2022-09-01 Thread GitBox
hgs19921112 opened a new pull request, #37766: URL: https://github.com/apache/spark/pull/37766 ### What changes were proposed in this pull request? RemoveRedundantAggregates will cause reference attribute missing when using complex expression in group by. ### Why are

[GitHub] [spark] hgs19921112 commented on pull request #37766: [SPARK-40288][SQL]After RemoveRedundantAggregates, PullOutGroupingExpressions should applied to avoid attribute missing when use complex

2022-09-01 Thread GitBox
hgs19921112 commented on PR #37766: URL: https://github.com/apache/spark/pull/37766#issuecomment-1235057302 cc @dongjoon-hyun

[GitHub] [spark] huaxingao commented on a diff in pull request #37746: [SPARK-40293][SQL] Make the V2 table error message more meaningful

2022-09-01 Thread GitBox
huaxingao commented on code in PR #37746: URL: https://github.com/apache/spark/pull/37746#discussion_r961312863 ## core/src/main/resources/error/error-classes.json: ## @@ -520,6 +520,11 @@ "NATURAL CROSS JOIN." ] }, +

[GitHub] [spark] huaxingao commented on a diff in pull request #37746: [SPARK-40293][SQL] Make the V2 table error message more meaningful

2022-09-01 Thread GitBox
huaxingao commented on code in PR #37746: URL: https://github.com/apache/spark/pull/37746#discussion_r961313022 ## core/src/main/resources/error/error-classes.json: ## @@ -520,6 +520,11 @@ "NATURAL CROSS JOIN." ] }, +

[GitHub] [spark] xinrong-meng commented on a diff in pull request #37756: [SPARK-40305][PS] Implement Groupby.sem

2022-09-01 Thread GitBox
xinrong-meng commented on code in PR #37756: URL: https://github.com/apache/spark/pull/37756#discussion_r961271308 ## python/pyspark/pandas/groupby.py: ## @@ -827,6 +827,76 @@ def mad(self) -> FrameLike: return self._prepare_return(DataFrame(internal)) +def

[GitHub] [spark] hgs19921112 commented on pull request #37765: [SPARK-40288][SQL]After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should applied to avoid attribute missing when use comp

2022-09-01 Thread GitBox
hgs19921112 commented on PR #37765: URL: https://github.com/apache/spark/pull/37765#issuecomment-1235054014 @github-actions
