[GitHub] [spark] huaxingao commented on a diff in pull request #37746: [SPARK-40293][SQL] Make the V2 table error message more meaningful

2022-09-01 Thread GitBox
huaxingao commented on code in PR #37746: URL: https://github.com/apache/spark/pull/37746#discussion_r961313022 ## core/src/main/resources/error/error-classes.json: ## @@ -520,6 +520,11 @@ "NATURAL CROSS JOIN." ] }, +

[GitHub] [spark] huaxingao commented on a diff in pull request #37746: [SPARK-40293][SQL] Make the V2 table error message more meaningful

2022-09-01 Thread GitBox
huaxingao commented on code in PR #37746: URL: https://github.com/apache/spark/pull/37746#discussion_r961312863 ## core/src/main/resources/error/error-classes.json: ## @@ -520,6 +520,11 @@ "NATURAL CROSS JOIN." ] }, +

[GitHub] [spark] zhengruifeng opened a new pull request, #37767: [SPARK-39284][FOLLOW] Add Groupby.mad to API references

2022-09-01 Thread GitBox
zhengruifeng opened a new pull request, #37767: URL: https://github.com/apache/spark/pull/37767 ### What changes were proposed in this pull request? Add `Groupby.mad` to API references ### Why are the changes needed? `Groupby.mad` was implemented in

[GitHub] [spark] zhengruifeng commented on a diff in pull request #37756: [SPARK-40305][PS] Implement Groupby.sem

2022-09-01 Thread GitBox
zhengruifeng commented on code in PR #37756: URL: https://github.com/apache/spark/pull/37756#discussion_r961280658 ## python/pyspark/pandas/groupby.py: ## @@ -827,6 +827,76 @@ def mad(self) -> FrameLike: return self._prepare_return(DataFrame(internal)) +def

[GitHub] [spark] hgs19921112 commented on pull request #37766: [SPARK-40288][SQL]After RemoveRedundantAggregates, PullOutGroupingExpressions should applied to avoid attribute missing when use complex

2022-09-01 Thread GitBox
hgs19921112 commented on PR #37766: URL: https://github.com/apache/spark/pull/37766#issuecomment-1235057302 cc @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] hgs19921112 opened a new pull request, #37766: [SPARK-40288][SQL]After RemoveRedundantAggregates, PullOutGroupingExpressions should applied to avoid attribute missing when use complex

2022-09-01 Thread GitBox
hgs19921112 opened a new pull request, #37766: URL: https://github.com/apache/spark/pull/37766 ### What changes were proposed in this pull request? RemoveRedundantAggregates will cause reference attribute missing when using complex expression in group by. ### Why are

[GitHub] [spark] hgs19921112 closed pull request #37765: [SPARK-40288][SQL]After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should applied to avoid attribute missing when use complex e

2022-09-01 Thread GitBox
hgs19921112 closed pull request #37765: [SPARK-40288][SQL]After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should applied to avoid attribute missing when use complex expression. URL: https://github.com/apache/spark/pull/37765 -- This is an automated message from the Apache

[GitHub] [spark] hgs19921112 commented on pull request #37765: [SPARK-40288][SQL]After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should applied to avoid attribute missing when use comp

2022-09-01 Thread GitBox
hgs19921112 commented on PR #37765: URL: https://github.com/apache/spark/pull/37765#issuecomment-1235054014 @github-actions -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] xinrong-meng commented on a diff in pull request #37756: [SPARK-40305][PS] Implement Groupby.sem

2022-09-01 Thread GitBox
xinrong-meng commented on code in PR #37756: URL: https://github.com/apache/spark/pull/37756#discussion_r961275433 ## python/pyspark/pandas/tests/test_groupby.py: ## @@ -3055,6 +3065,17 @@ def test_ddof(self): psdf.groupby("a")["b"].var(ddof=ddof).sort_index(),

[GitHub] [spark] xinrong-meng commented on a diff in pull request #37756: [SPARK-40305][PS] Implement Groupby.sem

2022-09-01 Thread GitBox
xinrong-meng commented on code in PR #37756: URL: https://github.com/apache/spark/pull/37756#discussion_r961275185 ## python/pyspark/pandas/generic.py: ## @@ -2189,7 +2189,7 @@ def std(psser: "Series") -> Column: return F.stddev_samp(spark_column)

[GitHub] [spark] xinrong-meng commented on a diff in pull request #37756: [SPARK-40305][PS] Implement Groupby.sem

2022-09-01 Thread GitBox
xinrong-meng commented on code in PR #37756: URL: https://github.com/apache/spark/pull/37756#discussion_r961271308 ## python/pyspark/pandas/groupby.py: ## @@ -827,6 +827,76 @@ def mad(self) -> FrameLike: return self._prepare_return(DataFrame(internal)) +def

[GitHub] [spark] hgs19921112 opened a new pull request, #37765: [SPARK-40288][SQL]After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should applied to avoid attribute missing when use com

2022-09-01 Thread GitBox
hgs19921112 opened a new pull request, #37765: URL: https://github.com/apache/spark/pull/37765 ### What changes were proposed in this pull request? RemoveRedundantAggregates will cause reference attribute missing when using complex expression in group by clause. ###

[GitHub] [spark] zhengruifeng commented on pull request #37756: [SPARK-40305][PS] Implement Groupby.sem

2022-09-01 Thread GitBox
zhengruifeng commented on PR #37756: URL: https://github.com/apache/spark/pull/37756#issuecomment-1235010056 cc @itholic @xinrong-meng @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] aokolnychyi commented on a diff in pull request #37749: [SPARK-40295][SQL] Allow v2 functions with literal args in write distribution/ordering

2022-09-01 Thread GitBox
aokolnychyi commented on code in PR #37749: URL: https://github.com/apache/spark/pull/37749#discussion_r961236436 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala: ## @@ -105,18 +105,27 @@ object V2ExpressionUtils extends

[GitHub] [spark] panbingkun commented on a diff in pull request #37700: [SPARK-40251][BUILD][MLLIB] Upgrade dev.ludovic.netlib from 2.2.1 to 3.0.2 & breeze from 2.0 to 2.1.0

2022-09-01 Thread GitBox
panbingkun commented on code in PR #37700: URL: https://github.com/apache/spark/pull/37700#discussion_r961209155 ## mllib-local/pom.xml: ## @@ -61,6 +61,11 @@ org.apache.spark spark-tags_${scala.binary.version} + Review Comment: ok, I will replace it

[GitHub] [spark] aokolnychyi commented on a diff in pull request #37749: [SPARK-40295][SQL] Allow v2 functions with literal args in write distribution/ordering

2022-09-01 Thread GitBox
aokolnychyi commented on code in PR #37749: URL: https://github.com/apache/spark/pull/37749#discussion_r961211655 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala: ## @@ -105,18 +105,27 @@ object V2ExpressionUtils extends

[GitHub] [spark] panbingkun commented on a diff in pull request #37700: [SPARK-40251][BUILD][MLLIB] Upgrade dev.ludovic.netlib from 2.2.1 to 3.0.2 & breeze from 2.0 to 2.1.0

2022-09-01 Thread GitBox
panbingkun commented on code in PR #37700: URL: https://github.com/apache/spark/pull/37700#discussion_r961209155 ## mllib-local/pom.xml: ## @@ -61,6 +61,11 @@ org.apache.spark spark-tags_${scala.binary.version} + Review Comment: I will replace it

[GitHub] [spark] lyssg commented on pull request #35667: [SPARK-38425][K8S] Avoid possible errors due to incorrect file size or type supplied in hadoop conf

2022-09-01 Thread GitBox
lyssg commented on PR #35667: URL: https://github.com/apache/spark/pull/35667#issuecomment-1234955160 @dongjoon-hyun , @martin-g , @ScrapCodes ,I have rebased my PR again. Could you take another look? -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-09-01 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r960972099 ## python/pyspark/ml/executor_globals.py: ## @@ -0,0 +1,24 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] gengliangwang opened a new pull request, #37764: [SPARK-40310][SQL] try_sum() should throw the exceptions from its child

2022-09-01 Thread GitBox
gengliangwang opened a new pull request, #37764: URL: https://github.com/apache/spark/pull/37764 ### What changes were proposed in this pull request? Similar to https://github.com/apache/spark/pull/37486 and https://github.com/apache/spark/pull/37663, this PR refactors the

[GitHub] [spark] bersprockets commented on a diff in pull request #37763: [SPARK-40308][SQL] Allow non-foldable delimiter arguments to `str_to_map` function

2022-09-01 Thread GitBox
bersprockets commented on code in PR #37763: URL: https://github.com/apache/spark/pull/37763#discussion_r961154793 ## sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala: ## @@ -606,6 +606,36 @@ class StringFunctionsSuite extends QueryTest with

[GitHub] [spark] bersprockets opened a new pull request, #37763: [SPARK-40308][SQL] Allow non-foldable delimiter arguments to `str_to_map` function

2022-09-01 Thread GitBox
bersprockets opened a new pull request, #37763: URL: https://github.com/apache/spark/pull/37763 ### What changes were proposed in this pull request? Remove the check for foldable delimiter arguments from `StringToMap#checkInputDataTypes`. Except for `checkInputDataTypes`,

[GitHub] [spark] dongjoon-hyun commented on pull request #37622: [SPARK-40187][DOCS] Add `Apache YuniKorn` scheduler docs

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37622: URL: https://github.com/apache/spark/pull/37622#issuecomment-1234811066 Thank YOU, @yangwwei . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] dongjoon-hyun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234809633 This PR passed CI here. - https://github.com/dongjoon-hyun/spark/runs/8139684479?check_suite_focus=true -- This is an automated message from the Apache Git Service. To respond

[GitHub] [spark] dongjoon-hyun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234807199 Ur, wait. @steveloughran . It's Java 8, isn't it? https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/8453ce7ce7510e983bae7470909fbd02704c0539/pom.xml#L76-L77

[GitHub] [spark] dongjoon-hyun closed pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
dongjoon-hyun closed pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module URL: https://github.com/apache/spark/pull/37745 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] dongjoon-hyun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234803915 Thank you for review, @steveloughran . > note that the gcs connector (at least the builds off their master) are java 11 only; not sure where that stands w.r.t older releases

[GitHub] [spark] linhongliu-db commented on pull request #37742: [SPARK-40291][SQL] Improve the message for column not in group by clause error

2022-09-01 Thread GitBox
linhongliu-db commented on PR #37742: URL: https://github.com/apache/spark/pull/37742#issuecomment-1234796980 @MaxGekk Thanks for reviewing! I updated the PR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] aokolnychyi commented on a diff in pull request #37749: [SPARK-40295][SQL] Allow v2 functions with literal args in write distribution/ordering

2022-09-01 Thread GitBox
aokolnychyi commented on code in PR #37749: URL: https://github.com/apache/spark/pull/37749#discussion_r961043795 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala: ## @@ -105,18 +105,27 @@ object V2ExpressionUtils extends

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-09-01 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r960986473 ## python/pyspark/ml/functions.py: ## @@ -106,6 +112,170 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] MaxGekk commented on a diff in pull request #37744: [SPARK-40300][SQL] Migrate onto the `DATATYPE_MISMATCH` error class

2022-09-01 Thread GitBox
MaxGekk commented on code in PR #37744: URL: https://github.com/apache/spark/pull/37744#discussion_r960982999 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ExpressionTypeCheckingSuite.scala: ## @@ -47,14 +47,23 @@ class ExpressionTypeCheckingSuite

[GitHub] [spark] yangwwei commented on pull request #37753: [SPARK-40302][K8S][TESTS] Add `YuniKornSuite`

2022-09-01 Thread GitBox
yangwwei commented on PR #37753: URL: https://github.com/apache/spark/pull/37753#issuecomment-1234638214 Very nice, thank you @dongjoon-hyun ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] yangwwei commented on pull request #37622: [SPARK-40187][DOCS] Add `Apache YuniKorn` scheduler docs

2022-09-01 Thread GitBox
yangwwei commented on PR #37622: URL: https://github.com/apache/spark/pull/37622#issuecomment-1234636614 Hi, @dongjoon-hyun thanks a lot for helping on this. This is a great community collaboration between YuniKorn and Spark, thank you so much! -- This is an automated message from

[GitHub] [spark] bjornjorgensen opened a new pull request, #37762: SPARK-39996[BUILD] Upgrade 'postgresql' to 42.5.0

2022-09-01 Thread GitBox
bjornjorgensen opened a new pull request, #37762: URL: https://github.com/apache/spark/pull/37762 ### What changes were proposed in this pull request? Upgrade 'postgresql' 42.3.3 to 42.5.0 ### Why are the changes needed? fix:

[GitHub] [spark] steveloughran commented on a diff in pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
steveloughran commented on code in PR #37745: URL: https://github.com/apache/spark/pull/37745#discussion_r960971095 ## hadoop-cloud/pom.xml: ## @@ -135,6 +135,18 @@ + + com.google.cloud.bigdataoss + gcs-connector +

[GitHub] [spark] zero323 commented on a diff in pull request #37748: [SPARK-40210][PYTHON][CORE] Fix math atan2, hypot, pow and pmod float argument call

2022-09-01 Thread GitBox
zero323 commented on code in PR #37748: URL: https://github.com/apache/spark/pull/37748#discussion_r960952519 ## python/pyspark/sql/functions.py: ## @@ -108,13 +108,10 @@ def _invoke_binary_math_function(name: str, col1: Any, col2: Any) -> Column: Invokes binary JVM math

[GitHub] [spark] amaliujia commented on pull request #37750: [SPARK-40296] Error class for DISTINCT function not found

2022-09-01 Thread GitBox
amaliujia commented on PR #37750: URL: https://github.com/apache/spark/pull/37750#issuecomment-1234597166 R @MaxGekk -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] sigmod commented on a diff in pull request #37697: [SPARK-40248][SQL] Use larger number of bits to build Bloom filter

2022-09-01 Thread GitBox
sigmod commented on code in PR #37697: URL: https://github.com/apache/spark/pull/37697#discussion_r960939857 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/BloomFilterAggregate.scala: ## @@ -55,6 +55,13 @@ case class BloomFilterAggregate(

[GitHub] [spark] santosh-d3vpl3x opened a new pull request, #37761: Add withColumnsRenamed to scala API of spark

2022-09-01 Thread GitBox
santosh-d3vpl3x opened a new pull request, #37761: URL: https://github.com/apache/spark/pull/37761 ### What changes were proposed in this pull request? This change adds an ability for code to rename multiple columns in a single call. ```scala withColumnsRenamed(colsMap:

[GitHub] [spark] MaxGekk commented on a diff in pull request #37744: [SPARK-40300][SQL] Migrate onto the `DATATYPE_MISMATCH` error class

2022-09-01 Thread GitBox
MaxGekk commented on code in PR #37744: URL: https://github.com/apache/spark/pull/37744#discussion_r960901497 ## core/src/main/resources/error/error-classes.json: ## @@ -75,6 +75,23 @@ "The value () cannot be converted to because it is malformed. Correct the value as

[GitHub] [spark] MaxGekk commented on a diff in pull request #37744: [SPARK-40300][SQL] Migrate onto the `DATATYPE_MISMATCH` error class

2022-09-01 Thread GitBox
MaxGekk commented on code in PR #37744: URL: https://github.com/apache/spark/pull/37744#discussion_r960893478 ## core/src/main/resources/error/error-classes.json: ## @@ -75,6 +75,23 @@ "The value () cannot be converted to because it is malformed. Correct the value as

[GitHub] [spark] MaxGekk commented on a diff in pull request #37744: [SPARK-40300][SQL] Migrate onto the `DATATYPE_MISMATCH` error class

2022-09-01 Thread GitBox
MaxGekk commented on code in PR #37744: URL: https://github.com/apache/spark/pull/37744#discussion_r960894747 ## core/src/main/resources/error/error-classes.json: ## @@ -75,6 +75,23 @@ "The value () cannot be converted to because it is malformed. Correct the value as

[GitHub] [spark] dongjoon-hyun commented on pull request #37755: [SPARK-40304][K8S][TESTS] Add `decomTestTag` to K8s Integration Test

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37755: URL: https://github.com/apache/spark/pull/37755#issuecomment-1234520875 Merged to master/3.3. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] dongjoon-hyun closed pull request #37755: [SPARK-40304][K8S][TESTS] Add `decomTestTag` to K8s Integration Test

2022-09-01 Thread GitBox
dongjoon-hyun closed pull request #37755: [SPARK-40304][K8S][TESTS] Add `decomTestTag` to K8s Integration Test URL: https://github.com/apache/spark/pull/37755 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] dongjoon-hyun commented on pull request #37755: [SPARK-40304][K8S][TESTS] Add `decomTestTag` to K8s Integration Test

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37755: URL: https://github.com/apache/spark/pull/37755#issuecomment-1234519012 Thank you, @viirya . Yes, it was the same `Base Image Build` failure. After re-triggering, it succeeds and now linter is running.

[GitHub] [spark] viirya commented on pull request #37755: [SPARK-40304][K8S][TESTS] Add `decomTestTag` to K8s Integration Test

2022-09-01 Thread GitBox
viirya commented on PR #37755: URL: https://github.com/apache/spark/pull/37755#issuecomment-1234517226 Some CI tasks were not finished normally. Seems unrelated, though. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] dongjoon-hyun commented on pull request #37753: [SPARK-40302][K8S][TESTS] Add `YuniKornSuite`

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37753: URL: https://github.com/apache/spark/pull/37753#issuecomment-1234513022 Merged to master/3.3. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] dongjoon-hyun closed pull request #37753: [SPARK-40302][K8S][TESTS] Add `YuniKornSuite`

2022-09-01 Thread GitBox
dongjoon-hyun closed pull request #37753: [SPARK-40302][K8S][TESTS] Add `YuniKornSuite` URL: https://github.com/apache/spark/pull/37753 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] dongjoon-hyun commented on pull request #37753: [SPARK-40302][K8S][TESTS] Add `YuniKornSuite`

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37753: URL: https://github.com/apache/spark/pull/37753#issuecomment-1234510668 Thank you, @viirya . Yes, there was an irrelevant `Base Image Build` job failure and corresponding PySpark UT failures. I re-triggered.

[GitHub] [spark] dongjoon-hyun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234507890 Thank you so much, @Yikun . Now, it seems to work on my three PRs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
dongjoon-hyun commented on code in PR #37745: URL: https://github.com/apache/spark/pull/37745#discussion_r960841601 ## hadoop-cloud/pom.xml: ## @@ -135,6 +135,18 @@ + + com.google.cloud.bigdataoss + gcs-connector +

[GitHub] [spark] sunchao commented on a diff in pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
sunchao commented on code in PR #37745: URL: https://github.com/apache/spark/pull/37745#discussion_r960782293 ## hadoop-cloud/pom.xml: ## @@ -135,6 +135,18 @@ + + com.google.cloud.bigdataoss + gcs-connector + ${gcs-connector.version} +

[GitHub] [spark] wangyum commented on a diff in pull request #37759: [SPARK-40306][SQL]Support more than Integer.MAX_VALUE of the same join key

2022-09-01 Thread GitBox
wangyum commented on code in PR #37759: URL: https://github.com/apache/spark/pull/37759#discussion_r960788909 ## sql/core/src/main/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArray.scala: ## @@ -76,15 +76,15 @@ private[sql] class

[GitHub] [spark] Yikun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
Yikun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234426066 https://github.com/users/dongjoon-hyun/packages/container/apache-spark-ci-image/settings You can also remove it in page ^ -- This is an automated message from the Apache Git

[GitHub] [spark] peter-toth commented on pull request #37760: [SPARK-38404][SQL][3.3] Improve CTE resolution when a nested CTE references an outer CTE

2022-09-01 Thread GitBox
peter-toth commented on PR #37760: URL: https://github.com/apache/spark/pull/37760#issuecomment-1234423325 This backport is needed for https://github.com/apache/spark/pull/37751#issuecomment-1234336628 cc @cloud-fan -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] peter-toth commented on pull request #37751: [SPARK-40297][SQL] CTE outer reference nested in CTE main body cannot be resolved

2022-09-01 Thread GitBox
peter-toth commented on PR #37751: URL: https://github.com/apache/spark/pull/37751#issuecomment-1234422501 @cloud-fan, here it is: https://github.com/apache/spark/pull/37760 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] peter-toth opened a new pull request, #37760: [SPARK-38404][SQL][3.3] Improve CTE resolution when a nested CTE references an outer CTE

2022-09-01 Thread GitBox
peter-toth opened a new pull request, #37760: URL: https://github.com/apache/spark/pull/37760 ### What changes were proposed in this pull request? Please note that the bug in the [SPARK-38404](https://issues.apache.org/jira/browse/SPARK-38404) is fixed already with

[GitHub] [spark] wankunde commented on a diff in pull request #37533: [SPARK-40096]Fix finalize shuffle stage slow due to connection creation slow

2022-09-01 Thread GitBox
wankunde commented on code in PR #37533: URL: https://github.com/apache/spark/pull/37533#discussion_r960777861 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2242,60 +2251,57 @@ private[spark] class DAGScheduler( val numMergers =

[GitHub] [spark] wankunde opened a new pull request, #37759: [SPARK-40306][SQL]Support more than Integer.MAX_VALUE of the same join key

2022-09-01 Thread GitBox
wankunde opened a new pull request, #37759: URL: https://github.com/apache/spark/pull/37759 ### What changes were proposed in this pull request? Support more than Integer.MAX_VALUE of the same join key. ### Why are the changes needed? For SMJ, the number of

[GitHub] [spark] dongjoon-hyun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234388728 1. I checked that mine is the same as yours. https://user-images.githubusercontent.com/9700541/187943970-bd5d40bf-8545-4d50-b7eb-16fc4a0440d8.png 2. Let me try to clean

[GitHub] [spark] LuciferYang commented on a diff in pull request #37700: [SPARK-40251][BUILD][MLLIB] Upgrade dev.ludovic.netlib from 2.2.1 to 3.0.2 & breeze from 2.0 to 2.1.0

2022-09-01 Thread GitBox
LuciferYang commented on code in PR #37700: URL: https://github.com/apache/spark/pull/37700#discussion_r960736796 ## mllib-local/pom.xml: ## @@ -61,6 +61,11 @@ org.apache.spark spark-tags_${scala.binary.version} + Review Comment: > We can't do it for

[GitHub] [spark] wangyum commented on a diff in pull request #37697: [SPARK-40248][SQL] Use larger number of bits to build Bloom filter

2022-09-01 Thread GitBox
wangyum commented on code in PR #37697: URL: https://github.com/apache/spark/pull/37697#discussion_r960736515 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/BloomFilterAggregate.scala: ## @@ -55,6 +55,13 @@ case class BloomFilterAggregate(

[GitHub] [spark] srowen commented on a diff in pull request #37700: [SPARK-40251][BUILD][MLLIB] Upgrade dev.ludovic.netlib from 2.2.1 to 3.0.2 & breeze from 2.0 to 2.1.0

2022-09-01 Thread GitBox
srowen commented on code in PR #37700: URL: https://github.com/apache/spark/pull/37700#discussion_r960734725 ## mllib-local/pom.xml: ## @@ -61,6 +61,11 @@ org.apache.spark spark-tags_${scala.binary.version} + Review Comment: We can't do it for that.

[GitHub] [spark] peter-toth commented on pull request #37751: [SPARK-40297][SQL] CTE outer reference nested in CTE main body cannot be resolved

2022-09-01 Thread GitBox
peter-toth commented on PR #37751: URL: https://github.com/apache/spark/pull/37751#issuecomment-1234357072 > It has conflicts in 3.3, due to missing #36146 . @peter-toth can you help to open a backport PR for it? Thanks! Sure, I can open it today. -- This is an automated message

[GitHub] [spark] srielau commented on a diff in pull request #37744: [SPARK-40300][SQL] Migrate onto the `DATATYPE_MISMATCH` error class

2022-09-01 Thread GitBox
srielau commented on code in PR #37744: URL: https://github.com/apache/spark/pull/37744#discussion_r960719687 ## core/src/main/resources/error/error-classes.json: ## @@ -75,6 +75,23 @@ "The value () cannot be converted to because it is malformed. Correct the value as

[GitHub] [spark] cloud-fan commented on pull request #37751: [SPARK-40297][SQL] CTE outer reference nested in CTE main body cannot be resolved

2022-09-01 Thread GitBox
cloud-fan commented on PR #37751: URL: https://github.com/apache/spark/pull/37751#issuecomment-1234336628 It has conflicts in 3.3, due to missing https://github.com/apache/spark/pull/36146 . @peter-toth can you help to open a backport PR for it? Thanks! -- This is an automated message

[GitHub] [spark] cloud-fan commented on pull request #37751: [SPARK-40297][SQL] CTE outer reference nested in CTE main body cannot be resolved

2022-09-01 Thread GitBox
cloud-fan commented on PR #37751: URL: https://github.com/apache/spark/pull/37751#issuecomment-1234333082 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] cloud-fan closed pull request #37751: [SPARK-40297][SQL] CTE outer reference nested in CTE main body cannot be resolved

2022-09-01 Thread GitBox
cloud-fan closed pull request #37751: [SPARK-40297][SQL] CTE outer reference nested in CTE main body cannot be resolved URL: https://github.com/apache/spark/pull/37751 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] panbingkun commented on a diff in pull request #37700: [SPARK-40251][BUILD][MLLIB] Upgrade dev.ludovic.netlib from 2.2.1 to 3.0.2 & breeze from 2.0 to 2.1.0

2022-09-01 Thread GitBox
panbingkun commented on code in PR #37700: URL: https://github.com/apache/spark/pull/37700#discussion_r960672738 ## mllib-local/pom.xml: ## @@ -61,6 +61,11 @@ org.apache.spark spark-tags_${scala.binary.version} + Review Comment: for

[GitHub] [spark] cloud-fan commented on pull request #37750: [SPARK-40296] Error class for DISTINCT function not found

2022-09-01 Thread GitBox
cloud-fan commented on PR #37750: URL: https://github.com/apache/spark/pull/37750#issuecomment-1234297015 cc @MaxGekk -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] srowen closed pull request #37731: [SPARK-40279][DOC] Document spark.yarn.report.interval

2022-09-01 Thread GitBox
srowen closed pull request #37731: [SPARK-40279][DOC] Document spark.yarn.report.interval URL: https://github.com/apache/spark/pull/37731 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] srowen commented on pull request #37731: [SPARK-40279][DOC] Document spark.yarn.report.interval

2022-09-01 Thread GitBox
srowen commented on PR #37731: URL: https://github.com/apache/spark/pull/37731#issuecomment-1234285256 Merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] srowen commented on a diff in pull request #37700: [SPARK-40251][BUILD][MLLIB] Upgrade dev.ludovic.netlib from 2.2.1 to 3.0.2 & breeze from 2.0 to 2.1.0

2022-09-01 Thread GitBox
srowen commented on code in PR #37700: URL: https://github.com/apache/spark/pull/37700#discussion_r960657137 ## mllib-local/pom.xml: ## @@ -61,6 +61,11 @@ org.apache.spark spark-tags_${scala.binary.version} + Review Comment: Oh, I don't think we want

[GitHub] [spark] Ngone51 commented on a diff in pull request #37411: [SPARK-39984][CORE] Check workerLastHeartbeat with master before HeartbeatReceiver expires an executor

2022-09-01 Thread GitBox
Ngone51 commented on code in PR #37411: URL: https://github.com/apache/spark/pull/37411#discussion_r960646158 ## core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala: ## @@ -77,17 +77,61 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)

[GitHub] [spark] zhengruifeng commented on pull request #37739: [SPARK-40265][PS] Fix the inconsistent behavior for Index.intersection.

2022-09-01 Thread GitBox
zhengruifeng commented on PR #37739: URL: https://github.com/apache/spark/pull/37739#issuecomment-1234230281 what if the `psidx` itself is a `MultiIndex`? ``` >>> psidx Int64Index([1, 2, 3, 4], dtype='int64', name='Koalas') >>> psidx.intersection([(1, 2), (3,

[GitHub] [spark] cloud-fan commented on pull request #37758: [SPARK-40149][SQL] Propagate metadata columns through Project

2022-09-01 Thread GitBox
cloud-fan commented on PR #37758: URL: https://github.com/apache/spark/pull/37758#issuecomment-1234209788 cc @karenfeng @viirya @huaxingao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] cloud-fan opened a new pull request, #37758: [SPARK-40149][SQL] Propagate metadata columns through Project

2022-09-01 Thread GitBox
cloud-fan opened a new pull request, #37758: URL: https://github.com/apache/spark/pull/37758 ### What changes were proposed in this pull request? This PR fixes a regression caused by https://github.com/apache/spark/pull/32017 . In

[GitHub] [spark] gitlabsam opened a new pull request, #37757: Branch 3.3 sam

2022-09-01 Thread GitBox
gitlabsam opened a new pull request, #37757: URL: https://github.com/apache/spark/pull/37757 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] zhengruifeng opened a new pull request, #37756: [SPARK-40305][PS] Implement Groupby.sem

2022-09-01 Thread GitBox
zhengruifeng opened a new pull request, #37756: URL: https://github.com/apache/spark/pull/37756 ### What changes were proposed in this pull request? Implement Groupby.sem ### Why are the changes needed? to increase API coverage ### Does this PR introduce _any_

[GitHub] [spark] WeichenXu123 commented on pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-09-01 Thread GitBox
WeichenXu123 commented on PR #37734: URL: https://github.com/apache/spark/pull/37734#issuecomment-1234139636 But I think we'd better design and discuss the API first. @mengxr Do you have any suggestions? -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-09-01 Thread GitBox
WeichenXu123 commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r960529530 ## python/pyspark/ml/executor_globals.py: ## @@ -0,0 +1,24 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] zero323 commented on a diff in pull request #37748: [SPARK-40210][PYTHON][CORE] Fix math atan2, hypot, pow and pmod float argument call

2022-09-01 Thread GitBox
zero323 commented on code in PR #37748: URL: https://github.com/apache/spark/pull/37748#discussion_r960525054 ## python/pyspark/sql/functions.py: ## @@ -108,12 +108,13 @@ def _invoke_binary_math_function(name: str, col1: Any, col2: Any) -> Column: Invokes binary JVM math

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-09-01 Thread GitBox
WeichenXu123 commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r960524940 ## python/pyspark/ml/functions.py: ## @@ -106,6 +112,170 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] Yikun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
Yikun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234097485 The potential issue might be that you removed the old repo but the image was not deleted, so when the new repo was created, write permission for this image was not configured for the new repo.

[GitHub] [spark] Yikun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
Yikun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234081059 Could you check this link? https://github.com/users/dongjoon-hyun/packages/container/package/apache-spark-ci-image/settings

[GitHub] [spark] dongjoon-hyun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234077478 It's weird. IIRC, I didn't change anything from my previous repo either when your PR applied this change. -- This is an automated message from the Apache Git Service. To respond

[GitHub] [spark] dongjoon-hyun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234071272 I'm already allowing all of them. https://user-images.githubusercontent.com/9700541/187891526-5938feb5-d380-4574-a81a-9b621779dead.png -- This is an automated message

[GitHub] [spark] dongjoon-hyun commented on pull request #37755: [SPARK-40304][K8S][TESTS] Add `decomTestTag` to K8s Integration Test

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37755: URL: https://github.com/apache/spark/pull/37755#issuecomment-1234068985 Could you review this please, @viirya ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] Yikun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
Yikun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234067056 https://github.com/dongjoon-hyun/spark/settings/actions ![image](https://user-images.githubusercontent.com/1736354/187890839-2f26ce10-2e20-4d7e-ab6e-311c898fc416.png) -- This

[GitHub] [spark] dongjoon-hyun opened a new pull request, #37755: [SPARK-40304][K8S][TESTS] Add decomTestTag to K8s Integration Test

2022-09-01 Thread GitBox
dongjoon-hyun opened a new pull request, #37755: URL: https://github.com/apache/spark/pull/37755 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] Yikun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
Yikun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234063012 @dongjoon-hyun I just saw that you recreated the spark repo, so might the default permissions for GitHub Actions have changed? You could first set permission for your dongjoon-hyun/spark

[GitHub] [spark] dongjoon-hyun commented on pull request #37753: [SPARK-40302][K8S][TESTS] Add `YuniKornSuite`

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37753: URL: https://github.com/apache/spark/pull/37753#issuecomment-1234062168 Could you review this, @viirya ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] MaxGekk commented on a diff in pull request #37742: [SPARK-40291][SQL] Improve the message for column not in group by clause error

2022-09-01 Thread GitBox
MaxGekk commented on code in PR #37742: URL: https://github.com/apache/spark/pull/37742#discussion_r960464975 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -2527,4 +2527,11 @@ private[sql] object QueryCompilationErrors extends

[GitHub] [spark] dongjoon-hyun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234055469 Thank you, but the `Base Image Build` phase has failed three times already. https://user-images.githubusercontent.com/9700541/187888992-48c0292b-2586-421b-8f9e-9b514ab35cb2.png

[GitHub] [spark] MaxGekk commented on pull request #37744: [SPARK-40300][SQL] Migrate onto the `DATATYPE_MISMATCH` error class

2022-09-01 Thread GitBox
MaxGekk commented on PR #37744: URL: https://github.com/apache/spark/pull/37744#issuecomment-1234048716 The test failure is not related to this PR, I believe: ``` YarnClusterSuite.run Spark in yarn-client mode with different configurations, ensuring redaction ``` @cloud-fan

[GitHub] [spark] Yikun commented on pull request #37745: [SPARK-33605][BUILD] Add `gcs-connector` to `hadoop-cloud` module

2022-09-01 Thread GitBox
Yikun commented on PR #37745: URL: https://github.com/apache/spark/pull/37745#issuecomment-1234041557 @dongjoon-hyun Thanks for pinging me. This is due to GitHub Actions ghcr being unstable; you could retry to make it work. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] dongjoon-hyun commented on pull request #37622: [SPARK-40187][DOCS] Add `Apache YuniKorn` scheduler docs

2022-09-01 Thread GitBox
dongjoon-hyun commented on PR #37622: URL: https://github.com/apache/spark/pull/37622#issuecomment-1234034066 I created a test suite PR, @yangwwei . - https://github.com/apache/spark/pull/37753 -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] panbingkun opened a new pull request, #37754: [SPARK-39906][INFRA][FOLLOWGUP] Eliminate build warnings - sbt 0.13 hell syntax is deprecated; use slash syntax instead

2022-09-01 Thread GitBox
panbingkun opened a new pull request, #37754: URL: https://github.com/apache/spark/pull/37754 ### What changes were proposed in this pull request? This PR follows https://github.com/apache/spark/pull/37326 . When I ran BLASBenchmark on GitHub, I found **The following warnings are
