[GitHub] [spark] LuciferYang commented on a diff in pull request #40395: [SPARK-42770][CONNECT] Add `truncatedTo(ChronoUnit.MICROS)` to make `SQLImplicitsTestSuite` in Java 17 daily test GA task pass

2023-03-13 Thread via GitHub
LuciferYang commented on code in PR #40395: URL: https://github.com/apache/spark/pull/40395#discussion_r1135005783 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala: ## @@ -130,9 +132,21 @@ class SQLImplicitsTestSuite extends
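The title of PR #40395 points at a precision mismatch: a timestamp that carries sub-microsecond digits will presumably not compare equal to the same value once it has been stored at microsecond precision. The sketch below illustrates that idea in Python with plain integer nanosecond counts. It is not the Scala test code from the PR, and `truncate_to_micros` and `round_trip_micros` are hypothetical helpers named only for this illustration.

```python
NANOS_PER_MICRO = 1_000


def truncate_to_micros(epoch_nanos: int) -> int:
    """Drop the sub-microsecond digits of a nanosecond epoch timestamp."""
    return epoch_nanos - epoch_nanos % NANOS_PER_MICRO


def round_trip_micros(epoch_nanos: int) -> int:
    """Hypothetical stand-in for a store that keeps only microseconds."""
    micros = epoch_nanos // NANOS_PER_MICRO
    return micros * NANOS_PER_MICRO


# An example value with non-zero nanosecond digits (assumed, for illustration).
original = 1_678_665_600_123_456_789
stored = round_trip_micros(original)
truncated = truncate_to_micros(original)
```

Comparing `original` with `stored` fails because the last three digits are lost, while comparing `truncated` with its own round trip succeeds. That is the reason to truncate the expected value before asserting equality.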

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
zhengruifeng commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134995656 ## mllib/core/src/main/scala/org/apache/spark/ml/param/shared/HasExecutionContext.scala: ## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] cloud-fan commented on a diff in pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-13 Thread via GitHub
cloud-fan commented on code in PR #40394: URL: https://github.com/apache/spark/pull/40394#discussion_r1134995335 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -129,58 +129,25 @@ private[hive] class DeferredObjectAdapter(oi: ObjectInspector,

[GitHub] [spark] cloud-fan commented on a diff in pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-13 Thread via GitHub
cloud-fan commented on code in PR #40394: URL: https://github.com/apache/spark/pull/40394#discussion_r1134994989 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -129,58 +129,25 @@ private[hive] class DeferredObjectAdapter(oi: ObjectInspector,

[GitHub] [spark] LuciferYang commented on a diff in pull request #40395: [SPARK-42770][CONNECT] Add `truncatedTo(ChronoUnit.MICROS)` to make `SQLImplicitsTestSuite` in Java 17 daily test GA task pass

2023-03-13 Thread via GitHub
LuciferYang commented on code in PR #40395: URL: https://github.com/apache/spark/pull/40395#discussion_r1134966350 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala: ## @@ -130,9 +132,21 @@ class SQLImplicitsTestSuite extends

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134961956 ## mllib/core/src/test/scala/org/apache/spark/ml/attribute/AttributeGroupSuite.scala: ## @@ -1,65 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134960384 ## mllib/core/src/test/scala/org/apache/spark/ml/attribute/AttributeGroupSuite.scala: ## @@ -1,65 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134959146 ## mllib/core/src/test/scala/org/apache/spark/ml/attribute/AttributeGroupSuite.scala: ## @@ -1,65 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134946826 ## mllib/core/src/main/scala/org/apache/spark/ml/param/shared/HasExecutionContext.scala: ## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134945494 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/ml/Estimator.scala: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40395: [SPARK-42770][CONNECT] Add `truncatedTo(ChronoUnit.MICROS)` to make `SQLImplicitsTestSuite` in Java 17 daily test GA task pa

2023-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #40395: URL: https://github.com/apache/spark/pull/40395#discussion_r1134941041 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala: ## @@ -130,9 +131,15 @@ class SQLImplicitsTestSuite extends

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40395: [SPARK-42770][CONNECT] Add `truncatedTo(ChronoUnit.MICROS)` to make `SQLImplicitsTestSuite` in Java 17 daily test GA task pa

2023-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #40395: URL: https://github.com/apache/spark/pull/40395#discussion_r1134938946 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala: ## @@ -130,9 +131,15 @@ class SQLImplicitsTestSuite extends

[GitHub] [spark] itholic commented on pull request #40282: [SPARK-42672][PYTHON][DOCS] Document error class list

2023-03-13 Thread via GitHub
itholic commented on PR #40282: URL: https://github.com/apache/spark/pull/40282#issuecomment-1467350642 Reminder for @HyukjinKwon @srielau @MaxGekk for error class document for PySpark. -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] ueshin commented on pull request #40388: [SPARK-42765][CONNECT][PYTHON] Regulate the import path of `pandas_udf`

2023-03-13 Thread via GitHub
ueshin commented on PR #40388: URL: https://github.com/apache/spark/pull/40388#issuecomment-1467349620 It's irrelevant means it's an issue, no?

[GitHub] [spark] pan3793 commented on pull request #39160: [SPARK-41667][K8S] Expose env var SPARK_DRIVER_POD_NAME in Driver Pod

2023-03-13 Thread via GitHub
pan3793 commented on PR #39160: URL: https://github.com/apache/spark/pull/39160#issuecomment-1467341858 I found that [apple/batch-processing-gateway](https://github.com/apple/batch-processing-gateway) uses Pod Name to fetch the log as well

[GitHub] [spark] Stove-hust commented on pull request #40393: []SPARK-40082]

2023-03-13 Thread via GitHub
Stove-hust commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1467340408 > @Stove-hust Thank you for reporting and the patch. Would you be able to share driver logs? Sure (added some comments) --- stage 10 failed 22/10/15 10:55:58 WARN

[GitHub] [spark] Stove-hust commented on pull request #40393: []SPARK-40082]

2023-03-13 Thread via GitHub
Stove-hust commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1467339828 > @Stove-hust Thank you for reporting and the patch. Would you be able to share driver logs? ** --- stage 10 failed 22/10/15 10:55:58 WARN task-result-getter-1

[GitHub] [spark] Stove-hust commented on pull request #40393: []SPARK-40082]

2023-03-13 Thread via GitHub
Stove-hust commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1467339346 > @Stove-hust Thank you for reporting and the patch. Would you be able to share driver logs? sure. `# stage 10 failed 22/10/15 10:55:58 WARN task-result-getter-1

[GitHub] [spark] ulysses-you opened a new pull request, #40406: [SPARK-42101][SQL][FOLLOWUP] Improve TableCacheQueryStage with CoalesceShufflePartitions

2023-03-13 Thread via GitHub
ulysses-you opened a new pull request, #40406: URL: https://github.com/apache/spark/pull/40406 ### What changes were proposed in this pull request? `CoalesceShufflePartitions` should make sure all leaves are `ExchangeQueryStageExec` to avoid collecting `TableCacheQueryStage`. As
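The condition described in PR #40406 (apply the rule only when every leaf of the plan is an exchange query stage) can be sketched as a small tree walk. This is a rough Python illustration under stated assumptions, not Spark's Scala implementation: `PlanNode`, `leaves`, and `can_coalesce` are hypothetical names, and the node kinds are plain strings rather than real plan classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical plan node: a kind label plus child nodes."""
    kind: str
    children: List["PlanNode"] = field(default_factory=list)


def leaves(node: PlanNode) -> List[PlanNode]:
    """Collect the leaf nodes of the tree rooted at `node`."""
    if not node.children:
        return [node]
    found: List[PlanNode] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def can_coalesce(plan: PlanNode) -> bool:
    """Assumed guard: true only if every leaf is an exchange query stage."""
    return all(leaf.kind == "ExchangeQueryStage" for leaf in leaves(plan))


only_exchanges = PlanNode("Join", [PlanNode("ExchangeQueryStage"),
                                   PlanNode("ExchangeQueryStage")])
with_table_cache = PlanNode("Join", [PlanNode("ExchangeQueryStage"),
                                     PlanNode("TableCacheQueryStage")])
```

With these two sample plans, `can_coalesce(only_exchanges)` holds and `can_coalesce(with_table_cache)` does not, so a plan containing a table cache stage would be skipped by such a guard.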

[GitHub] [spark] gatorsmile commented on pull request #40216: [SPARK-42593][PS] Deprecate & remove the APIs that will be removed in pandas 2.0.

2023-03-13 Thread via GitHub
gatorsmile commented on PR #40216: URL: https://github.com/apache/spark/pull/40216#issuecomment-1467332483 Let us mention all the breaking changes and deprecations in both release notes and migration guides

[GitHub] [spark] gatorsmile commented on pull request #40336: [SPARK-42706][SQL][DOCS] Document the Spark SQL error classes in user-facing documentation.

2023-03-13 Thread via GitHub
gatorsmile commented on PR #40336: URL: https://github.com/apache/spark/pull/40336#issuecomment-1467329403 @MaxGekk should we merge it to 3.4?

[GitHub] [spark] LuciferYang commented on pull request #40395: [SPARK-42770][CONNECT] Add `truncatedTo(ChronoUnit.MICROS)` to make `SQLImplicitsTestSuite` in Java 17 daily test GA task pass

2023-03-13 Thread via GitHub
LuciferYang commented on PR #40395: URL: https://github.com/apache/spark/pull/40395#issuecomment-1467321228 friendly ping @dongjoon-hyun

[GitHub] [spark] gengliangwang closed pull request #40404: [SPARK-42777][SQL] Support converting TimestampNTZ catalog stats to plan stats

2023-03-13 Thread via GitHub
gengliangwang closed pull request #40404: [SPARK-42777][SQL] Support converting TimestampNTZ catalog stats to plan stats URL: https://github.com/apache/spark/pull/40404

[GitHub] [spark] gengliangwang commented on pull request #40404: [SPARK-42777][SQL] Support converting TimestampNTZ catalog stats to plan stats

2023-03-13 Thread via GitHub
gengliangwang commented on PR #40404: URL: https://github.com/apache/spark/pull/40404#issuecomment-1467317784 merging to master/3.4

[GitHub] [spark] otterc commented on pull request #40393: []SPARK-40082]

2023-03-13 Thread via GitHub
otterc commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1467314593 @Stove-hust Thank you for reporting and the patch. Would you be able to share driver logs?

[GitHub] [spark] sadikovi commented on a diff in pull request #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true

2023-03-13 Thread via GitHub
sadikovi commented on code in PR #40396: URL: https://github.com/apache/spark/pull/40396#discussion_r1134891054 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/PostgresIntegrationSuite.scala: ## @@ -49,10 +49,6 @@ class PostgresIntegrationSuite

[GitHub] [spark] zhengruifeng commented on pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
zhengruifeng commented on PR #40097: URL: https://github.com/apache/spark/pull/40097#issuecomment-1467297939 > For the 2 exceptions: > > > 2, org.apache.spark.ml.linalg.* except VectorUDTSuite due to cyclical dependency; (it copies the VectorUDTSuite except test("JavaTypeInference

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
zhengruifeng commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134881273 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/ml/feature/LabeledPoint.scala: ## @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] wangyum commented on pull request #40360: [SPARK-42741][SQL] Do not unwrap casts in binary comparison when literal is null

2023-03-13 Thread via GitHub
wangyum commented on PR #40360: URL: https://github.com/apache/spark/pull/40360#issuecomment-1467294557 cc @cloud-fan

[GitHub] [spark] xinrong-meng opened a new pull request, #40405: [WIP][SPARK-42340][CONNECT][PYTHON] Implement `GroupedData.applyInPandas`

2023-03-13 Thread via GitHub
xinrong-meng opened a new pull request, #40405: URL: https://github.com/apache/spark/pull/40405 - [ ] Parity tests ### What changes were proposed in this pull request? Implement `GroupedData.applyInPandas`. ### Why are the changes needed? Parity with vanilla PySpark.

[GitHub] [spark] StevenChenDatabricks commented on pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-13 Thread via GitHub
StevenChenDatabricks commented on PR #40385: URL: https://github.com/apache/spark/pull/40385#issuecomment-1467287615 @cloud-fan Yes this is purely UI and EXPLAIN issue. It does not affect query execution. I'm not sure how AQE context stageCache map would help. The issue in EXPLAIN

[GitHub] [spark] xinrong-meng commented on pull request #40388: [SPARK-42765][CONNECT][PYTHON] Regulate the import path of `pandas_udf`

2023-03-13 Thread via GitHub
xinrong-meng commented on PR #40388: URL: https://github.com/apache/spark/pull/40388#issuecomment-1467281968 We didn't wrap `pyspark.sql.functions.pandas_udf` with `try_remote_functions`, so `"PYSPARK_NO_NAMESPACE_SHARE"` should be irrelevant.
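The reasoning in this comment (an environment variable only matters for functions that go through a dispatching wrapper) can be sketched roughly as follows. This is a minimal Python illustration, assuming the wrapper redirects to a remote implementation unless `PYSPARK_NO_NAMESPACE_SHARE` is set. It does not import PySpark, and `try_remote_sketch`, `is_remote_sketch`, and the `SKETCH_REMOTE` variable are hypothetical stand-ins, not PySpark APIs.

```python
import functools
import os


def is_remote_sketch() -> bool:
    """Hypothetical stand-in for a 'running in remote mode' check."""
    return os.environ.get("SKETCH_REMOTE") == "1"


def try_remote_sketch(remote_impl):
    """Hypothetical decorator: redirect to remote_impl when allowed."""
    def decorator(local_impl):
        @functools.wraps(local_impl)
        def wrapper(*args, **kwargs):
            # Assumption: setting the variable opts out of the redirect.
            if is_remote_sketch() and "PYSPARK_NO_NAMESPACE_SHARE" not in os.environ:
                return remote_impl(*args, **kwargs)
            return local_impl(*args, **kwargs)
        return wrapper
    return decorator


def _remote_upper(s: str) -> str:
    return "remote:" + s.upper()


@try_remote_sketch(_remote_upper)
def upper(s: str) -> str:
    """Wrapped function: its behavior depends on the environment."""
    return "local:" + s.upper()


def plain_upper(s: str) -> str:
    """Unwrapped function: the environment variable is irrelevant to it."""
    return "local:" + s.upper()
```

Under these assumptions, `upper` changes behavior when the variable is set, whereas `plain_upper` never consults it. That is the sense in which the variable would be irrelevant for an unwrapped function.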

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
zhengruifeng commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134836246 ## mllib/core/src/main/scala/org/apache/spark/ml/param/shared/HasExecutionContext.scala: ## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] zhengruifeng commented on pull request #40401: [SPARK-42773][DOCS][PYTHON] Minor update to 3.4.0 version change message for Spark Connect

2023-03-13 Thread via GitHub
zhengruifeng commented on PR #40401: URL: https://github.com/apache/spark/pull/40401#issuecomment-1467252638 thank you @allanf-db, merged into master/branch-3.4

[GitHub] [spark] cloud-fan commented on a diff in pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-13 Thread via GitHub
cloud-fan commented on code in PR #40394: URL: https://github.com/apache/spark/pull/40394#discussion_r1134829857 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -191,18 +157,18 @@ private[hive] case class HiveGenericUDF( override protected def

[GitHub] [spark] zhengruifeng closed pull request #40401: [SPARK-42773][DOCS][PYTHON] Minor update to 3.4.0 version change message for Spark Connect

2023-03-13 Thread via GitHub
zhengruifeng closed pull request #40401: [SPARK-42773][DOCS][PYTHON] Minor update to 3.4.0 version change message for Spark Connect URL: https://github.com/apache/spark/pull/40401

[GitHub] [spark] cloud-fan commented on a diff in pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-13 Thread via GitHub
cloud-fan commented on code in PR #40394: URL: https://github.com/apache/spark/pull/40394#discussion_r1134828589 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -235,6 +200,56 @@ private[hive] case class HiveGenericUDF( } } +class

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-13 Thread via GitHub
zhengruifeng commented on code in PR #40402: URL: https://github.com/apache/spark/pull/40402#discussion_r1134827687 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1344,9 +1344,9 @@ def collect(self) -> List[Row]: if self._session is None: raise

[GitHub] [spark] cloud-fan commented on pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-13 Thread via GitHub
cloud-fan commented on PR #40385: URL: https://github.com/apache/spark/pull/40385#issuecomment-1467245909 Yea AQE may remove materialized query stages due to optimizations like empty relation propagation, but I think it's fine as the shuffle files are still there (we don't unregister the

[GitHub] [spark] panbingkun commented on a diff in pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-13 Thread via GitHub
panbingkun commented on code in PR #40394: URL: https://github.com/apache/spark/pull/40394#discussion_r1134810496 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -235,6 +200,56 @@ private[hive] case class HiveGenericUDF( } } +class

[GitHub] [spark] panbingkun commented on a diff in pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-13 Thread via GitHub
panbingkun commented on code in PR #40394: URL: https://github.com/apache/spark/pull/40394#discussion_r1134807339 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -235,6 +200,56 @@ private[hive] case class HiveGenericUDF( } } +class

[GitHub] [spark] ueshin commented on a diff in pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-13 Thread via GitHub
ueshin commented on code in PR #40402: URL: https://github.com/apache/spark/pull/40402#discussion_r1134799357 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1344,9 +1344,9 @@ def collect(self) -> List[Row]: if self._session is None: raise

[GitHub] [spark] gerashegalov commented on a diff in pull request #40372: [SPARK-42752][PYSPARK][SQL] Make PySpark exceptions printable during initialization

2023-03-13 Thread via GitHub
gerashegalov commented on code in PR #40372: URL: https://github.com/apache/spark/pull/40372#discussion_r1134801751 ## python/pyspark/errors/exceptions/captured.py: ## @@ -65,8 +65,15 @@ def __str__(self) -> str: assert SparkContext._jvm is not None jvm =

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
zhengruifeng commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134799615 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/ml/Estimator.scala: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] ueshin commented on a diff in pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-13 Thread via GitHub
ueshin commented on code in PR #40402: URL: https://github.com/apache/spark/pull/40402#discussion_r1134796113 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1344,9 +1344,9 @@ def collect(self) -> List[Row]: if self._session is None: raise

[GitHub] [spark] cloud-fan commented on a diff in pull request #40142: [SPARK-41171][SQL] Infer and push down window limit through window if partitionSpec is empty

2023-03-13 Thread via GitHub
cloud-fan commented on code in PR #40142: URL: https://github.com/apache/spark/pull/40142#discussion_r1134795703 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -130,6 +130,7 @@ abstract class Optimizer(catalogManager:

[GitHub] [spark] ueshin commented on a diff in pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-13 Thread via GitHub
ueshin commented on code in PR #40402: URL: https://github.com/apache/spark/pull/40402#discussion_r1134793723 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -272,6 +272,9 @@ message ExecutePlanResponse { // The metrics observed during the

[GitHub] [spark] WeichenXu123 commented on pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on PR #40097: URL: https://github.com/apache/spark/pull/40097#issuecomment-1467213725 For the 2 exceptions: > 2, org.apache.spark.ml.linalg.* except VectorUDTSuite due to cyclical dependency; (it copies the VectorUDTSuite except test("JavaTypeInference with

[GitHub] [spark] zhengruifeng commented on pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-13 Thread via GitHub
zhengruifeng commented on PR #40402: URL: https://github.com/apache/spark/pull/40402#issuecomment-1467212310 also cc @WeichenXu123 since this PR supports `df.collect` with UDT

[GitHub] [spark] panbingkun commented on a diff in pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-13 Thread via GitHub
panbingkun commented on code in PR #40394: URL: https://github.com/apache/spark/pull/40394#discussion_r1134790402 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -235,6 +200,56 @@ private[hive] case class HiveGenericUDF( } } +class

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134787824 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/ml/Estimator.scala: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134787382 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/ml/Estimator.scala: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-13 Thread via GitHub
zhengruifeng commented on code in PR #40402: URL: https://github.com/apache/spark/pull/40402#discussion_r1134776117 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -272,6 +272,9 @@ message ExecutePlanResponse { // The metrics observed during the

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134786529 ## mllib/core/src/main/scala/org/apache/spark/ml/param/shared/HasExecutionContext.scala: ## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] wangshengjie123 commented on pull request #40391: [SPARK-42766][YARN] YarnAllocator filter excluded nodes when launching containers

2023-03-13 Thread via GitHub
wangshengjie123 commented on PR #40391: URL: https://github.com/apache/spark/pull/40391#issuecomment-1467201970 @Ngone51 @tgravescs could you please help review this pr when you have time, thanks.

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134777302 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/ml/feature/LabeledPoint.scala: ## @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134776526 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/ml/Estimator.scala: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134775534 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/ml/Estimator.scala: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] zhengruifeng commented on pull request #40401: [SPARK-42773][DOCS][PYTHON] Minor update to 3.4.0 version change message for Spark Connect

2023-03-13 Thread via GitHub
zhengruifeng commented on PR #40401: URL: https://github.com/apache/spark/pull/40401#issuecomment-1467188490 LGTM pending CI

[GitHub] [spark] cloud-fan commented on pull request #18990: [SPARK-21782][Core] Repartition creates skews when numPartitions is a power of 2

2023-03-13 Thread via GitHub
cloud-fan commented on PR #18990: URL: https://github.com/apache/spark/pull/18990#issuecomment-1467186918 It should have been fixed in 3.2+: https://github.com/apache/spark/pull/37855

[GitHub] [spark] ueshin commented on a diff in pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-13 Thread via GitHub
ueshin commented on code in PR #40402: URL: https://github.com/apache/spark/pull/40402#discussion_r1134750091 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -272,6 +272,9 @@ message ExecutePlanResponse { // The metrics observed during the

[GitHub] [spark] ueshin commented on a diff in pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-13 Thread via GitHub
ueshin commented on code in PR #40402: URL: https://github.com/apache/spark/pull/40402#discussion_r1134749726 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## Review Comment: @grundprinzip The actual Spark data type is necessary to rebuild the UDT

[GitHub] [spark] linhongliu-db commented on pull request #40403: [SPARK-42754][SQL][UI] Fix backward compatibility issue in nested SQL execution

2023-03-13 Thread via GitHub
linhongliu-db commented on PR #40403: URL: https://github.com/apache/spark/pull/40403#issuecomment-1467162461 cc @JoshRosen @rednaxelafx @xinrong-meng

[GitHub] [spark] github-actions[bot] closed pull request #38887: [SPARK-41368][SQL] Reorder the window partition expressions by expression stats

2023-03-13 Thread via GitHub
github-actions[bot] closed pull request #38887: [SPARK-41368][SQL] Reorder the window partition expressions by expression stats URL: https://github.com/apache/spark/pull/38887

[GitHub] [spark] wangyum commented on pull request #40190: [SPARK-42597][SQL] Support unwrap date type to timestamp type

2023-03-13 Thread via GitHub
wangyum commented on PR #40190: URL: https://github.com/apache/spark/pull/40190#issuecomment-1467139918 Merged to master.

[GitHub] [spark] wangyum closed pull request #40190: [SPARK-42597][SQL] Support unwrap date type to timestamp type

2023-03-13 Thread via GitHub
wangyum closed pull request #40190: [SPARK-42597][SQL] Support unwrap date type to timestamp type URL: https://github.com/apache/spark/pull/40190

[GitHub] [spark] liang3zy22 closed pull request #40347: [SPARK-42711][BUILD]Update usage info and shellcheck warn/error fix for build/sbt tool

2023-03-13 Thread via GitHub
liang3zy22 closed pull request #40347: [SPARK-42711][BUILD]Update usage info and shellcheck warn/error fix for build/sbt tool URL: https://github.com/apache/spark/pull/40347

[GitHub] [spark] liang3zy22 commented on pull request #40347: [SPARK-42711][BUILD]Update usage info and shellcheck warn/error fix for build/sbt tool

2023-03-13 Thread via GitHub
liang3zy22 commented on PR #40347: URL: https://github.com/apache/spark/pull/40347#issuecomment-1467133558 Yes, this PR is kind of useless. I am closing it now.

[GitHub] [spark] srowen commented on pull request #18990: [SPARK-21782][Core] Repartition creates skews when numPartitions is a power of 2

2023-03-13 Thread via GitHub
srowen commented on PR #18990: URL: https://github.com/apache/spark/pull/18990#issuecomment-1467131067 @atronchi what is "df" here? I couldn't reproduce that with a DF of 200K simple rows

[GitHub] [spark] ueshin commented on pull request #40388: [SPARK-42765][CONNECT][PYTHON] Regulate the import path of `pandas_udf`

2023-03-13 Thread via GitHub
ueshin commented on PR #40388: URL: https://github.com/apache/spark/pull/40388#issuecomment-1467126454 Btw, what happens if `"PYSPARK_NO_NAMESPACE_SHARE" in os.environ`?

[GitHub] [spark] itholic commented on pull request #40401: [SPARK-42773][DOCS][PYTHON] Minor update to 3.4.0 version change message for Spark Connect

2023-03-13 Thread via GitHub
itholic commented on PR #40401: URL: https://github.com/apache/spark/pull/40401#issuecomment-1467125777 Seems like there are some more unchanged docstrings in several files as below: ``` spark % git grep "Support Spark Connect" python/pyspark/sql/column.py:Support Spark

[GitHub] [spark] itholic commented on pull request #40401: [SPARK-42773][DOCS][PYTHON] Minor update to 3.4.0 version change message for Spark Connect

2023-03-13 Thread via GitHub
itholic commented on PR #40401: URL: https://github.com/apache/spark/pull/40401#issuecomment-1467125916 Looks good otherwise

[GitHub] [spark] ueshin commented on pull request #40388: [SPARK-42765][CONNECT][PYTHON] Regulate the import path of `pandas_udf`

2023-03-13 Thread via GitHub
ueshin commented on PR #40388: URL: https://github.com/apache/spark/pull/40388#issuecomment-1467120067 I guess we can just with the comment: ```py # The implementation of pandas_udf is embedded in pyspark.sql.function.pandas_udf # for code reuse. from pyspark.sql.functions

[GitHub] [spark] atronchi commented on pull request #18990: [SPARK-21782][Core] Repartition creates skews when numPartitions is a power of 2

2023-03-13 Thread via GitHub
atronchi commented on PR #18990: URL: https://github.com/apache/spark/pull/18990#issuecomment-1467104804 This issue appears to remain at large for the dataframe API which is used more broadly than RDD. What would it take to extend the fix to the dataframe API? I verified this on

[GitHub] [spark] gengliangwang opened a new pull request, #40404: [SPARK-42777][SQL] Support converting TimestampNTZ catalog stats to plan stats

2023-03-13 Thread via GitHub
gengliangwang opened a new pull request, #40404: URL: https://github.com/apache/spark/pull/40404 ### What changes were proposed in this pull request? When `spark.sql.cbo.planStats.enabled` or `spark.sql.cbo.enabled` is enabled, the logical plan will fetch row counts and

[GitHub] [spark] mridulm commented on pull request #40393: []SPARK-40082]

2023-03-13 Thread via GitHub
mridulm commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1467025042 +CC @otterc

[GitHub] [spark] mridulm commented on pull request #40262: [SPARK-42651][SQL] Optimize global sort to driver sort

2023-03-13 Thread via GitHub
mridulm commented on PR #40262: URL: https://github.com/apache/spark/pull/40262#issuecomment-1467021744 Meta comment: moving this to the driver has potential for destabilizing the application. In SPARK-36419, we added option to move the final `treeAggregate` to executor given the scale

[GitHub] [spark] linhongliu-db opened a new pull request, #40403: [SPARK-42754][SQL][UI] Fix backward compatibility issue in nested SQL execution

2023-03-13 Thread via GitHub
linhongliu-db opened a new pull request, #40403: URL: https://github.com/apache/spark/pull/40403 ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/39268 / [SPARK-41752](https://issues.apache.org/jira/browse/SPARK-41752) added a new non-optional

[GitHub] [spark] ueshin opened a new pull request, #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-13 Thread via GitHub
ueshin opened a new pull request, #40402: URL: https://github.com/apache/spark/pull/40402 ### What changes were proposed in this pull request? Supports `UserDefinedType` in Spark Connect. ### Why are the changes needed? Currently Spark Connect doesn't support UDTs.

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-13 Thread via GitHub
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1134549074 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -3344,43 +3345,6 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] olaky commented on a diff in pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

2023-03-13 Thread via GitHub
olaky commented on code in PR #40300: URL: https://github.com/apache/spark/pull/40300#discussion_r1134514723 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala: ## @@ -340,13 +358,26 @@ trait ExposesMetadataColumns extends LogicalPlan {

[GitHub] [spark] olaky commented on a diff in pull request #40321: [SPARK-42704] SubqueryAlias propagates metadata columns that child outputs

2023-03-13 Thread via GitHub
olaky commented on code in PR #40321: URL: https://github.com/apache/spark/pull/40321#discussion_r1134507196 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala: ## @@ -281,6 +281,53 @@ class FileMetadataStructSuite extends

[GitHub] [spark] santosh-d3vpl3x commented on pull request #40122: [SPARK-42349][PYTHON] Support pandas cogroup with multiple df

2023-03-13 Thread via GitHub
santosh-d3vpl3x commented on PR #40122: URL: https://github.com/apache/spark/pull/40122#issuecomment-1466761262 @HyukjinKwon may I request for a review on this PR?

[GitHub] [spark] allanf-db opened a new pull request, #40401: [SPARK-42773][DOCS][PYTHON] Minor update to 3.4.0 version change message for Spark Connect

2023-03-13 Thread via GitHub
allanf-db opened a new pull request, #40401: URL: https://github.com/apache/spark/pull/40401 ### What changes were proposed in this pull request? Changing the 3.4.0 version change message for PySpark functionality from "Support Spark Connect" to "Supports Spark Connect".

[GitHub] [spark] dongjoon-hyun commented on pull request #40392: [SPARK-42769][K8S] Add `SPARK_DRIVER_POD_IP` env variable to executor pods

2023-03-13 Thread via GitHub
dongjoon-hyun commented on PR #40392: URL: https://github.com/apache/spark/pull/40392#issuecomment-1466559161 Merged to master for Apache Spark 3.5.0.

[GitHub] [spark] amaliujia commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-13 Thread via GitHub
amaliujia commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1134333517 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -106,6 +106,37 @@ object functions { case _ =>

[GitHub] [spark] dongjoon-hyun closed pull request #40392: [SPARK-42769][K8S] Add `SPARK_DRIVER_POD_IP` env variable to executor pods

2023-03-13 Thread via GitHub
dongjoon-hyun closed pull request #40392: [SPARK-42769][K8S] Add `SPARK_DRIVER_POD_IP` env variable to executor pods URL: https://github.com/apache/spark/pull/40392

[GitHub] [spark] amaliujia commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-13 Thread via GitHub
amaliujia commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1134331640 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -106,6 +106,37 @@ object functions { case _ =>

[GitHub] [spark] dongjoon-hyun commented on pull request #40392: [SPARK-42769][K8S] Add `SPARK_DRIVER_POD_IP` env variable to executor pods

2023-03-13 Thread via GitHub
dongjoon-hyun commented on PR #40392: URL: https://github.com/apache/spark/pull/40392#issuecomment-1466547407 Yes, correct, @viirya ! Thank you for the approval.

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40392: [SPARK-42769][K8S] Add `SPARK_DRIVER_POD_IP` env variable to executor pods

2023-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #40392: URL: https://github.com/apache/spark/pull/40392#discussion_r1134323523 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala: ## @@ -54,6 +54,7 @@ private[spark] object Constants { val

[GitHub] [spark] viirya commented on a diff in pull request #40392: [SPARK-42769][K8S] Add `SPARK_DRIVER_POD_IP` env variable to executor pods

2023-03-13 Thread via GitHub
viirya commented on code in PR #40392: URL: https://github.com/apache/spark/pull/40392#discussion_r1134259551 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala: ## @@ -54,6 +54,7 @@ private[spark] object Constants { val

[GitHub] [spark] cloud-fan closed pull request #40399: [SPARK-42101][SQL][FOLLOWUP] Make QueryStageExec more type safe

2023-03-13 Thread via GitHub
cloud-fan closed pull request #40399: [SPARK-42101][SQL][FOLLOWUP] Make QueryStageExec more type safe URL: https://github.com/apache/spark/pull/40399

[GitHub] [spark] cloud-fan commented on pull request #40399: [SPARK-42101][SQL][FOLLOWUP] Make QueryStageExec more type safe

2023-03-13 Thread via GitHub
cloud-fan commented on PR #40399: URL: https://github.com/apache/spark/pull/40399#issuecomment-1466424185 thanks for review, merging to master!

[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-13 Thread via GitHub
shrprasa commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1466394445 > That's a "no" from me, per the logic above Thanks @srowen But seems I am not able to explain the change to you. So it's better to get review from someone who is qualified to

[GitHub] [spark] ClownXC opened a new pull request, #40400: [SPARK-41359][SQL] Use `PhysicalDataType` instead of DataType in UnsafeRow

2023-03-13 Thread via GitHub
ClownXC opened a new pull request, #40400: URL: https://github.com/apache/spark/pull/40400 What changes were proposed in this pull request? The main change of this pr is refactor UnsafeRow#isMutable and UnsafeRow#isFixedLength method to use PhysicalDataType instead of DataType.

[GitHub] [spark] dongjoon-hyun commented on pull request #40392: [SPARK-42769][K8S] Add `SPARK_DRIVER_POD_IP` env variable to executor pods

2023-03-13 Thread via GitHub
dongjoon-hyun commented on PR #40392: URL: https://github.com/apache/spark/pull/40392#issuecomment-1466385779 Hi, @viirya . Could you review this PR when you have some time?

[GitHub] [spark] johanl-db commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-13 Thread via GitHub
johanl-db commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1134165086 ## sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlignUpdateAssignmentsSuite.scala: ## @@ -0,0 +1,786 @@ +/* + * Licensed to the Apache Software
