[GitHub] [spark] zhengruifeng commented on pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-13 Thread via GitHub
zhengruifeng commented on PR #40402: URL: https://github.com/apache/spark/pull/40402#issuecomment-1467212310 also cc @WeichenXu123 since this PR supports `df.collect` with UDT -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] panbingkun commented on a diff in pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-13 Thread via GitHub
panbingkun commented on code in PR #40394: URL: https://github.com/apache/spark/pull/40394#discussion_r1134807339 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -235,6 +200,56 @@ private[hive] case class HiveGenericUDF( } } +class

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
zhengruifeng commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134836246 ## mllib/core/src/main/scala/org/apache/spark/ml/param/shared/HasExecutionContext.scala: ## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] sadikovi commented on a diff in pull request #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true

2023-03-13 Thread via GitHub
sadikovi commented on code in PR #40396: URL: https://github.com/apache/spark/pull/40396#discussion_r1134891054 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/PostgresIntegrationSuite.scala: ## @@ -49,10 +49,6 @@ class PostgresIntegrationSuite

[GitHub] [spark] itholic commented on pull request #40282: [SPARK-42672][PYTHON][DOCS] Document error class list

2023-03-13 Thread via GitHub
itholic commented on PR #40282: URL: https://github.com/apache/spark/pull/40282#issuecomment-1467350642 Reminder for @HyukjinKwon @srielau @MaxGekk for error class document for PySpark.

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40395: [SPARK-42770][CONNECT] Add `truncatedTo(ChronoUnit.MICROS)` to make `SQLImplicitsTestSuite` in Java 17 daily test GA task pa

2023-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #40395: URL: https://github.com/apache/spark/pull/40395#discussion_r1134938946 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala: ## @@ -130,9 +131,15 @@ class SQLImplicitsTestSuite extends

[GitHub] [spark] LuciferYang commented on a diff in pull request #40395: [SPARK-42770][CONNECT] Add `truncatedTo(ChronoUnit.MICROS)` to make `SQLImplicitsTestSuite` in Java 17 daily test GA task pass

2023-03-13 Thread via GitHub
LuciferYang commented on code in PR #40395: URL: https://github.com/apache/spark/pull/40395#discussion_r1135005783 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala: ## @@ -130,9 +132,21 @@ class SQLImplicitsTestSuite extends
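
The title of this PR refers to truncating timestamps to microsecond precision. A rough Python sketch of the idea follows, assuming the failure arises when a nanosecond-precision clock reading is compared with a value that was stored at microsecond precision. The helper names and the sample reading are hypothetical; the PR itself uses Java's `truncatedTo(ChronoUnit.MICROS)`.

```python
NANOS_PER_MICRO = 1_000


def truncate_to_micros(nanos: int) -> int:
    """Drop the sub-microsecond digits of a nanosecond timestamp."""
    return nanos - nanos % NANOS_PER_MICRO


def round_trip_through_micros(nanos: int) -> int:
    """Simulate storing a value at microsecond precision and reading it back."""
    micros = nanos // NANOS_PER_MICRO
    return micros * NANOS_PER_MICRO


raw = 1_678_700_000_123_456_789  # hypothetical nanosecond clock reading

# Without truncation, the round-tripped value no longer equals the original.
mismatch = round_trip_through_micros(raw) != raw

# After truncating first, the comparison is stable.
truncated = truncate_to_micros(raw)
stable = round_trip_through_micros(truncated) == truncated
```

Truncating the expected value before the comparison, rather than loosening the assertion, keeps the test exact on platforms whose clock is finer than a microsecond.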

[GitHub] [spark] cloud-fan commented on pull request #18990: [SPARK-21782][Core] Repartition creates skews when numPartitions is a power of 2

2023-03-13 Thread via GitHub
cloud-fan commented on PR #18990: URL: https://github.com/apache/spark/pull/18990#issuecomment-1467186918 It should have been fixed in 3.2+: https://github.com/apache/spark/pull/37855

[GitHub] [spark] wangshengjie123 commented on pull request #40391: [SPARK-42766][YARN] YarnAllocator filter excluded nodes when launching containers

2023-03-13 Thread via GitHub
wangshengjie123 commented on PR #40391: URL: https://github.com/apache/spark/pull/40391#issuecomment-1467201970 @Ngone51 @tgravescs could you please help review this pr when you have time, thanks.

[GitHub] [spark] panbingkun commented on a diff in pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-13 Thread via GitHub
panbingkun commented on code in PR #40394: URL: https://github.com/apache/spark/pull/40394#discussion_r1134810496 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -235,6 +200,56 @@ private[hive] case class HiveGenericUDF( } } +class

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40395: [SPARK-42770][CONNECT] Add `truncatedTo(ChronoUnit.MICROS)` to make `SQLImplicitsTestSuite` in Java 17 daily test GA task pa

2023-03-13 Thread via GitHub
dongjoon-hyun commented on code in PR #40395: URL: https://github.com/apache/spark/pull/40395#discussion_r1134941041 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala: ## @@ -130,9 +131,15 @@ class SQLImplicitsTestSuite extends

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134960384 ## mllib/core/src/test/scala/org/apache/spark/ml/attribute/AttributeGroupSuite.scala: ## @@ -1,65 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134959146 ## mllib/core/src/test/scala/org/apache/spark/ml/attribute/AttributeGroupSuite.scala: ## @@ -1,65 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] ueshin commented on pull request #40388: [SPARK-42765][CONNECT][PYTHON] Regulate the import path of `pandas_udf`

2023-03-13 Thread via GitHub
ueshin commented on PR #40388: URL: https://github.com/apache/spark/pull/40388#issuecomment-1467126454 Btw, what happens if `"PYSPARK_NO_NAMESPACE_SHARE" in os.environ`?

[GitHub] [spark] liang3zy22 closed pull request #40347: [SPARK-42711][BUILD]Update usage info and shellcheck warn/error fix for build/sbt tool

2023-03-13 Thread via GitHub
liang3zy22 closed pull request #40347: [SPARK-42711][BUILD]Update usage info and shellcheck warn/error fix for build/sbt tool URL: https://github.com/apache/spark/pull/40347

[GitHub] [spark] liang3zy22 commented on pull request #40347: [SPARK-42711][BUILD]Update usage info and shellcheck warn/error fix for build/sbt tool

2023-03-13 Thread via GitHub
liang3zy22 commented on PR #40347: URL: https://github.com/apache/spark/pull/40347#issuecomment-1467133558 Yes, this PR is kind of useless. I close it now.

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134787382 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/ml/Estimator.scala: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] cloud-fan commented on a diff in pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-13 Thread via GitHub
cloud-fan commented on code in PR #40394: URL: https://github.com/apache/spark/pull/40394#discussion_r1134829857 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -191,18 +157,18 @@ private[hive] case class HiveGenericUDF( override protected def

[GitHub] [spark] zhengruifeng closed pull request #40401: [SPARK-42773][DOCS][PYTHON] Minor update to 3.4.0 version change message for Spark Connect

2023-03-13 Thread via GitHub
zhengruifeng closed pull request #40401: [SPARK-42773][DOCS][PYTHON] Minor update to 3.4.0 version change message for Spark Connect URL: https://github.com/apache/spark/pull/40401

[GitHub] [spark] zhengruifeng commented on pull request #40401: [SPARK-42773][DOCS][PYTHON] Minor update to 3.4.0 version change message for Spark Connect

2023-03-13 Thread via GitHub
zhengruifeng commented on PR #40401: URL: https://github.com/apache/spark/pull/40401#issuecomment-1467252638 thank you @allanf-db , merged into master/branch-3.4

[GitHub] [spark] atronchi commented on pull request #18990: [SPARK-21782][Core] Repartition creates skews when numPartitions is a power of 2

2023-03-13 Thread via GitHub
atronchi commented on PR #18990: URL: https://github.com/apache/spark/pull/18990#issuecomment-1467104804 This issue appears to remain at large for the dataframe API which is used more broadly than RDD. What would it take to extend the fix to the dataframe API? I verified this on

[GitHub] [spark] ueshin commented on a diff in pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-13 Thread via GitHub
ueshin commented on code in PR #40402: URL: https://github.com/apache/spark/pull/40402#discussion_r1134750091 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -272,6 +272,9 @@ message ExecutePlanResponse { // The metrics observed during the

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134776526 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/ml/Estimator.scala: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134777302 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/ml/feature/LabeledPoint.scala: ## @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] WeichenXu123 commented on pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on PR #40097: URL: https://github.com/apache/spark/pull/40097#issuecomment-1467213725 For the 2 exceptions: > 2, org.apache.spark.ml.linalg.* except VectorUDTSuite due to cyclical dependency; (it copies the VectorUDTSuite except test("JavaTypeInference with

[GitHub] [spark] ueshin commented on a diff in pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-13 Thread via GitHub
ueshin commented on code in PR #40402: URL: https://github.com/apache/spark/pull/40402#discussion_r1134793723 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -272,6 +272,9 @@ message ExecutePlanResponse { // The metrics observed during the

[GitHub] [spark] cloud-fan commented on pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-13 Thread via GitHub
cloud-fan commented on PR #40385: URL: https://github.com/apache/spark/pull/40385#issuecomment-1467245909 Yea AQE may remove materialized query stages due to optimizations like empty relation propagation, but I think it's fine as the shuffle files are still there (we don't unregister the

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-13 Thread via GitHub
zhengruifeng commented on code in PR #40402: URL: https://github.com/apache/spark/pull/40402#discussion_r1134827687 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1344,9 +1344,9 @@ def collect(self) -> List[Row]: if self._session is None: raise

[GitHub] [spark] zhengruifeng commented on pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
zhengruifeng commented on PR #40097: URL: https://github.com/apache/spark/pull/40097#issuecomment-1467297939 > For the 2 exceptions: > > > 2, org.apache.spark.ml.linalg.* except VectorUDTSuite due to cyclical dependency; (it copies the VectorUDTSuite except test("JavaTypeInference

[GitHub] [spark] gatorsmile commented on pull request #40216: [SPARK-42593][PS] Deprecate & remove the APIs that will be removed in pandas 2.0.

2023-03-13 Thread via GitHub
gatorsmile commented on PR #40216: URL: https://github.com/apache/spark/pull/40216#issuecomment-1467332483 Let us mention all the breaking changes and deprecation in both release notes and migration guides

[GitHub] [spark] panbingkun commented on a diff in pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-13 Thread via GitHub
panbingkun commented on code in PR #40394: URL: https://github.com/apache/spark/pull/40394#discussion_r1134790402 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -235,6 +200,56 @@ private[hive] case class HiveGenericUDF( } } +class

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
zhengruifeng commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134799615 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/ml/Estimator.scala: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] ueshin commented on a diff in pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-13 Thread via GitHub
ueshin commented on code in PR #40402: URL: https://github.com/apache/spark/pull/40402#discussion_r1134799357 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1344,9 +1344,9 @@ def collect(self) -> List[Row]: if self._session is None: raise

[GitHub] [spark] StevenChenDatabricks commented on pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-13 Thread via GitHub
StevenChenDatabricks commented on PR #40385: URL: https://github.com/apache/spark/pull/40385#issuecomment-1467287615 @cloud-fan Yes this is purely UI and EXPLAIN issue. It does not affect query execution. I'm not sure how AQE context stageCache map would help. The issue in EXPLAIN

[GitHub] [spark] github-actions[bot] closed pull request #38887: [SPARK-41368][SQL] Reorder the window partition expressions by expression stats

2023-03-13 Thread via GitHub
github-actions[bot] closed pull request #38887: [SPARK-41368][SQL] Reorder the window partition expressions by expression stats URL: https://github.com/apache/spark/pull/38887

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134775534 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/ml/Estimator.scala: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] xinrong-meng opened a new pull request, #40405: [WIP][SPARK-42340][CONNECT][PYTHON] Implement `GroupedData.applyInPandas`

2023-03-13 Thread via GitHub
xinrong-meng opened a new pull request, #40405: URL: https://github.com/apache/spark/pull/40405 - [ ] Parity tests ### What changes were proposed in this pull request? Implement `GroupedData.applyInPandas`. ### Why are the changes needed? Parity with vanilla PySpark.

[GitHub] [spark] ulysses-you opened a new pull request, #40406: [SPARK-42101][SQL][FOLLOWUP] Improve TableCacheQueryStage with CoalesceShufflePartitions

2023-03-13 Thread via GitHub
ulysses-you opened a new pull request, #40406: URL: https://github.com/apache/spark/pull/40406 ### What changes were proposed in this pull request? `CoalesceShufflePartitions` should make sure all leaves are `ExchangeQueryStageExec` to avoid collect `TableCacheQueryStage`. As

[GitHub] [spark] pan3793 commented on pull request #39160: [SPARK-41667][K8S] Expose env var SPARK_DRIVER_POD_NAME in Driver Pod

2023-03-13 Thread via GitHub
pan3793 commented on PR #39160: URL: https://github.com/apache/spark/pull/39160#issuecomment-1467341858 I found that [apple/batch-processing-gateway](https://github.com/apple/batch-processing-gateway) uses Pod Name to fetch the log as well

[GitHub] [spark] LuciferYang commented on a diff in pull request #40395: [SPARK-42770][CONNECT] Add `truncatedTo(ChronoUnit.MICROS)` to make `SQLImplicitsTestSuite` in Java 17 daily test GA task pass

2023-03-13 Thread via GitHub
LuciferYang commented on code in PR #40395: URL: https://github.com/apache/spark/pull/40395#discussion_r1134966350 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala: ## @@ -130,9 +132,21 @@ class SQLImplicitsTestSuite extends

[GitHub] [spark] itholic commented on pull request #40401: [SPARK-42773][DOCS][PYTHON] Minor update to 3.4.0 version change message for Spark Connect

2023-03-13 Thread via GitHub
itholic commented on PR #40401: URL: https://github.com/apache/spark/pull/40401#issuecomment-1467125916 Looks good otherwise

[GitHub] [spark] itholic commented on pull request #40401: [SPARK-42773][DOCS][PYTHON] Minor update to 3.4.0 version change message for Spark Connect

2023-03-13 Thread via GitHub
itholic commented on PR #40401: URL: https://github.com/apache/spark/pull/40401#issuecomment-1467125777 Seems like there are some more unchanged docstrings in several files as below: ``` spark % git grep "Support Spark Connect" python/pyspark/sql/column.py:Support Spark

[GitHub] [spark] wangyum closed pull request #40190: [SPARK-42597][SQL] Support unwrap date type to timestamp type

2023-03-13 Thread via GitHub
wangyum closed pull request #40190: [SPARK-42597][SQL] Support unwrap date type to timestamp type URL: https://github.com/apache/spark/pull/40190

[GitHub] [spark] wangyum commented on pull request #40190: [SPARK-42597][SQL] Support unwrap date type to timestamp type

2023-03-13 Thread via GitHub
wangyum commented on PR #40190: URL: https://github.com/apache/spark/pull/40190#issuecomment-1467139918 Merged to master.

[GitHub] [spark] ueshin commented on a diff in pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-13 Thread via GitHub
ueshin commented on code in PR #40402: URL: https://github.com/apache/spark/pull/40402#discussion_r1134749726 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## Review Comment: @grundprinzip The actual Spark data type is necessary to rebuild the UDT

[GitHub] [spark] ueshin commented on a diff in pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-13 Thread via GitHub
ueshin commented on code in PR #40402: URL: https://github.com/apache/spark/pull/40402#discussion_r1134796113 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1344,9 +1344,9 @@ def collect(self) -> List[Row]: if self._session is None: raise

[GitHub] [spark] cloud-fan commented on a diff in pull request #40142: [SPARK-41171][SQL] Infer and push down window limit through window if partitionSpec is empty

2023-03-13 Thread via GitHub
cloud-fan commented on code in PR #40142: URL: https://github.com/apache/spark/pull/40142#discussion_r1134795703 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -130,6 +130,7 @@ abstract class Optimizer(catalogManager:

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
zhengruifeng commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134881273 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/ml/feature/LabeledPoint.scala: ## @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] gengliangwang commented on pull request #40404: [SPARK-42777][SQL] Support converting TimestampNTZ catalog stats to plan stats

2023-03-13 Thread via GitHub
gengliangwang commented on PR #40404: URL: https://github.com/apache/spark/pull/40404#issuecomment-1467317784 merging to master/3.4

[GitHub] [spark] gengliangwang closed pull request #40404: [SPARK-42777][SQL] Support converting TimestampNTZ catalog stats to plan stats

2023-03-13 Thread via GitHub
gengliangwang closed pull request #40404: [SPARK-42777][SQL] Support converting TimestampNTZ catalog stats to plan stats URL: https://github.com/apache/spark/pull/40404

[GitHub] [spark] gatorsmile commented on pull request #40336: [SPARK-42706][SQL][DOCS] Document the Spark SQL error classes in user-facing documentation.

2023-03-13 Thread via GitHub
gatorsmile commented on PR #40336: URL: https://github.com/apache/spark/pull/40336#issuecomment-1467329403 @MaxGekk should we merge it to 3.4?

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134945494 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/ml/Estimator.scala: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134961956 ## mllib/core/src/test/scala/org/apache/spark/ml/attribute/AttributeGroupSuite.scala: ## @@ -1,65 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] srowen commented on pull request #18990: [SPARK-21782][Core] Repartition creates skews when numPartitions is a power of 2

2023-03-13 Thread via GitHub
srowen commented on PR #18990: URL: https://github.com/apache/spark/pull/18990#issuecomment-1467131067 @atronchi what is "df" here? I couldn't reproduce that with a DF of 200K simple rows

[GitHub] [spark] linhongliu-db commented on pull request #40403: [SPARK-42754][SQL][UI] Fix backward compatibility issue in nested SQL execution

2023-03-13 Thread via GitHub
linhongliu-db commented on PR #40403: URL: https://github.com/apache/spark/pull/40403#issuecomment-1467162461 cc @JoshRosen @rednaxelafx @xinrong-meng

[GitHub] [spark] zhengruifeng commented on pull request #40401: [SPARK-42773][DOCS][PYTHON] Minor update to 3.4.0 version change message for Spark Connect

2023-03-13 Thread via GitHub
zhengruifeng commented on PR #40401: URL: https://github.com/apache/spark/pull/40401#issuecomment-1467188490 LGTM pending CI

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134786529 ## mllib/core/src/main/scala/org/apache/spark/ml/param/shared/HasExecutionContext.scala: ## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-13 Thread via GitHub
zhengruifeng commented on code in PR #40402: URL: https://github.com/apache/spark/pull/40402#discussion_r1134776117 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -272,6 +272,9 @@ message ExecutePlanResponse { // The metrics observed during the

[GitHub] [spark] cloud-fan commented on a diff in pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-13 Thread via GitHub
cloud-fan commented on code in PR #40394: URL: https://github.com/apache/spark/pull/40394#discussion_r1134828589 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -235,6 +200,56 @@ private[hive] case class HiveGenericUDF( } } +class

[GitHub] [spark] LuciferYang commented on pull request #40395: [SPARK-42770][CONNECT] Add `truncatedTo(ChronoUnit.MICROS)` to make `SQLImplicitsTestSuite` in Java 17 daily test GA task pass

2023-03-13 Thread via GitHub
LuciferYang commented on PR #40395: URL: https://github.com/apache/spark/pull/40395#issuecomment-1467321228 friendly ping @dongjoon-hyun

[GitHub] [spark] Stove-hust commented on pull request #40393: []SPARK-40082]

2023-03-13 Thread via GitHub
Stove-hust commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1467339828 > @Stove-hust Thank you for reporting and the patch. Would you be able to share driver logs? ** --- stage 10 failed 22/10/15 10:55:58 WARN task-result-getter-1

[GitHub] [spark] Stove-hust commented on pull request #40393: []SPARK-40082]

2023-03-13 Thread via GitHub
Stove-hust commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1467340408 > @Stove-hust Thank you for reporting and the patch. Would you be able to share driver logs? Sure (added some comments) --- stage 10 failed 22/10/15 10:55:58 WARN

[GitHub] [spark] Stove-hust commented on pull request #40393: []SPARK-40082]

2023-03-13 Thread via GitHub
Stove-hust commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1467339346 > @Stove-hust Thank you for reporting and the patch. Would you be able to share driver logs? sure. `# stage 10 failed 22/10/15 10:55:58 WARN task-result-getter-1

[GitHub] [spark] cloud-fan commented on a diff in pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-13 Thread via GitHub
cloud-fan commented on code in PR #40394: URL: https://github.com/apache/spark/pull/40394#discussion_r1134994989 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -129,58 +129,25 @@ private[hive] class DeferredObjectAdapter(oi: ObjectInspector,

[GitHub] [spark] cloud-fan commented on a diff in pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-13 Thread via GitHub
cloud-fan commented on code in PR #40394: URL: https://github.com/apache/spark/pull/40394#discussion_r1134995335 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -129,58 +129,25 @@ private[hive] class DeferredObjectAdapter(oi: ObjectInspector,

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
zhengruifeng commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134995656 ## mllib/core/src/main/scala/org/apache/spark/ml/param/shared/HasExecutionContext.scala: ## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] ueshin commented on pull request #40388: [SPARK-42765][CONNECT][PYTHON] Regulate the import path of `pandas_udf`

2023-03-13 Thread via GitHub
ueshin commented on PR #40388: URL: https://github.com/apache/spark/pull/40388#issuecomment-1467120067 I guess we can just with the comment: ```py # The implementation of pandas_udf is embedded in pyspark.sql.function.pandas_udf # for code reuse. from pyspark.sql.functions

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134787824 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/ml/Estimator.scala: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] gerashegalov commented on a diff in pull request #40372: [SPARK-42752][PYSPARK][SQL] Make PySpark exceptions printable during initialization

2023-03-13 Thread via GitHub
gerashegalov commented on code in PR #40372: URL: https://github.com/apache/spark/pull/40372#discussion_r1134801751 ## python/pyspark/errors/exceptions/captured.py: ## @@ -65,8 +65,15 @@ def __str__(self) -> str: assert SparkContext._jvm is not None jvm =

[GitHub] [spark] xinrong-meng commented on pull request #40388: [SPARK-42765][CONNECT][PYTHON] Regulate the import path of `pandas_udf`

2023-03-13 Thread via GitHub
xinrong-meng commented on PR #40388: URL: https://github.com/apache/spark/pull/40388#issuecomment-1467281968 We didn't wrap `pyspark.sql.function.pandas_udf` with `try_remote_functions`, so `"PYSPARK_NO_NAMESPACE_SHARE"` should be irrelevant. -- This is an automated message from the

[GitHub] [spark] wangyum commented on pull request #40360: [SPARK-42741][SQL] Do not unwrap casts in binary comparison when literal is null

2023-03-13 Thread via GitHub
wangyum commented on PR #40360: URL: https://github.com/apache/spark/pull/40360#issuecomment-1467294557 cc @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] otterc commented on pull request #40393: [SPARK-40082]

2023-03-13 Thread via GitHub
otterc commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1467314593 @Stove-hust Thank you for reporting and the patch. Would you be able to share driver logs? -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] ueshin commented on pull request #40388: [SPARK-42765][CONNECT][PYTHON] Regulate the import path of `pandas_udf`

2023-03-13 Thread via GitHub
ueshin commented on PR #40388: URL: https://github.com/apache/spark/pull/40388#issuecomment-1467349620 It's irrelevant means it's an issue, no? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1134946826 ## mllib/core/src/main/scala/org/apache/spark/ml/param/shared/HasExecutionContext.scala: ## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] panbingkun commented on a diff in pull request #39949: [SPARK-42386][SQL] Rewrite HiveGenericUDF with Invoke

2023-03-13 Thread via GitHub
panbingkun commented on code in PR #39949: URL: https://github.com/apache/spark/pull/39949#discussion_r1133604830 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -194,47 +183,52 @@ private[hive] case class HiveGenericUDF( override protected def

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-13 Thread via GitHub
WeichenXu123 commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1133666221 ## mllib/src/main/scala/org/apache/spark/ml/param/params.scala: ## @@ -44,8 +45,14 @@ import org.apache.spark.ml.util.Identifiable *See

[GitHub] [spark] shrprasa commented on a diff in pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates

2023-03-13 Thread via GitHub
shrprasa commented on code in PR #40128: URL: https://github.com/apache/spark/pull/40128#discussion_r1133671423 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala: ## @@ -143,6 +144,9 @@ private[spark] class

[GitHub] [spark] beliefer closed pull request #39990: [SPARK-42415][SQL] The built-in dialects support OFFSET and paging query.

2023-03-13 Thread via GitHub
beliefer closed pull request #39990: [SPARK-42415][SQL] The built-in dialects support OFFSET and paging query. URL: https://github.com/apache/spark/pull/39990 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] beliefer commented on pull request #39990: [SPARK-42415][SQL] The built-in dialects support OFFSET and paging query.

2023-03-13 Thread via GitHub
beliefer commented on PR #39990: URL: https://github.com/apache/spark/pull/39990#issuecomment-1465922089 https://github.com/apache/spark/pull/40396 used to replace this one. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] beliefer commented on pull request #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true

2023-03-13 Thread via GitHub
beliefer commented on PR #40396: URL: https://github.com/apache/spark/pull/40396#issuecomment-1465922700 ping @huaxingao cc @cloud-fan @sadikovi -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-13 Thread via GitHub
zhengruifeng commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1133771185 ## connector/connect/common/src/main/protobuf/spark/connect/ml.proto: ## @@ -0,0 +1,170 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] pan3793 commented on pull request #39160: [SPARK-41667][K8S] Expose env var SPARK_DRIVER_POD_NAME in Driver Pod

2023-03-13 Thread via GitHub
pan3793 commented on PR #39160: URL: https://github.com/apache/spark/pull/39160#issuecomment-1465973424 cross-refer comments from https://github.com/apache/spark/pull/40392#issuecomment-1465870752 > Your PR tried to add `SPARK_DRIVER_POD_NAME` to Driver Pod to expose it to 3rd party

[GitHub] [spark] jdferreira opened a new pull request, #40398: Update `translate` docblock

2023-03-13 Thread via GitHub
jdferreira opened a new pull request, #40398: URL: https://github.com/apache/spark/pull/40398 ### What changes were proposed in this pull request? The documentation for the `translate` SQL function is a bit difficult to parse and understand. I propose new wording. ### Why

[GitHub] [spark] LuciferYang opened a new pull request, #40395: [SPARK-42770] WIP

2023-03-13 Thread via GitHub
LuciferYang opened a new pull request, #40395: URL: https://github.com/apache/spark/pull/40395 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] mridulm commented on pull request #40286: [SPARK-42577][CORE] Add max attempts limitation for stages to avoid potential infinite retry

2023-03-13 Thread via GitHub
mridulm commented on PR #40286: URL: https://github.com/apache/spark/pull/40286#issuecomment-1465633950 Merged to master. Thanks for fixing this @ivoson ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] mridulm closed pull request #40286: [SPARK-42577][CORE] Add max attempts limitation for stages to avoid potential infinite retry

2023-03-13 Thread via GitHub
mridulm closed pull request #40286: [SPARK-42577][CORE] Add max attempts limitation for stages to avoid potential infinite retry URL: https://github.com/apache/spark/pull/40286 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] wangshengjie123 commented on pull request #40391: [WIP][SPARK-42766][YARN] YarnAllocator filter excluded nodes when launching containers

2023-03-13 Thread via GitHub
wangshengjie123 commented on PR #40391: URL: https://github.com/apache/spark/pull/40391#issuecomment-1465682073 I am not sure if we should add an Executor exit code and optimize the RegisterExecutor response message in this PR. In production environment, we found sometimes only filter the

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-13 Thread via GitHub
zhengruifeng commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1133568797 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala: ## @@ -2065,6 +2065,44 @@ class PlanGenerationTestSuite

[GitHub] [spark] pan3793 commented on pull request #40392: [SPARK-42769][K8S] Add `SPARK_DRIVER_POD_IP` env variable to executor pods

2023-03-13 Thread via GitHub
pan3793 commented on PR #40392: URL: https://github.com/apache/spark/pull/40392#issuecomment-1465812124 > ... for some executor pods to connect driver pods via IP. Hi @dongjoon-hyun, I think it's quite useful, but in

[GitHub] [spark] LuciferYang commented on pull request #40395: [SPARK-42770][CONNECT] Add `truncatedTo(ChronoUnit.MICROS)` to make `SQLImplicitsTestSuite` test pass on Linux

2023-03-13 Thread via GitHub
LuciferYang commented on PR #40395: URL: https://github.com/apache/spark/pull/40395#issuecomment-1465860908 https://github.com/apache/spark/actions/runs/4318647315/jobs/7537203682 ``` [info] - test implicit encoder resolution *** FAILED *** (1 second, 329 milliseconds)

[GitHub] [spark] dongjoon-hyun commented on pull request #40392: [SPARK-42769][K8S] Add `SPARK_DRIVER_POD_IP` env variable to executor pods

2023-03-13 Thread via GitHub
dongjoon-hyun commented on PR #40392: URL: https://github.com/apache/spark/pull/40392#issuecomment-1465870752 @pan3793 . The goal of PR is different from your PR's goal. - Your PR tried to add `SPARK_DRIVER_POD_NAME` to `Driver Pod` to expose it to **3rd party pods**. - This PR

[GitHub] [spark] LuciferYang commented on pull request #40395: [SPARK-42770][CONNECT] Add `truncatedTo(ChronoUnit.MICROS)` to make `SQLImplicitsTestSuite` in Java 17 daily test GA task pass

2023-03-13 Thread via GitHub
LuciferYang commented on PR #40395: URL: https://github.com/apache/spark/pull/40395#issuecomment-1465912943 cc @HyukjinKwon also cc @bjornjorgensen who reported this issue in dev mail list -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] beliefer opened a new pull request, #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true

2023-03-13 Thread via GitHub
beliefer opened a new pull request, #40396: URL: https://github.com/apache/spark/pull/40396 ### What changes were proposed in this pull request? Currently, DS V2 pushdown could let JDBC dialect decide to push down `OFFSET`, `LIMIT` and table sample. Because some databases don't support

[GitHub] [spark] panbingkun opened a new pull request, #40397: [SPARK-42052][SQL] Codegen Support for HiveSimpleUDF

2023-03-13 Thread via GitHub
panbingkun opened a new pull request, #40397: URL: https://github.com/apache/spark/pull/40397 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No. ### How was this patch

[GitHub] [spark] panbingkun commented on a diff in pull request #39949: [SPARK-42386][SQL] Rewrite HiveGenericUDF with Invoke

2023-03-13 Thread via GitHub
panbingkun commented on code in PR #39949: URL: https://github.com/apache/spark/pull/39949#discussion_r1133823856 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -194,47 +183,52 @@ private[hive] case class HiveGenericUDF( override protected def

[GitHub] [spark] EnricoMi commented on pull request #39952: [SPARK-40770][PYTHON][FOLLOW-UP] Improved error messages for mapInPandas for schema mismatch

2023-03-13 Thread via GitHub
EnricoMi commented on PR #39952: URL: https://github.com/apache/spark/pull/39952#issuecomment-1465739961 CC @cloud-fan @itholic @zhengruifeng -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the
