[GitHub] [spark] pan3793 commented on pull request #40444: [SPARK-42813][K8S] Print application info when waitAppCompletion is false

2023-03-20 Thread via GitHub
pan3793 commented on PR #40444: URL: https://github.com/apache/spark/pull/40444#issuecomment-1477316443 @dongjoon-hyun thank you, I updated the log message to ``` "Application $appName with application ID $appId and submission ID $sId finished" ```
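For context, below is a minimal Scala sketch of producing a log line with this wording via string interpolation; the object, logger setup, and parameter names are illustrative and not taken from Spark's actual K8s submission code.

```scala
import org.slf4j.LoggerFactory

// Illustrative only: emits the message quoted above once the submission finishes,
// assuming appName, appId and sId have already been resolved by the caller.
object SubmitInfoLoggerSketch {
  private val logger = LoggerFactory.getLogger(getClass)

  def logFinished(appName: String, appId: String, sId: String): Unit =
    logger.info(s"Application $appName with application ID $appId and submission ID $sId finished")
}
```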

[GitHub] [spark] yliou opened a new pull request, #40502: [SPARK-42829] [UI] add repeat identifier to cached RDD on stage page

2023-03-20 Thread via GitHub
yliou opened a new pull request, #40502: URL: https://github.com/apache/spark/pull/40502 ### What changes were proposed in this pull request? Adds a "Repeat Identifier:" label to the cached RDD node on the Stages page. The "Repeat Identifier:" text is bolded so that it's easier to

[GitHub] [spark] zhengruifeng closed pull request #40501: [SPARK-42864][ML] Make IsotonicRegression.PointsAccumulator private

2023-03-20 Thread via GitHub
zhengruifeng closed pull request #40501: [SPARK-42864][ML] Make IsotonicRegression.PointsAccumulator private URL: https://github.com/apache/spark/pull/40501

[GitHub] [spark] xinrong-meng commented on pull request #40500: [SPARK-42864][ML][3.4] Make `IsotonicRegression.PointsAccumulator` private

2023-03-20 Thread via GitHub
xinrong-meng commented on PR #40500: URL: https://github.com/apache/spark/pull/40500#issuecomment-1477308745 Merged to branch-3.4, thanks!

[GitHub] [spark] xinrong-meng closed pull request #40500: [SPARK-42864][ML][3.4] Make `IsotonicRegression.PointsAccumulator` private

2023-03-20 Thread via GitHub
xinrong-meng closed pull request #40500: [SPARK-42864][ML][3.4] Make `IsotonicRegression.PointsAccumulator` private URL: https://github.com/apache/spark/pull/40500

[GitHub] [spark] zhengruifeng commented on pull request #40500: [SPARK-42864][ML][3.4] Make `IsotonicRegression.PointsAccumulator` private

2023-03-20 Thread via GitHub
zhengruifeng commented on PR #40500: URL: https://github.com/apache/spark/pull/40500#issuecomment-1477304948 @xinrong-meng it seems this PR was already merged?

[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-20 Thread via GitHub
ulysses-you commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1142906959 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -125,13 +128,27 @@ class EquivalentExpressions { }

[GitHub] [spark] sudoliyang commented on pull request #40494: [MINOR][DOCS] Fix typos

2023-03-20 Thread via GitHub
sudoliyang commented on PR #40494: URL: https://github.com/apache/spark/pull/40494#issuecomment-1477300074 I did enable GitHub Actions on my forked repo and rebased onto `apache/spark` master. Can anyone re-run the workflows?

[GitHub] [spark] rednaxelafx commented on a diff in pull request #40488: [SPARK-42851][SQL] Replace EquivalentExpressions with mutable map in PhysicalAggregation

2023-03-20 Thread via GitHub
rednaxelafx commented on code in PR #40488: URL: https://github.com/apache/spark/pull/40488#discussion_r1142905889 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala: ## @@ -296,12 +298,17 @@ object PhysicalAggregation { // build a set

[GitHub] [spark] pan3793 commented on pull request #40444: [SPARK-42813][K8S] Print application info when waitAppCompletion is false

2023-03-20 Thread via GitHub
pan3793 commented on PR #40444: URL: https://github.com/apache/spark/pull/40444#issuecomment-1477299823 Kindly ping @dongjoon-hyun

[GitHub] [spark] huaxingao commented on a diff in pull request #40495: test for reading footer within file range

2023-03-20 Thread via GitHub
huaxingao commented on code in PR #40495: URL: https://github.com/apache/spark/pull/40495#discussion_r1142903084 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala: ## @@ -92,8 +93,13 @@ case class

[GitHub] [spark] huaxingao commented on a diff in pull request #40495: test for reading footer within file range

2023-03-20 Thread via GitHub
huaxingao commented on code in PR #40495: URL: https://github.com/apache/spark/pull/40495#discussion_r1142902856 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala: ## @@ -92,8 +93,13 @@ case class

[GitHub] [spark] dtenedor commented on a diff in pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-20 Thread via GitHub
dtenedor commented on code in PR #40496: URL: https://github.com/apache/spark/pull/40496#discussion_r1142891754 ## sql/core/src/test/resources/sql-tests/analyzer-results/ansi/cast.sql.out: ## @@ -0,0 +1,881 @@ +-- Automatically generated by SQLQueryTestSuite +-- !query +SELECT

[GitHub] [spark] LuciferYang commented on a diff in pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-20 Thread via GitHub
LuciferYang commented on code in PR #40496: URL: https://github.com/apache/spark/pull/40496#discussion_r1142891030 ## sql/core/src/test/resources/sql-tests/analyzer-results/ansi/cast.sql.out: ## @@ -0,0 +1,881 @@ +-- Automatically generated by SQLQueryTestSuite +-- !query

[GitHub] [spark] zhengruifeng opened a new pull request, #40501: [SPARK-42864][ML] Make IsotonicRegression.PointsAccumulator private

2023-03-20 Thread via GitHub
zhengruifeng opened a new pull request, #40501: URL: https://github.com/apache/spark/pull/40501 ### What changes were proposed in this pull request? Make `IsotonicRegression.PointsAccumulator` private, which was introduced in

[GitHub] [spark] xinrong-meng commented on pull request #40500: [SPARK-42864][ML][3.4] Make `IsotonicRegression.PointsAccumulator` private

2023-03-20 Thread via GitHub
xinrong-meng commented on PR #40500: URL: https://github.com/apache/spark/pull/40500#issuecomment-1477276287 LGTM, thank you @zhengruifeng !

[GitHub] [spark] amaliujia commented on pull request #40485: [SPARK-42870][CONNECT] Move `toCatalystValue` to `connect-common`

2023-03-20 Thread via GitHub
amaliujia commented on PR #40485: URL: https://github.com/apache/spark/pull/40485#issuecomment-1477271752 late LGTM

[GitHub] [spark] ueshin commented on pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-20 Thread via GitHub
ueshin commented on PR #40402: URL: https://github.com/apache/spark/pull/40402#issuecomment-1477260665 @zhengruifeng ah, seems like something is wrong when the schema is a column name list. Could you use `StructType` to specify the schema as a workaround? I'll take a look later.
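The suggestion above targets the Python client, but the same workaround idea, passing an explicit `StructType` instead of a bare column-name list, looks roughly like this in Scala (runnable in spark-shell, where `spark` is predefined; the columns are made up):

```scala
import java.util.Arrays

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical columns; the point is passing an explicit schema rather than a name list.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType)))
val df = spark.createDataFrame(Arrays.asList(Row(1, "a"), Row(2, "b")), schema)
df.printSchema()
```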

[GitHub] [spark] srowen commented on pull request #40263: [SPARK-42659][ML] Reimplement `FPGrowthModel.transform` with dataframe operations

2023-03-20 Thread via GitHub
srowen commented on PR #40263: URL: https://github.com/apache/spark/pull/40263#issuecomment-1477246959 I don't know enough to say whether it's worth a new method. Can we start with the change that needs no new API? Is it a big enough win?

[GitHub] [spark] mskapilks commented on pull request #40266: [SPARK-42660][SQL] Infer filters for Join produced by IN and EXISTS clause (RewritePredicateSubquery rule)

2023-03-20 Thread via GitHub
mskapilks commented on PR #40266: URL: https://github.com/apache/spark/pull/40266#issuecomment-1477245713 > Looks like there are a few failures after moving the rule ([22e7886](https://github.com/apache/spark/commit/22e7886ff1059b98d1525380b2cb22718fd5dd09)). @mskapilks, do you think you

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40500: [WIP][ML][3.4] Make `IsotonicRegression.PointsAccumulator` private

2023-03-20 Thread via GitHub
zhengruifeng commented on code in PR #40500: URL: https://github.com/apache/spark/pull/40500#discussion_r1142863924 ## mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala: ## @@ -490,15 +490,13 @@ class IsotonicRegression private (private var

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40500: [WIP][ML][3.4] Make `IsotonicRegression.PointsAccumulator` private

2023-03-20 Thread via GitHub
WeichenXu123 commented on code in PR #40500: URL: https://github.com/apache/spark/pull/40500#discussion_r1142863633 ## mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala: ## @@ -490,15 +490,13 @@ class IsotonicRegression private (private var

[GitHub] [spark] itholic closed pull request #40270: [WIP][SPARK-42662][CONNECT][PYTHON][PS] Support `withSequenceColumn` as PySpark DataFrame internal function.

2023-03-20 Thread via GitHub
itholic closed pull request #40270: [WIP][SPARK-42662][CONNECT][PYTHON][PS] Support `withSequenceColumn` as PySpark DataFrame internal function. URL: https://github.com/apache/spark/pull/40270

[GitHub] [spark] itholic commented on pull request #40270: [WIP][SPARK-42662][CONNECT][PYTHON][PS] Support `withSequenceColumn` as PySpark DataFrame internal function.

2023-03-20 Thread via GitHub
itholic commented on PR #40270: URL: https://github.com/apache/spark/pull/40270#issuecomment-1477238760 As #40456 has been completed, I will resume this one. Let me close this PR for convenience and open a new one.

[GitHub] [spark] lyy-pineapple commented on a diff in pull request #38171: [SPARK-9213] [SQL] Improve regular expression performance (via joni)

2023-03-20 Thread via GitHub
lyy-pineapple commented on code in PR #38171: URL: https://github.com/apache/spark/pull/38171#discussion_r1142860063 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressionsJoni.scala: ## @@ -0,0 +1,471 @@ +/* + * Licensed to the Apache

[GitHub] [spark] zhengruifeng opened a new pull request, #40500: [WIP][ML][3.4] Make `IsotonicRegression.PointsAccumulator` private

2023-03-20 Thread via GitHub
zhengruifeng opened a new pull request, #40500: URL: https://github.com/apache/spark/pull/40500 ### What changes were proposed in this pull request? Make `IsotonicRegression.PointsAccumulator` private ### Why are the changes needed? `PointsAccumulator` is implementation

[GitHub] [spark] Stove-hust commented on a diff in pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-20 Thread via GitHub
Stove-hust commented on code in PR #40393: URL: https://github.com/apache/spark/pull/40393#discussion_r1142853303 ## pom.xml: ## @@ -114,7 +114,7 @@ 1.8 ${java.version} ${java.version} -3.8.7 +3.6.3 Review Comment:  Forgot to change it back after

[GitHub] [spark] yabola commented on a diff in pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice when no filters in vectorized reader

2023-03-20 Thread via GitHub
yabola commented on code in PR #39950: URL: https://github.com/apache/spark/pull/39950#discussion_r1142848071 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala: ## @@ -205,11 +212,21 @@ class ParquetFileFormat val

[GitHub] [spark] amaliujia commented on a diff in pull request #40499: [SPARK-42876][SQL] DataType's physicalDataType should be private[sql]

2023-03-20 Thread via GitHub
amaliujia commented on code in PR #40499: URL: https://github.com/apache/spark/pull/40499#discussion_r1142847660 ## sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -119,7 +119,7 @@ abstract class DataType extends AbstractDataType { override

[GitHub] [spark] zhengruifeng commented on pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-20 Thread via GitHub
zhengruifeng commented on PR #40402: URL: https://github.com/apache/spark/pull/40402#issuecomment-1477209035 I saved a df with a UDT in PySpark, then read it in the Python client, and it works fine. So I guess something is wrong in vanilla PySpark's `createDataFrame`: ``` In

[GitHub] [spark] yabola commented on a diff in pull request #40495: test for reading footer within file range

2023-03-20 Thread via GitHub
yabola commented on code in PR #40495: URL: https://github.com/apache/spark/pull/40495#discussion_r1142841503 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala: ## @@ -92,8 +93,13 @@ case class

[GitHub] [spark] zhengruifeng commented on pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-20 Thread via GitHub
zhengruifeng commented on PR #40402: URL: https://github.com/apache/spark/pull/40402#issuecomment-1477202593 @ueshin it seems that `createDataFrame` always uses the underlying `sqlType` rather than the UDT itself: ``` In [1]: from pyspark.ml.linalg import Vectors In [2]: df =
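The truncated repro above is PySpark; a rough Scala analogue for checking whether a vector column is reported as its UDT or as the underlying `sqlType` struct might look like this (runnable in spark-shell; the data is made up):

```scala
import org.apache.spark.ml.linalg.Vectors

// Create a DataFrame with a vector column and inspect how the schema reports it:
// a UDT-aware path shows `vector`, while falling back to the sqlType would show the
// underlying struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.
val df = spark.createDataFrame(Seq(
  (0, Vectors.dense(1.0, 2.0)),
  (1, Vectors.sparse(2, Array(0), Array(3.0))))).toDF("id", "features")
df.printSchema()
df.schema("features").dataType  // the declared data type of the vector column
```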

[GitHub] [spark] LuciferYang commented on a diff in pull request #40408: [SPARK-42780][BUILD] Upgrade `Tink` to 1.8.0

2023-03-20 Thread via GitHub
LuciferYang commented on code in PR #40408: URL: https://github.com/apache/spark/pull/40408#discussion_r1142839820 ## pom.xml: ## @@ -214,7 +214,7 @@ 1.1.0 1.5.0 1.60 -1.7.0 +1.8.0 Review Comment: @bjornjorgensen We can exclude

[GitHub] [spark] cloud-fan commented on a diff in pull request #40499: [SPARK-42876][SQL] DataType's physicalDataType should be private[sql]

2023-03-20 Thread via GitHub
cloud-fan commented on code in PR #40499: URL: https://github.com/apache/spark/pull/40499#discussion_r1142839025 ## sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -119,7 +119,7 @@ abstract class DataType extends AbstractDataType { override

[GitHub] [spark] cloud-fan commented on a diff in pull request #40488: [SPARK-42851][SQL] Replace EquivalentExpressions with mutable map in PhysicalAggregation

2023-03-20 Thread via GitHub
cloud-fan commented on code in PR #40488: URL: https://github.com/apache/spark/pull/40488#discussion_r1142837488 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala: ## @@ -296,12 +298,17 @@ object PhysicalAggregation { // build a set of

[GitHub] [spark] zhengruifeng commented on pull request #40263: [SPARK-42659][ML] Reimplement `FPGrowthModel.transform` with dataframe operations

2023-03-20 Thread via GitHub
zhengruifeng commented on PR #40263: URL: https://github.com/apache/spark/pull/40263#issuecomment-1477193966 TL;DR I want to apply a scalar subquery to optimize `FPGrowthModel.transform`; there are two options: 1) create temp views and use `spark.sql`, see
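A very rough sketch of "option 1" mentioned above, registering temp views and issuing a scalar subquery through `spark.sql`, could look like the following in spark-shell; the DataFrames, view names, and query are invented for illustration and are not the actual FPGrowth code:

```scala
import spark.implicits._

// Two made-up DataFrames standing in for the model's association rules and the input items.
val rules = Seq((Array("a"), Array("b"), 0.8)).toDF("antecedent", "consequent", "confidence")
val items = Seq((1, Array("a", "c")), (2, Array("b"))).toDF("id", "items")

rules.createOrReplaceTempView("rules")
items.createOrReplaceTempView("baskets")

// An uncorrelated scalar subquery embedded in the SELECT list via spark.sql.
val out = spark.sql(
  """SELECT b.id, b.items,
    |       (SELECT max(confidence) FROM rules) AS max_confidence
    |FROM baskets b""".stripMargin)
out.show()
```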

[GitHub] [spark] LuciferYang commented on pull request #40492: [SPARK-42791][SQL][FOLLOWUP] Re-generate golden files for `array_prepend`

2023-03-20 Thread via GitHub
LuciferYang commented on PR #40492: URL: https://github.com/apache/spark/pull/40492#issuecomment-1477180380 Thanks All ~

[GitHub] [spark] beliefer commented on a diff in pull request #40467: [SPARK-42584][CONNECT] Improve output of `Column.explain`

2023-03-20 Thread via GitHub
beliefer commented on code in PR #40467: URL: https://github.com/apache/spark/pull/40467#discussion_r1142827066 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -163,6 +164,14 @@ message AnalyzePlanRequest { // (Required) The logical plan to

[GitHub] [spark] beliefer commented on pull request #40467: [SPARK-42584][CONNECT] Improve output of `Column.explain`

2023-03-20 Thread via GitHub
beliefer commented on PR #40467: URL: https://github.com/apache/spark/pull/40467#issuecomment-1477176898 > Do we need a Python-side compatible API? Maybe. I will check the Python side.

[GitHub] [spark] beliefer commented on a diff in pull request #40467: [SPARK-42584][CONNECT] Improve output of `Column.explain`

2023-03-20 Thread via GitHub
beliefer commented on code in PR #40467: URL: https://github.com/apache/spark/pull/40467#discussion_r1142826734 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -1211,13 +1211,11 @@ class Column private[sql] (private[sql] val expr:

[GitHub] [spark] beliefer commented on a diff in pull request #40466: [SPARK-42835][SQL][TESTS] Add test cases for `Column.explain`

2023-03-20 Thread via GitHub
beliefer commented on code in PR #40466: URL: https://github.com/apache/spark/pull/40466#discussion_r1142825198 ## sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala: ## @@ -921,6 +922,132 @@ class ColumnExpressionSuite extends QueryTest with

[GitHub] [spark] yaooqinn commented on a diff in pull request #40476: [MINOR][BUILD] Remove unused properties in pom file

2023-03-20 Thread via GitHub
yaooqinn commented on code in PR #40476: URL: https://github.com/apache/spark/pull/40476#discussion_r1142823591 ## resource-managers/kubernetes/integration-tests/pom.xml: ## @@ -26,8 +26,6 @@ spark-kubernetes-integration-tests_2.12 -1.3.0 - Review Comment:

[GitHub] [spark] xinrong-meng commented on a diff in pull request #40487: [WIP] Implement CoGrouped Map API

2023-03-20 Thread via GitHub
xinrong-meng commented on code in PR #40487: URL: https://github.com/apache/spark/pull/40487#discussion_r1142823170 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -509,6 +511,26 @@ class SparkConnectPlanner(val

[GitHub] [spark] beliefer commented on pull request #40418: [SPARK-42790][SQL] Abstract the excluded method for better test for JDBC docker tests.

2023-03-20 Thread via GitHub
beliefer commented on PR #40418: URL: https://github.com/apache/spark/pull/40418#issuecomment-1477169805 @srowen Thank you! @cloud-fan @huaxingao Thank you too!

[GitHub] [spark] zhengruifeng commented on pull request #40497: [SPARK-42875][CONNECT][PYTHON] Fix toPandas to handle timezone and map types properly

2023-03-20 Thread via GitHub
zhengruifeng commented on PR #40497: URL: https://github.com/apache/spark/pull/40497#issuecomment-1477168751 merged to master/branch-3.4

[GitHub] [spark] zhengruifeng closed pull request #40497: [SPARK-42875][CONNECT][PYTHON] Fix toPandas to handle timezone and map types properly

2023-03-20 Thread via GitHub
zhengruifeng closed pull request #40497: [SPARK-42875][CONNECT][PYTHON] Fix toPandas to handle timezone and map types properly URL: https://github.com/apache/spark/pull/40497

[GitHub] [spark] cloud-fan commented on pull request #40473: [SPARK-42851][SQL] Guard EquivalentExpressions.addExpr() with supportedExpression()

2023-03-20 Thread via GitHub
cloud-fan commented on PR #40473: URL: https://github.com/apache/spark/pull/40473#issuecomment-1477165259 The check was added to `getExprState` in https://github.com/apache/spark/pull/39010, which was to avoid canonicalizing a subquery expression and hitting an NPE. I agree that we

[GitHub] [spark] amaliujia opened a new pull request, #40499: [SPARK-42876][SQL] DataType's physicalDataType should be private[sql]

2023-03-20 Thread via GitHub
amaliujia opened a new pull request, #40499: URL: https://github.com/apache/spark/pull/40499 ### What changes were proposed in this pull request? `physicalDataType` should not be a public API but be private[sql]. ### Why are the changes needed? This is to limit

[GitHub] [spark] StevenChenDatabricks commented on a diff in pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-20 Thread via GitHub
StevenChenDatabricks commented on code in PR #40385: URL: https://github.com/apache/spark/pull/40385#discussion_r1142811318 ## sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala: ## @@ -73,14 +78,34 @@ object ExplainUtils extends AdaptiveSparkPlanHelper {

[GitHub] [spark] StevenChenDatabricks commented on a diff in pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-20 Thread via GitHub
StevenChenDatabricks commented on code in PR #40385: URL: https://github.com/apache/spark/pull/40385#discussion_r1142810230 ## sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala: ## @@ -73,14 +78,34 @@ object ExplainUtils extends AdaptiveSparkPlanHelper {

[GitHub] [spark] Kimahriman commented on pull request #40473: [SPARK-42851][SQL] Guard EquivalentExpressions.addExpr() with supportedExpression()

2023-03-20 Thread via GitHub
Kimahriman commented on PR #40473: URL: https://github.com/apache/spark/pull/40473#issuecomment-1477140427 > @Kimahriman I'd love to see a good CSE implementation for higher-order functions too. But for backporting the fix (which is this PR's primary intent) that would have been too much.

[GitHub] [spark] rednaxelafx commented on pull request #40473: [SPARK-42851][SQL] Guard EquivalentExpressions.addExpr() with supportedExpression()

2023-03-20 Thread via GitHub
rednaxelafx commented on PR #40473: URL: https://github.com/apache/spark/pull/40473#issuecomment-1477136150 @Kimahriman I'd love to see a good CSE implementation for higher-order functions too. But for backporting the fix (which is this PR's primary intent) that would have been too much.

[GitHub] [spark] rednaxelafx commented on pull request #40473: [SPARK-42851][SQL] Guard EquivalentExpressions.addExpr() with supportedExpression()

2023-03-20 Thread via GitHub
rednaxelafx commented on PR #40473: URL: https://github.com/apache/spark/pull/40473#issuecomment-1477132202 @peter-toth could you please clarify why `supportedExpression()` was needed in `getExprState()` in the first place? i.e. why isn't it sufficient to add it to `addExprTree()`?

[GitHub] [spark] rednaxelafx commented on pull request #40488: [SPARK-42851][SQL] Replace EquivalentExpressions with mutable map in PhysicalAggregation

2023-03-20 Thread via GitHub
rednaxelafx commented on PR #40488: URL: https://github.com/apache/spark/pull/40488#issuecomment-1477131544 Before the recent rounds of changes to EquivalentExpressions, the old `addExprTree` used to call `addExpr` in its core:
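For readers following the thread, here is a deliberately simplified, self-contained Scala sketch of the technique under discussion: counting semantically equal (sub)expressions in a mutable map keyed by a canonical form, while skipping expression kinds that are unsafe to canonicalize. It mirrors the idea only and is not Spark's `EquivalentExpressions` implementation.

```scala
import scala.collection.mutable

// Toy expression tree, standing in for Catalyst expressions.
sealed trait Expr
case class Lit(v: Int) extends Expr
case class Attr(name: String) extends Expr
case class Add(l: Expr, r: Expr) extends Expr
case class Subquery(id: Int) extends Expr // stand-in for expressions we must not canonicalize

object CseSketch {
  // Commutative canonicalization so that Add(a, b) and Add(b, a) collapse to one key.
  private def canonical(e: Expr): Expr = e match {
    case Add(l, r) =>
      val (cl, cr) = (canonical(l), canonical(r))
      if (cl.toString <= cr.toString) Add(cl, cr) else Add(cr, cl)
    case other => other
  }

  private def supported(e: Expr): Boolean = e match {
    case _: Subquery => false
    case Add(l, r)   => supported(l) && supported(r)
    case _           => true
  }

  /** Count how many times each supported (sub)expression occurs, keyed by canonical form. */
  def countCommon(exprs: Seq[Expr]): Map[Expr, Int] = {
    val counts = mutable.LinkedHashMap.empty[Expr, Int]
    def visit(e: Expr): Unit = {
      if (supported(e)) counts.update(canonical(e), counts.getOrElse(canonical(e), 0) + 1)
      e match {
        case Add(l, r) => visit(l); visit(r)
        case _         =>
      }
    }
    exprs.foreach(visit)
    counts.toMap
  }
}

// Example: Add(a, b) and Add(b, a) count as the same expression; the Subquery is skipped.
// CseSketch.countCommon(Seq(Add(Attr("a"), Attr("b")), Add(Attr("b"), Attr("a")), Subquery(1)))
```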

[GitHub] [spark] github-actions[bot] commented on pull request #38534: [SPARK-38505][SQL] Make partial aggregation adaptive

2023-03-20 Thread via GitHub
github-actions[bot] commented on PR #38534: URL: https://github.com/apache/spark/pull/38534#issuecomment-1477119915 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #38661: [SPARK-41085][SQL] Support Bit manipulation function COUNTSET

2023-03-20 Thread via GitHub
github-actions[bot] closed pull request #38661: [SPARK-41085][SQL] Support Bit manipulation function COUNTSET URL: https://github.com/apache/spark/pull/38661

[GitHub] [spark] github-actions[bot] commented on pull request #38608: [SPARK-41080][SQL] Support Bit manipulation function SETBIT

2023-03-20 Thread via GitHub
github-actions[bot] commented on PR #38608: URL: https://github.com/apache/spark/pull/38608#issuecomment-1477119894 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40479: [CONNECT][ML][WIP] Spark connect ML for scala client

2023-03-20 Thread via GitHub
WeichenXu123 commented on code in PR #40479: URL: https://github.com/apache/spark/pull/40479#discussion_r1142784162 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/ml/Pipeline.scala: ## @@ -17,47 +17,13 @@ package org.apache.spark.ml -import

[GitHub] [spark] amaliujia commented on a diff in pull request #40498: [WIP] reader table API could also accept options

2023-03-20 Thread via GitHub
amaliujia commented on code in PR #40498: URL: https://github.com/apache/spark/pull/40498#discussion_r1142782057 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -148,6 +143,13 @@ message Read { // This is only supported by the JDBC data

[GitHub] [spark] grundprinzip commented on a diff in pull request #40498: [WIP] reader table API could also accept options

2023-03-20 Thread via GitHub
grundprinzip commented on code in PR #40498: URL: https://github.com/apache/spark/pull/40498#discussion_r1142781033 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -148,6 +143,13 @@ message Read { // This is only supported by the JDBC

[GitHub] [spark] grundprinzip commented on a diff in pull request #40498: [WIP] reader table API could also accept options

2023-03-20 Thread via GitHub
grundprinzip commented on code in PR #40498: URL: https://github.com/apache/spark/pull/40498#discussion_r1142780483 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -148,6 +143,13 @@ message Read { // This is only supported by the JDBC

[GitHub] [spark] zhenlineo commented on a diff in pull request #40498: [WIP] reader table API could also accept options

2023-03-20 Thread via GitHub
zhenlineo commented on code in PR #40498: URL: https://github.com/apache/spark/pull/40498#discussion_r1142775827 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -148,6 +143,13 @@ message Read { // This is only supported by the JDBC data

[GitHub] [spark] zhenlineo commented on a diff in pull request #40498: [WIP] reader table API could also accept options

2023-03-20 Thread via GitHub
zhenlineo commented on code in PR #40498: URL: https://github.com/apache/spark/pull/40498#discussion_r1142775413 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameReader.scala: ## @@ -183,7 +183,7 @@ class DataFrameReader private[sql] (sparkSession:

[GitHub] [spark] ueshin commented on a diff in pull request #40498: [WIP] reader table API could also accept options

2023-03-20 Thread via GitHub
ueshin commented on code in PR #40498: URL: https://github.com/apache/spark/pull/40498#discussion_r1142758041 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameReader.scala: ## @@ -183,7 +183,7 @@ class DataFrameReader private[sql] (sparkSession:

[GitHub] [spark] amaliujia opened a new pull request, #40498: [WIP] reader table API could also accept options

2023-03-20 Thread via GitHub
amaliujia opened a new pull request, #40498: URL: https://github.com/apache/spark/pull/40498 ### What changes were proposed in this pull request? It turns out that `spark.read.option.table` is a valid call chain and the `table` API does accept options when opening a table.
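A minimal sketch of the call chain described above, runnable in spark-shell against an existing table; the option key and table name are placeholders:

```scala
// Options set on the reader before .table() accompany the table lookup,
// which is the behavior this PR is about.
val df = spark.read
  .option("mergeSchema", "true") // placeholder option key for illustration
  .table("some_db.some_table")   // assumes this table already exists in the catalog
df.show()
```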

[GitHub] [spark] JohnTortugo commented on pull request #40225: [SPARK-42625][BUILD] Upgrade `zstd-jni` to 1.5.4-2

2023-03-20 Thread via GitHub
JohnTortugo commented on PR #40225: URL: https://github.com/apache/spark/pull/40225#issuecomment-1477026864 Thanks a lot!

[GitHub] [spark] dongjoon-hyun commented on pull request #40225: [SPARK-42625][BUILD] Upgrade `zstd-jni` to 1.5.4-2

2023-03-20 Thread via GitHub
dongjoon-hyun commented on PR #40225: URL: https://github.com/apache/spark/pull/40225#issuecomment-1477024392 Here is the official documentation on how to run the benchmark in your GitHub Actions. Please see the `Running benchmarks in your forked repository` section.

[GitHub] [spark] JohnTortugo commented on pull request #40225: [SPARK-42625][BUILD] Upgrade `zstd-jni` to 1.5.4-2

2023-03-20 Thread via GitHub
JohnTortugo commented on PR #40225: URL: https://github.com/apache/spark/pull/40225#issuecomment-1477003627 Hey @dongjoon-hyun - Can you please point me to some documentation about how this benchmarking is done? I'd like to run the same benchmark locally.

[GitHub] [spark] ueshin opened a new pull request, #40497: [SPARK-42875][CONNECT][PYTHON] Fix toPandas to handle timezone and map types properly

2023-03-20 Thread via GitHub
ueshin opened a new pull request, #40497: URL: https://github.com/apache/spark/pull/40497 ### What changes were proposed in this pull request? Fix `DataFrame.toPandas()` to handle timezone and map types properly. ### Why are the changes needed? Currently

[GitHub] [spark] HyukjinKwon commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-20 Thread via GitHub
HyukjinKwon commented on PR #40496: URL: https://github.com/apache/spark/pull/40496#issuecomment-1476856830 LGTM if tests pass

[GitHub] [spark] HyukjinKwon closed pull request #40485: [SPARK-42870][CONNECT] Move `toCatalystValue` to `connect-common`

2023-03-20 Thread via GitHub
HyukjinKwon closed pull request #40485: [SPARK-42870][CONNECT] Move `toCatalystValue` to `connect-common` URL: https://github.com/apache/spark/pull/40485

[GitHub] [spark] HyukjinKwon commented on pull request #40485: [SPARK-42870][CONNECT] Move `toCatalystValue` to `connect-common`

2023-03-20 Thread via GitHub
HyukjinKwon commented on PR #40485: URL: https://github.com/apache/spark/pull/40485#issuecomment-1476846845 Merged to master.

[GitHub] [spark] mridulm commented on a diff in pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-20 Thread via GitHub
mridulm commented on code in PR #40393: URL: https://github.com/apache/spark/pull/40393#discussion_r1142572022 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -4595,6 +4595,184 @@ class DAGSchedulerSuite extends SparkFunSuite with

[GitHub] [spark] tgravescs commented on pull request #39127: [SPARK-41585][YARN] The Spark exclude node functionality for YARN should work independently of dynamic allocation

2023-03-20 Thread via GitHub
tgravescs commented on PR #39127: URL: https://github.com/apache/spark/pull/39127#issuecomment-1476782666 merged to master

[GitHub] [spark] asfgit closed pull request #39127: [SPARK-41585][YARN] The Spark exclude node functionality for YARN should work independently of dynamic allocation

2023-03-20 Thread via GitHub
asfgit closed pull request #39127: [SPARK-41585][YARN] The Spark exclude node functionality for YARN should work independently of dynamic allocation URL: https://github.com/apache/spark/pull/39127

[GitHub] [spark] MaxGekk closed pull request #40492: [SPARK-42791][SQL][FOLLOWUP] Re-generate golden files for `array_prepend`

2023-03-20 Thread via GitHub
MaxGekk closed pull request #40492: [SPARK-42791][SQL][FOLLOWUP] Re-generate golden files for `array_prepend` URL: https://github.com/apache/spark/pull/40492

[GitHub] [spark] aokolnychyi commented on pull request #40478: [SPARK-42779][SQL][FOLLOWUP] Allow V2 writes to indicate advisory shuffle partition size

2023-03-20 Thread via GitHub
aokolnychyi commented on PR #40478: URL: https://github.com/apache/spark/pull/40478#issuecomment-1476769291 Thanks, @dongjoon-hyun @cloud-fan!

[GitHub] [spark] MaxGekk commented on pull request #40492: [SPARK-42791][SQL][FOLLOWUP] Re-generate golden files for `array_prepend`

2023-03-20 Thread via GitHub
MaxGekk commented on PR #40492: URL: https://github.com/apache/spark/pull/40492#issuecomment-1476767577 +1, LGTM. Merging to master. Thank you, @LuciferYang and @dongjoon-hyun @gengliangwang @dtenedor for review.

[GitHub] [spark] LuciferYang commented on pull request #40492: [SPARK-42791][SQL][FOLLOWUP] Re-generate golden files for `array_prepend`

2023-03-20 Thread via GitHub
LuciferYang commented on PR #40492: URL: https://github.com/apache/spark/pull/40492#issuecomment-1476757882 GA passed

[GitHub] [spark] ueshin commented on a diff in pull request #40467: [SPARK-42584][CONNECT] Improve output of `Column.explain`

2023-03-20 Thread via GitHub
ueshin commented on code in PR #40467: URL: https://github.com/apache/spark/pull/40467#discussion_r1142537848 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -1211,13 +1211,11 @@ class Column private[sql] (private[sql] val expr:

[GitHub] [spark] dtenedor opened a new pull request, #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-20 Thread via GitHub
dtenedor opened a new pull request, #40496: URL: https://github.com/apache/spark/pull/40496 ### What changes were proposed in this pull request? This PR enables the new golden file test framework for analysis for all input files. Background: * In

[GitHub] [spark] MaxGekk commented on a diff in pull request #40493: modified for SPARK-42839: Assign a name to the error class _LEGACY_ER…

2023-03-20 Thread via GitHub
MaxGekk commented on code in PR #40493: URL: https://github.com/apache/spark/pull/40493#discussion_r1142511975 ## sql/core/src/test/scala/org/apache/spark/sql/LegacyErrorTempSuit.scala: ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] otterc commented on a diff in pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-20 Thread via GitHub
otterc commented on code in PR #40393: URL: https://github.com/apache/spark/pull/40393#discussion_r1142475865 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -4595,6 +4595,184 @@ class DAGSchedulerSuite extends SparkFunSuite with

[GitHub] [spark] bjornjorgensen commented on pull request #40494: [MINOR][DOCS] Fix typos

2023-03-20 Thread via GitHub
bjornjorgensen commented on PR #40494: URL: https://github.com/apache/spark/pull/40494#issuecomment-1476637999 @srowen FYI

[GitHub] [spark] StevenChenDatabricks commented on a diff in pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-20 Thread via GitHub
StevenChenDatabricks commented on code in PR #40385: URL: https://github.com/apache/spark/pull/40385#discussion_r114291 ## sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala: ## @@ -119,17 +155,40 @@ object ExplainUtils extends AdaptiveSparkPlanHelper

[GitHub] [spark] amaliujia commented on pull request #40467: [SPARK-42584][CONNECT] Improve output of `Column.explain`

2023-03-20 Thread via GitHub
amaliujia commented on PR #40467: URL: https://github.com/apache/spark/pull/40467#issuecomment-1476589894 Do we need a Python-side compatible API?

[GitHub] [spark] amaliujia commented on a diff in pull request #40467: [SPARK-42584][CONNECT] Improve output of `Column.explain`

2023-03-20 Thread via GitHub
amaliujia commented on code in PR #40467: URL: https://github.com/apache/spark/pull/40467#discussion_r1142418545 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -163,6 +164,14 @@ message AnalyzePlanRequest { // (Required) The logical plan to

[GitHub] [spark] mridulm commented on pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-20 Thread via GitHub
mridulm commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1476585572 The test failure is unrelated to this PR - once the changes above are made, the re-execution should pass

[GitHub] [spark] mridulm commented on a diff in pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-20 Thread via GitHub
mridulm commented on code in PR #40393: URL: https://github.com/apache/spark/pull/40393#discussion_r1142362530 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -4595,6 +4595,184 @@ class DAGSchedulerSuite extends SparkFunSuite with

[GitHub] [spark] amaliujia commented on a diff in pull request #40466: [SPARK-42835][SQL][TESTS] Add test cases for `Column.explain`

2023-03-20 Thread via GitHub
amaliujia commented on code in PR #40466: URL: https://github.com/apache/spark/pull/40466#discussion_r1142414407 ## sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala: ## @@ -921,6 +922,132 @@ class ColumnExpressionSuite extends QueryTest with

[GitHub] [spark] dongjoon-hyun commented on pull request #40478: [SPARK-42779][SQL][FOLLOWUP] Allow V2 writes to indicate advisory shuffle partition size

2023-03-20 Thread via GitHub
dongjoon-hyun commented on PR #40478: URL: https://github.com/apache/spark/pull/40478#issuecomment-1476581464 Thank you, @cloud-fan !

[GitHub] [spark] yabola opened a new pull request, #40495: only test for reading footer within file range

2023-03-20 Thread via GitHub
yabola opened a new pull request, #40495: URL: https://github.com/apache/spark/pull/40495 Only for testing, please ignore it.

[GitHub] [spark] MaxGekk commented on a diff in pull request #40126: [SPARK-40822][SQL] Stable derived column aliases

2023-03-20 Thread via GitHub
MaxGekk commented on code in PR #40126: URL: https://github.com/apache/spark/pull/40126#discussion_r1142389921 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveAliasesSuite.scala: ## @@ -88,4 +94,46 @@ class ResolveAliasesSuite extends AnalysisTest {

[GitHub] [spark] sudoliyang opened a new pull request, #40494: [MINOR][DOCS] Fix typos

2023-03-20 Thread via GitHub
sudoliyang opened a new pull request, #40494: URL: https://github.com/apache/spark/pull/40494 ### What changes were proposed in this pull request? Fix typos in the repo. ### Why are the changes needed? Improve readability. ### Does this PR introduce

[GitHub] [spark] ruilibuaa opened a new pull request, #40493: modified for SPARK-42839: Assign a name to the error class _LEGACY_ER…

2023-03-20 Thread via GitHub
ruilibuaa opened a new pull request, #40493: URL: https://github.com/apache/spark/pull/40493 …ROR_TEMP_2003 https://issues.apache.org/jira/browse/SPARK-42839 modified: 1. error-classes.json: "_LEGACY_ERROR_TEMP_2003" --> "CANNOT_ZIP_MAPS" 2. create a test case named

[GitHub] [spark] dtenedor commented on pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-20 Thread via GitHub
dtenedor commented on PR #40449: URL: https://github.com/apache/spark/pull/40449#issuecomment-1476470933 Looks like some extra tests got added just as this was getting merged! Thanks @LuciferYang for this fix

[GitHub] [spark] LuciferYang commented on pull request #40492: [SPARK-42791][SQL][FOLLOWUP] Re-generate golden files for `array_prepend`

2023-03-20 Thread via GitHub
LuciferYang commented on PR #40492: URL: https://github.com/apache/spark/pull/40492#issuecomment-1476455400 cc @cloud-fan @gengliangwang @dtenedor
