[GitHub] [spark] pan3793 commented on pull request #40444: [SPARK-42813][K8S] Print application info when waitAppCompletion is false

2023-03-20 Thread via GitHub
pan3793 commented on PR #40444: URL: https://github.com/apache/spark/pull/40444#issuecomment-1477316443 @dongjoon-hyun thank you, I updated the log message to ``` "Application $appName with application ID $appId and submission ID $sId finished" ```
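For context, below is a minimal Scala sketch of producing a log line with this wording via string interpolation; the object, logger setup, and parameter names are illustrative and not taken from Spark's actual K8s submission code.

```scala
import org.slf4j.LoggerFactory

// Illustrative only: emits the message quoted above once the submission finishes,
// assuming appName, appId and sId have already been resolved by the caller.
object SubmitInfoLoggerSketch {
  private val logger = LoggerFactory.getLogger(getClass)

  def logFinished(appName: String, appId: String, sId: String): Unit =
    logger.info(s"Application $appName with application ID $appId and submission ID $sId finished")
}
```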

[GitHub] [spark] yliou opened a new pull request, #40502: [SPARK-42829] [UI] add repeat identifier to cached RDD on stage page

2023-03-20 Thread via GitHub
yliou opened a new pull request, #40502: URL: https://github.com/apache/spark/pull/40502 ### What changes were proposed in this pull request? Adds a "Repeat Identifier:" label to the cached RDD node on the Stages page. The "Repeat Identifier:" text is bolded so that it's easier to

[GitHub] [spark] zhengruifeng closed pull request #40501: [SPARK-42864][ML] Make IsotonicRegression.PointsAccumulator private

2023-03-20 Thread via GitHub
zhengruifeng closed pull request #40501: [SPARK-42864][ML] Make IsotonicRegression.PointsAccumulator private URL: https://github.com/apache/spark/pull/40501

[GitHub] [spark] xinrong-meng commented on pull request #40500: [SPARK-42864][ML][3.4] Make `IsotonicRegression.PointsAccumulator` private

2023-03-20 Thread via GitHub
xinrong-meng commented on PR #40500: URL: https://github.com/apache/spark/pull/40500#issuecomment-1477308745 Merged to branch-3.4, thanks!

[GitHub] [spark] xinrong-meng closed pull request #40500: [SPARK-42864][ML][3.4] Make `IsotonicRegression.PointsAccumulator` private

2023-03-20 Thread via GitHub
xinrong-meng closed pull request #40500: [SPARK-42864][ML][3.4] Make `IsotonicRegression.PointsAccumulator` private URL: https://github.com/apache/spark/pull/40500

[GitHub] [spark] zhengruifeng commented on pull request #40500: [SPARK-42864][ML][3.4] Make `IsotonicRegression.PointsAccumulator` private

2023-03-20 Thread via GitHub
zhengruifeng commented on PR #40500: URL: https://github.com/apache/spark/pull/40500#issuecomment-1477304948 @xinrong-meng it seems this PR was already merged?

[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-20 Thread via GitHub
ulysses-you commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1142906959 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -125,13 +128,27 @@ class EquivalentExpressions { }

[GitHub] [spark] sudoliyang commented on pull request #40494: [MINOR][DOCS] Fix typos

2023-03-20 Thread via GitHub
sudoliyang commented on PR #40494: URL: https://github.com/apache/spark/pull/40494#issuecomment-1477300074 I did enable GitHub Actions on my forked repo and rebased onto `apache/spark` master. Can anyone re-run the workflows?

[GitHub] [spark] rednaxelafx commented on a diff in pull request #40488: [SPARK-42851][SQL] Replace EquivalentExpressions with mutable map in PhysicalAggregation

2023-03-20 Thread via GitHub
rednaxelafx commented on code in PR #40488: URL: https://github.com/apache/spark/pull/40488#discussion_r1142905889 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala: ## @@ -296,12 +298,17 @@ object PhysicalAggregation { // build a set

[GitHub] [spark] pan3793 commented on pull request #40444: [SPARK-42813][K8S] Print application info when waitAppCompletion is false

2023-03-20 Thread via GitHub
pan3793 commented on PR #40444: URL: https://github.com/apache/spark/pull/40444#issuecomment-1477299823 Kindly ping @dongjoon-hyun

[GitHub] [spark] huaxingao commented on a diff in pull request #40495: test for reading footer within file range

2023-03-20 Thread via GitHub
huaxingao commented on code in PR #40495: URL: https://github.com/apache/spark/pull/40495#discussion_r1142903084 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala: ## @@ -92,8 +93,13 @@ case class

[GitHub] [spark] huaxingao commented on a diff in pull request #40495: test for reading footer within file range

2023-03-20 Thread via GitHub
huaxingao commented on code in PR #40495: URL: https://github.com/apache/spark/pull/40495#discussion_r1142902856 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala: ## @@ -92,8 +93,13 @@ case class

[GitHub] [spark] dtenedor commented on a diff in pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-20 Thread via GitHub
dtenedor commented on code in PR #40496: URL: https://github.com/apache/spark/pull/40496#discussion_r1142891754 ## sql/core/src/test/resources/sql-tests/analyzer-results/ansi/cast.sql.out: ## @@ -0,0 +1,881 @@ +-- Automatically generated by SQLQueryTestSuite +-- !query +SELECT

[GitHub] [spark] LuciferYang commented on a diff in pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-20 Thread via GitHub
LuciferYang commented on code in PR #40496: URL: https://github.com/apache/spark/pull/40496#discussion_r1142891030 ## sql/core/src/test/resources/sql-tests/analyzer-results/ansi/cast.sql.out: ## @@ -0,0 +1,881 @@ +-- Automatically generated by SQLQueryTestSuite +-- !query

[GitHub] [spark] zhengruifeng opened a new pull request, #40501: [SPARK-42864][ML] Make IsotonicRegression.PointsAccumulator private

2023-03-20 Thread via GitHub
zhengruifeng opened a new pull request, #40501: URL: https://github.com/apache/spark/pull/40501 ### What changes were proposed in this pull request? Make `IsotonicRegression.PointsAccumulator` private, which was introduced in

[GitHub] [spark] xinrong-meng commented on pull request #40500: [SPARK-42864][ML][3.4] Make `IsotonicRegression.PointsAccumulator` private

2023-03-20 Thread via GitHub
xinrong-meng commented on PR #40500: URL: https://github.com/apache/spark/pull/40500#issuecomment-1477276287 LGTM, thank you @zhengruifeng !

[GitHub] [spark] amaliujia commented on pull request #40485: [SPARK-42870][CONNECT] Move `toCatalystValue` to `connect-common`

2023-03-20 Thread via GitHub
amaliujia commented on PR #40485: URL: https://github.com/apache/spark/pull/40485#issuecomment-1477271752 late LGTM

[GitHub] [spark] ueshin commented on pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-20 Thread via GitHub
ueshin commented on PR #40402: URL: https://github.com/apache/spark/pull/40402#issuecomment-1477260665 @zhengruifeng ah, seems like something is wrong when the schema is a column name list. Could you use `StructType` to specify the schema as a workaround? I'll take a look later.
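The suggestion above targets the Python client, but the same workaround idea, passing an explicit `StructType` instead of a bare column-name list, looks roughly like this in Scala (runnable in spark-shell, where `spark` is predefined; the columns are made up):

```scala
import java.util.Arrays

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical columns; the point is passing an explicit schema rather than a name list.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType)))
val df = spark.createDataFrame(Arrays.asList(Row(1, "a"), Row(2, "b")), schema)
df.printSchema()
```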

[GitHub] [spark] srowen commented on pull request #40263: [SPARK-42659][ML] Reimplement `FPGrowthModel.transform` with dataframe operations

2023-03-20 Thread via GitHub
srowen commented on PR #40263: URL: https://github.com/apache/spark/pull/40263#issuecomment-1477246959 I don't know enough to say whether it's worth a new method. Can we start with the change that needs no new API? Is it a big enough win?

[GitHub] [spark] mskapilks commented on pull request #40266: [SPARK-42660][SQL] Infer filters for Join produced by IN and EXISTS clause (RewritePredicateSubquery rule)

2023-03-20 Thread via GitHub
mskapilks commented on PR #40266: URL: https://github.com/apache/spark/pull/40266#issuecomment-1477245713 > Looks like there are a few failures after moving the rule ([22e7886](https://github.com/apache/spark/commit/22e7886ff1059b98d1525380b2cb22718fd5dd09)). @mskapilks, do you think you

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40500: [WIP][ML][3.4] Make `IsotonicRegression.PointsAccumulator` private

2023-03-20 Thread via GitHub
zhengruifeng commented on code in PR #40500: URL: https://github.com/apache/spark/pull/40500#discussion_r1142863924 ## mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala: ## @@ -490,15 +490,13 @@ class IsotonicRegression private (private var

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40500: [WIP][ML][3.4] Make `IsotonicRegression.PointsAccumulator` private

2023-03-20 Thread via GitHub
WeichenXu123 commented on code in PR #40500: URL: https://github.com/apache/spark/pull/40500#discussion_r1142863633 ## mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala: ## @@ -490,15 +490,13 @@ class IsotonicRegression private (private var

[GitHub] [spark] itholic closed pull request #40270: [WIP][SPARK-42662][CONNECT][PYTHON][PS] Support `withSequenceColumn` as PySpark DataFrame internal function.

2023-03-20 Thread via GitHub
itholic closed pull request #40270: [WIP][SPARK-42662][CONNECT][PYTHON][PS] Support `withSequenceColumn` as PySpark DataFrame internal function. URL: https://github.com/apache/spark/pull/40270

[GitHub] [spark] itholic commented on pull request #40270: [WIP][SPARK-42662][CONNECT][PYTHON][PS] Support `withSequenceColumn` as PySpark DataFrame internal function.

2023-03-20 Thread via GitHub
itholic commented on PR #40270: URL: https://github.com/apache/spark/pull/40270#issuecomment-1477238760 As #40456 has been completed, I will resume this one. Let me close this PR for convenience and open a new one.

[GitHub] [spark] lyy-pineapple commented on a diff in pull request #38171: [SPARK-9213] [SQL] Improve regular expression performance (via joni)

2023-03-20 Thread via GitHub
lyy-pineapple commented on code in PR #38171: URL: https://github.com/apache/spark/pull/38171#discussion_r1142860063 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressionsJoni.scala: ## @@ -0,0 +1,471 @@ +/* + * Licensed to the Apache

[GitHub] [spark] zhengruifeng opened a new pull request, #40500: [WIP][ML][3.4] Make `IsotonicRegression.PointsAccumulator` private

2023-03-20 Thread via GitHub
zhengruifeng opened a new pull request, #40500: URL: https://github.com/apache/spark/pull/40500 ### What changes were proposed in this pull request? Make `IsotonicRegression.PointsAccumulator` private ### Why are the changes needed? `PointsAccumulator` is implementation

[GitHub] [spark] Stove-hust commented on a diff in pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-20 Thread via GitHub
Stove-hust commented on code in PR #40393: URL: https://github.com/apache/spark/pull/40393#discussion_r1142853303 ## pom.xml: ## @@ -114,7 +114,7 @@ 1.8 ${java.version} ${java.version} -3.8.7 +3.6.3 Review Comment:  Forgot to change it back after

[GitHub] [spark] yabola commented on a diff in pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice when no filters in vectorized reader

2023-03-20 Thread via GitHub
yabola commented on code in PR #39950: URL: https://github.com/apache/spark/pull/39950#discussion_r1142848071 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala: ## @@ -205,11 +212,21 @@ class ParquetFileFormat val

[GitHub] [spark] amaliujia commented on a diff in pull request #40499: [SPARK-42876][SQL] DataType's physicalDataType should be private[sql]

2023-03-20 Thread via GitHub
amaliujia commented on code in PR #40499: URL: https://github.com/apache/spark/pull/40499#discussion_r1142847660 ## sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -119,7 +119,7 @@ abstract class DataType extends AbstractDataType { override

[GitHub] [spark] zhengruifeng commented on pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-20 Thread via GitHub
zhengruifeng commented on PR #40402: URL: https://github.com/apache/spark/pull/40402#issuecomment-1477209035 I saved a df with a UDT in PySpark, then read it in the Python client, and it works fine. So I guess something is wrong in vanilla PySpark's `createDataFrame`: ``` In

[GitHub] [spark] yabola commented on a diff in pull request #40495: test for reading footer within file range

2023-03-20 Thread via GitHub
yabola commented on code in PR #40495: URL: https://github.com/apache/spark/pull/40495#discussion_r1142841503 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala: ## @@ -92,8 +93,13 @@ case class

[GitHub] [spark] zhengruifeng commented on pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-20 Thread via GitHub
zhengruifeng commented on PR #40402: URL: https://github.com/apache/spark/pull/40402#issuecomment-1477202593 @ueshin it seems that `createDataFrame` always uses the underlying `sqlType` rather than the UDT itself: ``` In [1]: from pyspark.ml.linalg import Vectors In [2]: df =
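The truncated repro above is PySpark; a rough Scala analogue for checking whether a vector column is reported as its UDT or as the underlying `sqlType` struct might look like this (runnable in spark-shell; the data is made up):

```scala
import org.apache.spark.ml.linalg.Vectors

// Create a DataFrame with a vector column and inspect how the schema reports it:
// a UDT-aware path shows `vector`, while falling back to the sqlType would show the
// underlying struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.
val df = spark.createDataFrame(Seq(
  (0, Vectors.dense(1.0, 2.0)),
  (1, Vectors.sparse(2, Array(0), Array(3.0))))).toDF("id", "features")
df.printSchema()
df.schema("features").dataType  // the declared data type of the vector column
```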

[GitHub] [spark] LuciferYang commented on a diff in pull request #40408: [SPARK-42780][BUILD] Upgrade `Tink` to 1.8.0

2023-03-20 Thread via GitHub
LuciferYang commented on code in PR #40408: URL: https://github.com/apache/spark/pull/40408#discussion_r1142839820 ## pom.xml: ## @@ -214,7 +214,7 @@ 1.1.0 1.5.0 1.60 -1.7.0 +1.8.0 Review Comment: @bjornjorgensen We can exclude

[GitHub] [spark] cloud-fan commented on a diff in pull request #40499: [SPARK-42876][SQL] DataType's physicalDataType should be private[sql]

2023-03-20 Thread via GitHub
cloud-fan commented on code in PR #40499: URL: https://github.com/apache/spark/pull/40499#discussion_r1142839025 ## sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -119,7 +119,7 @@ abstract class DataType extends AbstractDataType { override

[GitHub] [spark] cloud-fan commented on a diff in pull request #40488: [SPARK-42851][SQL] Replace EquivalentExpressions with mutable map in PhysicalAggregation

2023-03-20 Thread via GitHub
cloud-fan commented on code in PR #40488: URL: https://github.com/apache/spark/pull/40488#discussion_r1142837488 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala: ## @@ -296,12 +298,17 @@ object PhysicalAggregation { // build a set of

[GitHub] [spark] zhengruifeng commented on pull request #40263: [SPARK-42659][ML] Reimplement `FPGrowthModel.transform` with dataframe operations

2023-03-20 Thread via GitHub
zhengruifeng commented on PR #40263: URL: https://github.com/apache/spark/pull/40263#issuecomment-1477193966 TL;DR I want to apply a scalar subquery to optimize `FPGrowthModel.transform`; there are two options: 1) create temp views and use `spark.sql`, see
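A very rough sketch of "option 1" mentioned above, registering temp views and issuing a scalar subquery through `spark.sql`, could look like the following in spark-shell; the DataFrames, view names, and query are invented for illustration and are not the actual FPGrowth code:

```scala
import spark.implicits._

// Two made-up DataFrames standing in for the model's association rules and the input items.
val rules = Seq((Array("a"), Array("b"), 0.8)).toDF("antecedent", "consequent", "confidence")
val items = Seq((1, Array("a", "c")), (2, Array("b"))).toDF("id", "items")

rules.createOrReplaceTempView("rules")
items.createOrReplaceTempView("baskets")

// An uncorrelated scalar subquery embedded in the SELECT list via spark.sql.
val out = spark.sql(
  """SELECT b.id, b.items,
    |       (SELECT max(confidence) FROM rules) AS max_confidence
    |FROM baskets b""".stripMargin)
out.show()
```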

[GitHub] [spark] LuciferYang commented on pull request #40492: [SPARK-42791][SQL][FOLLOWUP] Re-generate golden files for `array_prepend`

2023-03-20 Thread via GitHub
LuciferYang commented on PR #40492: URL: https://github.com/apache/spark/pull/40492#issuecomment-1477180380 Thanks All ~

[GitHub] [spark] beliefer commented on a diff in pull request #40467: [SPARK-42584][CONNECT] Improve output of `Column.explain`

2023-03-20 Thread via GitHub
beliefer commented on code in PR #40467: URL: https://github.com/apache/spark/pull/40467#discussion_r1142827066 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -163,6 +164,14 @@ message AnalyzePlanRequest { // (Required) The logical plan to

[GitHub] [spark] beliefer commented on pull request #40467: [SPARK-42584][CONNECT] Improve output of `Column.explain`

2023-03-20 Thread via GitHub
beliefer commented on PR #40467: URL: https://github.com/apache/spark/pull/40467#issuecomment-1477176898 > Do we need a Python-side compatible API? Maybe. I will check the Python side.

[GitHub] [spark] beliefer commented on a diff in pull request #40467: [SPARK-42584][CONNECT] Improve output of `Column.explain`

2023-03-20 Thread via GitHub
beliefer commented on code in PR #40467: URL: https://github.com/apache/spark/pull/40467#discussion_r1142826734 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -1211,13 +1211,11 @@ class Column private[sql] (private[sql] val expr:

[GitHub] [spark] beliefer commented on a diff in pull request #40466: [SPARK-42835][SQL][TESTS] Add test cases for `Column.explain`

2023-03-20 Thread via GitHub
beliefer commented on code in PR #40466: URL: https://github.com/apache/spark/pull/40466#discussion_r1142825198 ## sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala: ## @@ -921,6 +922,132 @@ class ColumnExpressionSuite extends QueryTest with

[GitHub] [spark] yaooqinn commented on a diff in pull request #40476: [MINOR][BUILD] Remove unused properties in pom file

2023-03-20 Thread via GitHub
yaooqinn commented on code in PR #40476: URL: https://github.com/apache/spark/pull/40476#discussion_r1142823591 ## resource-managers/kubernetes/integration-tests/pom.xml: ## @@ -26,8 +26,6 @@ spark-kubernetes-integration-tests_2.12 -1.3.0 - Review Comment:

[GitHub] [spark] xinrong-meng commented on a diff in pull request #40487: [WIP] Implement CoGrouped Map API

2023-03-20 Thread via GitHub
xinrong-meng commented on code in PR #40487: URL: https://github.com/apache/spark/pull/40487#discussion_r1142823170 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -509,6 +511,26 @@ class SparkConnectPlanner(val

[GitHub] [spark] beliefer commented on pull request #40418: [SPARK-42790][SQL] Abstract the excluded method for better test for JDBC docker tests.

2023-03-20 Thread via GitHub
beliefer commented on PR #40418: URL: https://github.com/apache/spark/pull/40418#issuecomment-1477169805 @srowen Thank you! @cloud-fan @huaxingao Thank you too!

[GitHub] [spark] zhengruifeng commented on pull request #40497: [SPARK-42875][CONNECT][PYTHON] Fix toPandas to handle timezone and map types properly

2023-03-20 Thread via GitHub
zhengruifeng commented on PR #40497: URL: https://github.com/apache/spark/pull/40497#issuecomment-1477168751 merged to master/branch-3.4

[GitHub] [spark] zhengruifeng closed pull request #40497: [SPARK-42875][CONNECT][PYTHON] Fix toPandas to handle timezone and map types properly

2023-03-20 Thread via GitHub
zhengruifeng closed pull request #40497: [SPARK-42875][CONNECT][PYTHON] Fix toPandas to handle timezone and map types properly URL: https://github.com/apache/spark/pull/40497

[GitHub] [spark] cloud-fan commented on pull request #40473: [SPARK-42851][SQL] Guard EquivalentExpressions.addExpr() with supportedExpression()

2023-03-20 Thread via GitHub
cloud-fan commented on PR #40473: URL: https://github.com/apache/spark/pull/40473#issuecomment-1477165259 The check was added to `getExprState` in https://github.com/apache/spark/pull/39010, which was to avoid canonicalizing a subquery expression and hitting an NPE. I agree that we

[GitHub] [spark] amaliujia opened a new pull request, #40499: [SPARK-42876][SQL] DataType's physicalDataType should be private[sql]

2023-03-20 Thread via GitHub
amaliujia opened a new pull request, #40499: URL: https://github.com/apache/spark/pull/40499 ### What changes were proposed in this pull request? `physicalDataType` should not be a public API but be private[sql]. ### Why are the changes needed? This is to limit

[GitHub] [spark] StevenChenDatabricks commented on a diff in pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-20 Thread via GitHub
StevenChenDatabricks commented on code in PR #40385: URL: https://github.com/apache/spark/pull/40385#discussion_r1142811318 ## sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala: ## @@ -73,14 +78,34 @@ object ExplainUtils extends AdaptiveSparkPlanHelper {

[GitHub] [spark] StevenChenDatabricks commented on a diff in pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-20 Thread via GitHub
StevenChenDatabricks commented on code in PR #40385: URL: https://github.com/apache/spark/pull/40385#discussion_r1142810230 ## sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala: ## @@ -73,14 +78,34 @@ object ExplainUtils extends AdaptiveSparkPlanHelper {

[GitHub] [spark] Kimahriman commented on pull request #40473: [SPARK-42851][SQL] Guard EquivalentExpressions.addExpr() with supportedExpression()

2023-03-20 Thread via GitHub
Kimahriman commented on PR #40473: URL: https://github.com/apache/spark/pull/40473#issuecomment-1477140427 > @Kimahriman I'd love to see a good CSE implementation for higher-order functions too. But for backporting the fix (which is this PR's primary intent) that would have been too much.

[GitHub] [spark] rednaxelafx commented on pull request #40473: [SPARK-42851][SQL] Guard EquivalentExpressions.addExpr() with supportedExpression()

2023-03-20 Thread via GitHub
rednaxelafx commented on PR #40473: URL: https://github.com/apache/spark/pull/40473#issuecomment-1477136150 @Kimahriman I'd love to see a good CSE implementation for higher-order functions too. But for backporting the fix (which is this PR's primary intent) that would have been too much.

[GitHub] [spark] rednaxelafx commented on pull request #40473: [SPARK-42851][SQL] Guard EquivalentExpressions.addExpr() with supportedExpression()

2023-03-20 Thread via GitHub
rednaxelafx commented on PR #40473: URL: https://github.com/apache/spark/pull/40473#issuecomment-1477132202 @peter-toth could you please clarify why `supportedExpression()` was needed in `getExprState()` in the first place? i.e. why isn't it sufficient to add it to `addExprTree()`?

[GitHub] [spark] rednaxelafx commented on pull request #40488: [SPARK-42851][SQL] Replace EquivalentExpressions with mutable map in PhysicalAggregation

2023-03-20 Thread via GitHub
rednaxelafx commented on PR #40488: URL: https://github.com/apache/spark/pull/40488#issuecomment-1477131544 Before the recent rounds of changes to EquivalentExpressions, the old `addExprTree` used to call `addExpr` in its core:
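For readers following the thread, here is a deliberately simplified, self-contained Scala sketch of the technique under discussion: counting semantically equal (sub)expressions in a mutable map keyed by a canonical form, while skipping expression kinds that are unsafe to canonicalize. It mirrors the idea only and is not Spark's `EquivalentExpressions` implementation.

```scala
import scala.collection.mutable

// Toy expression tree, standing in for Catalyst expressions.
sealed trait Expr
case class Lit(v: Int) extends Expr
case class Attr(name: String) extends Expr
case class Add(l: Expr, r: Expr) extends Expr
case class Subquery(id: Int) extends Expr // stand-in for expressions we must not canonicalize

object CseSketch {
  // Commutative canonicalization so that Add(a, b) and Add(b, a) collapse to one key.
  private def canonical(e: Expr): Expr = e match {
    case Add(l, r) =>
      val (cl, cr) = (canonical(l), canonical(r))
      if (cl.toString <= cr.toString) Add(cl, cr) else Add(cr, cl)
    case other => other
  }

  private def supported(e: Expr): Boolean = e match {
    case _: Subquery => false
    case Add(l, r)   => supported(l) && supported(r)
    case _           => true
  }

  /** Count how many times each supported (sub)expression occurs, keyed by canonical form. */
  def countCommon(exprs: Seq[Expr]): Map[Expr, Int] = {
    val counts = mutable.LinkedHashMap.empty[Expr, Int]
    def visit(e: Expr): Unit = {
      if (supported(e)) counts.update(canonical(e), counts.getOrElse(canonical(e), 0) + 1)
      e match {
        case Add(l, r) => visit(l); visit(r)
        case _         =>
      }
    }
    exprs.foreach(visit)
    counts.toMap
  }
}

// Example: Add(a, b) and Add(b, a) count as the same expression; the Subquery is skipped.
// CseSketch.countCommon(Seq(Add(Attr("a"), Attr("b")), Add(Attr("b"), Attr("a")), Subquery(1)))
```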

[GitHub] [spark] github-actions[bot] commented on pull request #38534: [SPARK-38505][SQL] Make partial aggregation adaptive

2023-03-20 Thread via GitHub
github-actions[bot] commented on PR #38534: URL: https://github.com/apache/spark/pull/38534#issuecomment-1477119915 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #38661: [SPARK-41085][SQL] Support Bit manipulation function COUNTSET

2023-03-20 Thread via GitHub
github-actions[bot] closed pull request #38661: [SPARK-41085][SQL] Support Bit manipulation function COUNTSET URL: https://github.com/apache/spark/pull/38661

[GitHub] [spark] github-actions[bot] commented on pull request #38608: [SPARK-41080][SQL] Support Bit manipulation function SETBIT

2023-03-20 Thread via GitHub
github-actions[bot] commented on PR #38608: URL: https://github.com/apache/spark/pull/38608#issuecomment-1477119894 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40479: [CONNECT][ML][WIP] Spark connect ML for scala client

2023-03-20 Thread via GitHub
WeichenXu123 commented on code in PR #40479: URL: https://github.com/apache/spark/pull/40479#discussion_r1142784162 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/ml/Pipeline.scala: ## @@ -17,47 +17,13 @@ package org.apache.spark.ml -import

[GitHub] [spark] amaliujia commented on a diff in pull request #40498: [WIP] reader table API could also accept options

2023-03-20 Thread via GitHub
amaliujia commented on code in PR #40498: URL: https://github.com/apache/spark/pull/40498#discussion_r1142782057 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -148,6 +143,13 @@ message Read { // This is only supported by the JDBC data

[GitHub] [spark] grundprinzip commented on a diff in pull request #40498: [WIP] reader table API could also accept options

2023-03-20 Thread via GitHub
grundprinzip commented on code in PR #40498: URL: https://github.com/apache/spark/pull/40498#discussion_r1142781033 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -148,6 +143,13 @@ message Read { // This is only supported by the JDBC

[GitHub] [spark] grundprinzip commented on a diff in pull request #40498: [WIP] reader table API could also accept options

2023-03-20 Thread via GitHub
grundprinzip commented on code in PR #40498: URL: https://github.com/apache/spark/pull/40498#discussion_r1142780483 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -148,6 +143,13 @@ message Read { // This is only supported by the JDBC

[GitHub] [spark] zhenlineo commented on a diff in pull request #40498: [WIP] reader table API could also accept options

2023-03-20 Thread via GitHub
zhenlineo commented on code in PR #40498: URL: https://github.com/apache/spark/pull/40498#discussion_r1142775827 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -148,6 +143,13 @@ message Read { // This is only supported by the JDBC data

[GitHub] [spark] zhenlineo commented on a diff in pull request #40498: [WIP] reader table API could also accept options

2023-03-20 Thread via GitHub
zhenlineo commented on code in PR #40498: URL: https://github.com/apache/spark/pull/40498#discussion_r1142775413 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameReader.scala: ## @@ -183,7 +183,7 @@ class DataFrameReader private[sql] (sparkSession:

[GitHub] [spark] ueshin commented on a diff in pull request #40498: [WIP] reader table API could also accept options

2023-03-20 Thread via GitHub
ueshin commented on code in PR #40498: URL: https://github.com/apache/spark/pull/40498#discussion_r1142758041 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameReader.scala: ## @@ -183,7 +183,7 @@ class DataFrameReader private[sql] (sparkSession:

[GitHub] [spark] amaliujia opened a new pull request, #40498: [WIP] reader table API could also accept options

2023-03-20 Thread via GitHub
amaliujia opened a new pull request, #40498: URL: https://github.com/apache/spark/pull/40498 ### What changes were proposed in this pull request? It turns out that `spark.read.option.table` is a valid call chain and the `table` API does accept options when opening a table.
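A minimal sketch of the call chain described above, runnable in spark-shell against an existing table; the option key and table name are placeholders:

```scala
// Options set on the reader before .table() accompany the table lookup,
// which is the behavior this PR is about.
val df = spark.read
  .option("mergeSchema", "true") // placeholder option key for illustration
  .table("some_db.some_table")   // assumes this table already exists in the catalog
df.show()
```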

[GitHub] [spark] JohnTortugo commented on pull request #40225: [SPARK-42625][BUILD] Upgrade `zstd-jni` to 1.5.4-2

2023-03-20 Thread via GitHub
JohnTortugo commented on PR #40225: URL: https://github.com/apache/spark/pull/40225#issuecomment-1477026864 Thanks a lot!

[GitHub] [spark] dongjoon-hyun commented on pull request #40225: [SPARK-42625][BUILD] Upgrade `zstd-jni` to 1.5.4-2

2023-03-20 Thread via GitHub
dongjoon-hyun commented on PR #40225: URL: https://github.com/apache/spark/pull/40225#issuecomment-1477024392 Here is the official documentation on how to run the benchmark in your GitHub Actions. Please see the `Running benchmarks in your forked repository` section.

[GitHub] [spark] JohnTortugo commented on pull request #40225: [SPARK-42625][BUILD] Upgrade `zstd-jni` to 1.5.4-2

2023-03-20 Thread via GitHub
JohnTortugo commented on PR #40225: URL: https://github.com/apache/spark/pull/40225#issuecomment-1477003627 Hey @dongjoon-hyun - Can you please point me to some documentation about how this benchmarking is done? I'd like to run the same benchmark locally.

[GitHub] [spark] ueshin opened a new pull request, #40497: [SPARK-42875][CONNECT][PYTHON] Fix toPandas to handle timezone and map types properly

2023-03-20 Thread via GitHub
ueshin opened a new pull request, #40497: URL: https://github.com/apache/spark/pull/40497 ### What changes were proposed in this pull request? Fix `DataFrame.toPandas()` to handle timezone and map types properly. ### Why are the changes needed? Currently

[GitHub] [spark] HyukjinKwon commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-20 Thread via GitHub
HyukjinKwon commented on PR #40496: URL: https://github.com/apache/spark/pull/40496#issuecomment-1476856830 LGTM if tests pass

[GitHub] [spark] HyukjinKwon closed pull request #40485: [SPARK-42870][CONNECT] Move `toCatalystValue` to `connect-common`

2023-03-20 Thread via GitHub
HyukjinKwon closed pull request #40485: [SPARK-42870][CONNECT] Move `toCatalystValue` to `connect-common` URL: https://github.com/apache/spark/pull/40485

[GitHub] [spark] HyukjinKwon commented on pull request #40485: [SPARK-42870][CONNECT] Move `toCatalystValue` to `connect-common`

2023-03-20 Thread via GitHub
HyukjinKwon commented on PR #40485: URL: https://github.com/apache/spark/pull/40485#issuecomment-1476846845 Merged to master.

[GitHub] [spark] mridulm commented on a diff in pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-20 Thread via GitHub
mridulm commented on code in PR #40393: URL: https://github.com/apache/spark/pull/40393#discussion_r1142572022 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -4595,6 +4595,184 @@ class DAGSchedulerSuite extends SparkFunSuite with

[GitHub] [spark] tgravescs commented on pull request #39127: [SPARK-41585][YARN] The Spark exclude node functionality for YARN should work independently of dynamic allocation

2023-03-20 Thread via GitHub
tgravescs commented on PR #39127: URL: https://github.com/apache/spark/pull/39127#issuecomment-1476782666 merged to master

[GitHub] [spark] asfgit closed pull request #39127: [SPARK-41585][YARN] The Spark exclude node functionality for YARN should work independently of dynamic allocation

2023-03-20 Thread via GitHub
asfgit closed pull request #39127: [SPARK-41585][YARN] The Spark exclude node functionality for YARN should work independently of dynamic allocation URL: https://github.com/apache/spark/pull/39127

[GitHub] [spark] MaxGekk closed pull request #40492: [SPARK-42791][SQL][FOLLOWUP] Re-generate golden files for `array_prepend`

2023-03-20 Thread via GitHub
MaxGekk closed pull request #40492: [SPARK-42791][SQL][FOLLOWUP] Re-generate golden files for `array_prepend` URL: https://github.com/apache/spark/pull/40492

[GitHub] [spark] aokolnychyi commented on pull request #40478: [SPARK-42779][SQL][FOLLOWUP] Allow V2 writes to indicate advisory shuffle partition size

2023-03-20 Thread via GitHub
aokolnychyi commented on PR #40478: URL: https://github.com/apache/spark/pull/40478#issuecomment-1476769291 Thanks, @dongjoon-hyun @cloud-fan!

[GitHub] [spark] MaxGekk commented on pull request #40492: [SPARK-42791][SQL][FOLLOWUP] Re-generate golden files for `array_prepend`

2023-03-20 Thread via GitHub
MaxGekk commented on PR #40492: URL: https://github.com/apache/spark/pull/40492#issuecomment-1476767577 +1, LGTM. Merging to master. Thank you, @LuciferYang and @dongjoon-hyun @gengliangwang @dtenedor for review.

[GitHub] [spark] LuciferYang commented on pull request #40492: [SPARK-42791][SQL][FOLLOWUP] Re-generate golden files for `array_prepend`

2023-03-20 Thread via GitHub
LuciferYang commented on PR #40492: URL: https://github.com/apache/spark/pull/40492#issuecomment-1476757882 GA passed

[GitHub] [spark] ueshin commented on a diff in pull request #40467: [SPARK-42584][CONNECT] Improve output of `Column.explain`

2023-03-20 Thread via GitHub
ueshin commented on code in PR #40467: URL: https://github.com/apache/spark/pull/40467#discussion_r1142537848 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -1211,13 +1211,11 @@ class Column private[sql] (private[sql] val expr:

[GitHub] [spark] dtenedor opened a new pull request, #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-20 Thread via GitHub
dtenedor opened a new pull request, #40496: URL: https://github.com/apache/spark/pull/40496 ### What changes were proposed in this pull request? This PR enables the new golden file test framework for analysis for all input files. Background: * In

[GitHub] [spark] MaxGekk commented on a diff in pull request #40493: modified for SPARK-42839: Assign a name to the error class _LEGACY_ER…

2023-03-20 Thread via GitHub
MaxGekk commented on code in PR #40493: URL: https://github.com/apache/spark/pull/40493#discussion_r1142511975 ## sql/core/src/test/scala/org/apache/spark/sql/LegacyErrorTempSuit.scala: ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] otterc commented on a diff in pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-20 Thread via GitHub
otterc commented on code in PR #40393: URL: https://github.com/apache/spark/pull/40393#discussion_r1142475865 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -4595,6 +4595,184 @@ class DAGSchedulerSuite extends SparkFunSuite with

[GitHub] [spark] bjornjorgensen commented on pull request #40494: [MINOR][DOCS] Fix typos

2023-03-20 Thread via GitHub
bjornjorgensen commented on PR #40494: URL: https://github.com/apache/spark/pull/40494#issuecomment-1476637999 @srowen FYI

[GitHub] [spark] StevenChenDatabricks commented on a diff in pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-20 Thread via GitHub
StevenChenDatabricks commented on code in PR #40385: URL: https://github.com/apache/spark/pull/40385#discussion_r114291 ## sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala: ## @@ -119,17 +155,40 @@ object ExplainUtils extends AdaptiveSparkPlanHelper

[GitHub] [spark] amaliujia commented on pull request #40467: [SPARK-42584][CONNECT] Improve output of `Column.explain`

2023-03-20 Thread via GitHub
amaliujia commented on PR #40467: URL: https://github.com/apache/spark/pull/40467#issuecomment-1476589894 Do we need a Python-side compatible API?

[GitHub] [spark] amaliujia commented on a diff in pull request #40467: [SPARK-42584][CONNECT] Improve output of `Column.explain`

2023-03-20 Thread via GitHub
amaliujia commented on code in PR #40467: URL: https://github.com/apache/spark/pull/40467#discussion_r1142418545 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -163,6 +164,14 @@ message AnalyzePlanRequest { // (Required) The logical plan to

[GitHub] [spark] mridulm commented on pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-20 Thread via GitHub
mridulm commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1476585572 The test failure is unrelated to this PR - once the changes above are made, the re-execution should pass

[GitHub] [spark] mridulm commented on a diff in pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-20 Thread via GitHub
mridulm commented on code in PR #40393: URL: https://github.com/apache/spark/pull/40393#discussion_r1142362530 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -4595,6 +4595,184 @@ class DAGSchedulerSuite extends SparkFunSuite with

[GitHub] [spark] amaliujia commented on a diff in pull request #40466: [SPARK-42835][SQL][TESTS] Add test cases for `Column.explain`

2023-03-20 Thread via GitHub
amaliujia commented on code in PR #40466: URL: https://github.com/apache/spark/pull/40466#discussion_r1142414407 ## sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala: ## @@ -921,6 +922,132 @@ class ColumnExpressionSuite extends QueryTest with

[GitHub] [spark] dongjoon-hyun commented on pull request #40478: [SPARK-42779][SQL][FOLLOWUP] Allow V2 writes to indicate advisory shuffle partition size

2023-03-20 Thread via GitHub
dongjoon-hyun commented on PR #40478: URL: https://github.com/apache/spark/pull/40478#issuecomment-1476581464 Thank you, @cloud-fan !

[GitHub] [spark] yabola opened a new pull request, #40495: only test for reading footer within file range

2023-03-20 Thread via GitHub
yabola opened a new pull request, #40495: URL: https://github.com/apache/spark/pull/40495 Only for testing, please ignore it.

[GitHub] [spark] MaxGekk commented on a diff in pull request #40126: [SPARK-40822][SQL] Stable derived column aliases

2023-03-20 Thread via GitHub
MaxGekk commented on code in PR #40126: URL: https://github.com/apache/spark/pull/40126#discussion_r1142389921 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveAliasesSuite.scala: ## @@ -88,4 +94,46 @@ class ResolveAliasesSuite extends AnalysisTest {

[GitHub] [spark] sudoliyang opened a new pull request, #40494: [MINOR][DOCS] Fix typos

2023-03-20 Thread via GitHub
sudoliyang opened a new pull request, #40494: URL: https://github.com/apache/spark/pull/40494 ### What changes were proposed in this pull request? Fix typos in the repo. ### Why are the changes needed? Improve readability. ### Does this PR introduce

[GitHub] [spark] ruilibuaa opened a new pull request, #40493: modified for SPARK-42839: Assign a name to the error class _LEGACY_ER…

2023-03-20 Thread via GitHub
ruilibuaa opened a new pull request, #40493: URL: https://github.com/apache/spark/pull/40493 …ROR_TEMP_2003 https://issues.apache.org/jira/browse/SPARK-42839 modified: 1. error-classes.json: "_LEGACY_ERROR_TEMP_2003" --> "CANNOT_ZIP_MAPS" 2. create a test case named

[GitHub] [spark] dtenedor commented on pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-20 Thread via GitHub
dtenedor commented on PR #40449: URL: https://github.com/apache/spark/pull/40449#issuecomment-1476470933 Looks like some extra tests got added just as this was getting merged! Thanks @LuciferYang for this fix

[GitHub] [spark] LuciferYang commented on pull request #40492: [SPARK-42791][SQL][FOLLOWUP] Re-generate golden files for `array_prepend`

2023-03-20 Thread via GitHub
LuciferYang commented on PR #40492: URL: https://github.com/apache/spark/pull/40492#issuecomment-1476455400 cc @cloud-fan @gengliangwang @dtenedor
