[GitHub] [spark] LuciferYang commented on pull request #38926: [SPARK-41247][BUILD][FOLLOWUP] Make sbt and maven use the same Protobuf version

2022-12-06 Thread GitBox
LuciferYang commented on PR #38926: URL: https://github.com/apache/spark/pull/38926#issuecomment-1338981866 Thanks @HyukjinKwon @dongjoon-hyun @gengliangwang @zhengruifeng @amaliujia -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] dongjoon-hyun closed pull request #38637: [SPARK-41121][BUILD] Upgrade `sbt-assembly` to 2.0.0

2022-12-06 Thread GitBox
dongjoon-hyun closed pull request #38637: [SPARK-41121][BUILD] Upgrade `sbt-assembly` to 2.0.0 URL: https://github.com/apache/spark/pull/38637

[GitHub] [spark] LuciferYang commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1040857399 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,118 @@ case class ArrayExcept(left:

[GitHub] [spark] panbingkun commented on a diff in pull request #38861: [SPARK-41294][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1203 / 1168

2022-12-06 Thread GitBox
panbingkun commented on code in PR #38861: URL: https://github.com/apache/spark/pull/38861#discussion_r1040885084 ## core/src/main/resources/error/error-classes.json: ## @@ -876,6 +876,13 @@ ], "sqlState" : "42000" }, + "NOT_ENOUGH_DATA_COLUMNS" : { +"message"

[GitHub] [spark] HyukjinKwon closed pull request #38926: [SPARK-41247][BUILD][FOLLOWUP] Make sbt and maven use the same Protobuf version

2022-12-06 Thread GitBox
HyukjinKwon closed pull request #38926: [SPARK-41247][BUILD][FOLLOWUP] Make sbt and maven use the same Protobuf version URL: https://github.com/apache/spark/pull/38926

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Add defensive assertions to Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1040675168 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala: ## @@ -316,6 +320,26 @@ private[kafka010] class

[GitHub] [spark] dongjoon-hyun opened a new pull request, #38934: [SPARK-41317][CONNECT][TESTS][FOLLOWUP] Import WriteOperation only when pandas is available

2022-12-06 Thread GitBox
dongjoon-hyun opened a new pull request, #38934: URL: https://github.com/apache/spark/pull/38934 ### What changes were proposed in this pull request? This is the last piece to recover `pyspark-connect` tests on a system where pandas is unavailable. ### Why are the changes

[GitHub] [spark] LuciferYang opened a new pull request, #38936: [DON'T MERGE] Upgrade scala-maven-plugin to 4.8.0

2022-12-06 Thread GitBox
LuciferYang opened a new pull request, #38936: URL: https://github.com/apache/spark/pull/38936 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] yabola commented on pull request #38882: [SPARK-41365][UI] Stages UI page fails to load for proxy in specific yarn environment

2022-12-06 Thread GitBox
yabola commented on PR #38882: URL: https://github.com/apache/spark/pull/38882#issuecomment-1339136133 @gengliangwang I see a comment from another person in this issue (the last one). I think he had the same problem, and his comment was posted after this issue was fixed.

[GitHub] [spark] cloud-fan commented on a diff in pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

2022-12-06 Thread GitBox
cloud-fan commented on code in PR #38915: URL: https://github.com/apache/spark/pull/38915#discussion_r1040835872 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -539,6 +539,15 @@ class SparkConnectPlanner(session:

[GitHub] [spark] infoankitp commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
infoankitp commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1040845039 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala: ## @@ -5237,6 +5237,59 @@ class DataFrameFunctionsSuite extends QueryTest with

[GitHub] [spark] HyukjinKwon commented on pull request #38936: [DON'T MERGE][BUILD] Upgrade scala-maven-plugin to 4.8.0

2022-12-06 Thread GitBox
HyukjinKwon commented on PR #38936: URL: https://github.com/apache/spark/pull/38936#issuecomment-1339348370 cc @steveloughran FYI

[GitHub] [spark] cloud-fan commented on a diff in pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project and refactor Analyzer

2022-12-06 Thread GitBox
cloud-fan commented on code in PR #38776: URL: https://github.com/apache/spark/pull/38776#discussion_r1040956663 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAlias.scala: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] dongjoon-hyun closed pull request #38918: [SPARK-41393][BUILD] Upgrade slf4j to 2.0.5

2022-12-06 Thread GitBox
dongjoon-hyun closed pull request #38918: [SPARK-41393][BUILD] Upgrade slf4j to 2.0.5 URL: https://github.com/apache/spark/pull/38918

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041067010 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala: ## @@ -349,6 +354,54 @@ private[kafka010] class KafkaSource( }

[GitHub] [spark] LuciferYang opened a new pull request, #38940: [SPARK-41409][CORE][SQL] Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043`

2022-12-06 Thread GitBox
LuciferYang opened a new pull request, #38940: URL: https://github.com/apache/spark/pull/38940 ### What changes were proposed in this pull request? This pr aims to reuse error class `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043`. ### Why are the changes needed?

[GitHub] [spark] wecharyu commented on pull request #38898: [SPARK-41375][SS] Avoid empty latest KafkaSourceOffset

2022-12-06 Thread GitBox
wecharyu commented on PR #38898: URL: https://github.com/apache/spark/pull/38898#issuecomment-1339648870 > Can you write a unit test for this? It seems a bit difficult to write a unit test that covers the case of fetching empty partitions from the Kafka cluster, any idea will be

[GitHub] [spark] MaxGekk commented on pull request #38864: [SPARK-41271][SQL] Support parameterized SQL queries by `sql()`

2022-12-06 Thread GitBox
MaxGekk commented on PR #38864: URL: https://github.com/apache/spark/pull/38864#issuecomment-1339530615 > I could just as easily say that we should choose : to make it easier to migrate from Redshift ... How about supporting both `@` and `:`? In that way, we cover and encourage more

[GitHub] [spark] cloud-fan opened a new pull request, #38942: [WIP] Do not optimize the input query twice for v1 write fallback

2022-12-06 Thread GitBox
cloud-fan opened a new pull request, #38942: URL: https://github.com/apache/spark/pull/38942 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] HeartSaVioR commented on pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on PR #38911: URL: https://github.com/apache/spark/pull/38911#issuecomment-1339489629 I just made a change to do some "actual" assertion, via fetching the latest information for topic-partitions and their latest offset. Hope it makes more sense.

[GitHub] [spark] melin commented on pull request #38496: [SPARK-40708][SQL] Auto update table statistics based on write metrics

2022-12-06 Thread GitBox
melin commented on PR #38496: URL: https://github.com/apache/spark/pull/38496#issuecomment-1339572472 > > Support partition statistics? > > @melin I'm working on the supporting of partition statistics update, it relies on workers to return detailed partition statistics. Can

[GitHub] [spark] fred-db opened a new pull request, #38941: [WIP] Propagate metadata through Union

2022-12-06 Thread GitBox
fred-db opened a new pull request, #38941: URL: https://github.com/apache/spark/pull/38941 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] dongjoon-hyun closed pull request #38924: [SPARK-41398][SQL] Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match

2022-12-06 Thread GitBox
dongjoon-hyun closed pull request #38924: [SPARK-41398][SQL] Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match URL: https://github.com/apache/spark/pull/38924

[GitHub] [spark] dongjoon-hyun commented on pull request #38924: [SPARK-41398][SQL] Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38924: URL: https://github.com/apache/spark/pull/38924#issuecomment-1339580946 Merged to master for Apache Spark 3.4.0.

[GitHub] [spark] HyukjinKwon commented on pull request #38931: [SPARK-41001][CONNECT][TESTS][FOLLOWUP] `ChannelBuilderTests` should be skipped by `should_test_connect` flag

2022-12-06 Thread GitBox
HyukjinKwon commented on PR #38931: URL: https://github.com/apache/spark/pull/38931#issuecomment-1338977442 Merged to master.

[GitHub] [spark] cloud-fan commented on a diff in pull request #38888: [SPARK-41405][SQL] Centralize the column resolution logic

2022-12-06 Thread GitBox
cloud-fan commented on code in PR #38888: URL: https://github.com/apache/spark/pull/38888#discussion_r1040777568 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -1591,12 +1620,129 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] cloud-fan commented on a diff in pull request #38883: [SPARK-41366][CONNECT] DF.groupby.agg() should be compatible

2022-12-06 Thread GitBox
cloud-fan commented on code in PR #38883: URL: https://github.com/apache/spark/pull/38883#discussion_r1040844793 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -397,7 +397,7 @@ class SparkConnectPlanner(session:

[GitHub] [spark] infoankitp commented on pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
infoankitp commented on PR #38865: URL: https://github.com/apache/spark/pull/38865#issuecomment-1339187683 Ran the above command got below output `SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *ExpressionsSchemaSuite"` ``` [info] ExpressionsSchemaSuite: 17:00:33.694

[GitHub] [spark] cloud-fan commented on a diff in pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project and refactor Analyzer

2022-12-06 Thread GitBox
cloud-fan commented on code in PR #38776: URL: https://github.com/apache/spark/pull/38776#discussion_r1040868659 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAlias.scala: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] dongjoon-hyun commented on pull request #38637: [SPARK-41121][BUILD] Upgrade `sbt-assembly` to 2.0.0

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38637: URL: https://github.com/apache/spark/pull/38637#issuecomment-1339031913 Merged to master for Apache Spark 3.4.0. Thank you, @panbingkun and @LuciferYang .

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

2022-12-06 Thread GitBox
zhengruifeng commented on code in PR #38915: URL: https://github.com/apache/spark/pull/38915#discussion_r1040759564 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -552,6 +554,27 @@ class SparkConnectPlanner(session:

[GitHub] [spark] panbingkun opened a new pull request, #38937: [SPARK-41406][SQL] Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic

2022-12-06 Thread GitBox
panbingkun opened a new pull request, #38937: URL: https://github.com/apache/spark/pull/38937 ### What changes were proposed in this pull request? The pr aims to refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic. ### Why are the changes needed?

[GitHub] [spark] thejdeep commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side read metrics

2022-12-06 Thread GitBox
thejdeep commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1040884126 ## core/src/test/resources/HistoryServerExpectations/excludeOnFailure_node_for_stage_expectation.json: ## @@ -81,7 +93,19 @@ "remoteBytesRead" : 0,

[GitHub] [spark] peter-toth commented on pull request #38885: [SPARK-41367][SQL] Enable V2 file tables in read paths in session catalog

2022-12-06 Thread GitBox
peter-toth commented on PR #38885: URL: https://github.com/apache/spark/pull/38885#issuecomment-1339214805 @cloud-fan, do you think we could start enabling V2 file tables with this PR?

[GitHub] [spark] zhengruifeng opened a new pull request, #38935: [SPARK-41319][CONNECT][PYTHON] Implement `Column.{when, otherwise}` and Function `when`

2022-12-06 Thread GitBox
zhengruifeng opened a new pull request, #38935: URL: https://github.com/apache/spark/pull/38935 ### What changes were proposed in this pull request? Implement `Column.{when, otherwise}` and Function `when` ### Why are the changes needed? For API coverage ### Does

[GitHub] [spark] cloud-fan commented on a diff in pull request #38933: [SPARK-41404][SQL][TESTS] Refactor `ColumnVectorUtils#toBatch` to make `ColumnarBatchSuite#testRandomRows` test more dataType

2022-12-06 Thread GitBox
cloud-fan commented on code in PR #38933: URL: https://github.com/apache/spark/pull/38933#discussion_r1040842633 ## sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java: ## @@ -165,7 +171,17 @@ private static void

[GitHub] [spark] roczei commented on pull request #38828: [SPARK-35084][CORE] Spark 3: supporting --packages in k8s cluster mode

2022-12-06 Thread GitBox
roczei commented on PR #38828: URL: https://github.com/apache/spark/pull/38828#issuecomment-1339222364 Thanks @ocworld for the uploaded unit test! It works perfectly, it can identify the issue. Good: ``` - SPARK-35084: includes jars passed in through --packages in k8s

[GitHub] [spark] infoankitp commented on pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
infoankitp commented on PR #38865: URL: https://github.com/apache/spark/pull/38865#issuecomment-1339304055 > @infoankitp Would you mind adding some sql related tests to `sql-tests/inputs/array.sql`? Added in the recent Commit.

[GitHub] [spark] dongjoon-hyun commented on pull request #38934: [SPARK-41317][CONNECT][TESTS][FOLLOWUP] Import WriteOperation only when pandas is available

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38934: URL: https://github.com/apache/spark/pull/38934#issuecomment-1339027484 cc @grundprinzip , @hvanhovell , @bjornjorgensen , @zhengruifeng , @HyukjinKwon , @amaliujia

[GitHub] [spark] LuciferYang commented on a diff in pull request #38901: [SPARK-41376][CORE] Correct the Netty preferDirectBufs check logic on executor start

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38901: URL: https://github.com/apache/spark/pull/38901#discussion_r1040709804 ## core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala: ## @@ -85,7 +85,19 @@ private[spark] class CoarseGrainedExecutorBackend(

[GitHub] [spark] cloud-fan commented on a diff in pull request #38888: [SPARK-41405][SQL] Centralize the column resolution logic

2022-12-06 Thread GitBox
cloud-fan commented on code in PR #38888: URL: https://github.com/apache/spark/pull/38888#discussion_r1040774763 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -689,9 +687,26 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] infoankitp commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
infoankitp commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1040854537 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,118 @@ case class ArrayExcept(left:

[GitHub] [spark] beliefer opened a new pull request, #38938: [WIP][SPARK-41403][CONNECT][PYTHON] Implement `DataFrame.describe`

2022-12-06 Thread GitBox
beliefer opened a new pull request, #38938: URL: https://github.com/apache/spark/pull/38938 ### What changes were proposed in this pull request? Implement `DataFrame.describe` with a proto message ### Why are the changes needed? for Connect API coverage ### Does

[GitHub] [spark] dongjoon-hyun commented on pull request #38931: [SPARK-41001][CONNECT][TESTS][FOLLOWUP] `ChannelBuilderTests` should be skipped by `should_test_connect` flag

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38931: URL: https://github.com/apache/spark/pull/38931#issuecomment-1338994077 Thank you, @HyukjinKwon !

[GitHub] [spark] cloud-fan commented on a diff in pull request #38888: [SPARK-41405][SQL] Centralize the column resolution logic

2022-12-06 Thread GitBox
cloud-fan commented on code in PR #38888: URL: https://github.com/apache/spark/pull/38888#discussion_r1040779582 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -1591,12 +1620,129 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] infoankitp commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
infoankitp commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1040837728 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,118 @@ case class ArrayExcept(left:

[GitHub] [spark] zhengruifeng commented on pull request #38917: [SPARK-41391][SQL] The output column name of `groupBy.agg(count_distinct)` is incorrect

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38917: URL: https://github.com/apache/spark/pull/38917#issuecomment-1339167796 need to take `*` into account, and `groupBy.agg(count_distinct($"*"))` output column `count(unresolvedstar())` ``` scala> df.select(count_distinct(col("*"))) res12:

[GitHub] [spark] ulysses-you opened a new pull request, #38939: [WIP][SPARK-41407][SQL] Pull out v1 write to WriteFiles

2022-12-06 Thread GitBox
ulysses-you opened a new pull request, #38939: URL: https://github.com/apache/spark/pull/38939 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] infoankitp commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
infoankitp commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1040953239 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,118 @@ case class ArrayExcept(left:

[GitHub] [spark] HyukjinKwon commented on pull request #38926: [SPARK-41247][BUILD][FOLLOWUP] Make sbt and maven use the same Protobuf version

2022-12-06 Thread GitBox
HyukjinKwon commented on PR #38926: URL: https://github.com/apache/spark/pull/38926#issuecomment-1338979189 Merged to master.

[GitHub] [spark] panbingkun commented on a diff in pull request #38861: [SPARK-41294][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1203 / 1168

2022-12-06 Thread GitBox
panbingkun commented on code in PR #38861: URL: https://github.com/apache/spark/pull/38861#discussion_r1040721306 ## core/src/main/resources/error/error-classes.json: ## @@ -876,6 +876,13 @@ ], "sqlState" : "42000" }, + "NOT_ENOUGH_DATA_COLUMNS" : { +"message"

[GitHub] [spark] dongjoon-hyun commented on pull request #38934: [SPARK-41317][CONNECT][TESTS][FOLLOWUP] Import `WriteOperation` only when `pandas` is available

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38934: URL: https://github.com/apache/spark/pull/38934#issuecomment-1339150057 Thank you, @HyukjinKwon . Merged to master.

[GitHub] [spark] infoankitp commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
infoankitp commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1040835864 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,118 @@ case class ArrayExcept(left:

[GitHub] [spark] dengziming commented on a diff in pull request #38899: [SPARK-41349][CONNECT] Implement DataFrame.hint

2022-12-06 Thread GitBox
dengziming commented on code in PR #38899: URL: https://github.com/apache/spark/pull/38899#discussion_r1040852710 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala: ## @@ -694,6 +685,18 @@ package object dsl { .build() } +

[GitHub] [spark] cloud-fan commented on a diff in pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project and refactor Analyzer

2022-12-06 Thread GitBox
cloud-fan commented on code in PR #38776: URL: https://github.com/apache/spark/pull/38776#discussion_r1040959706 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAlias.scala: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] cloud-fan commented on pull request #38682: [SPARK-41167][SQL] Improve multi like performance by creating a balanced expression tree predicate

2022-12-06 Thread GitBox
cloud-fan commented on PR #38682: URL: https://github.com/apache/spark/pull/38682#issuecomment-1339368069 late LGTM

[GitHub] [spark] HyukjinKwon closed pull request #38931: [SPARK-41001][CONNECT][TESTS][FOLLOWUP] `ChannelBuilderTests` should be skipped by `should_test_connect` flag

2022-12-06 Thread GitBox
HyukjinKwon closed pull request #38931: [SPARK-41001][CONNECT][TESTS][FOLLOWUP] `ChannelBuilderTests` should be skipped by `should_test_connect` flag URL: https://github.com/apache/spark/pull/38931

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38801: [SPARK-41317][CONNECT][PYTHON] Add basic support for DataFrameWriter

2022-12-06 Thread GitBox
dongjoon-hyun commented on code in PR #38801: URL: https://github.com/apache/spark/pull/38801#discussion_r1040708581 ## python/pyspark/sql/tests/connect/test_connect_plan_only.py: ## @@ -17,6 +17,7 @@ from typing import cast import unittest +from pyspark.sql.connect.plan

[GitHub] [spark] zhengruifeng commented on pull request #38917: [SPARK-41391][SQL] The output column name of `groupBy.agg(count_distinct)` is incorrect

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38917: URL: https://github.com/apache/spark/pull/38917#issuecomment-1339120195 `sql - other` keeps failing, I need a bit more time to investigate

[GitHub] [spark] dongjoon-hyun closed pull request #38934: [SPARK-41317][CONNECT][TESTS][FOLLOWUP] Import `WriteOperation` only when `pandas` is available

2022-12-06 Thread GitBox
dongjoon-hyun closed pull request #38934: [SPARK-41317][CONNECT][TESTS][FOLLOWUP] Import `WriteOperation` only when `pandas` is available URL: https://github.com/apache/spark/pull/38934

[GitHub] [spark] LuciferYang commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1040841235 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,118 @@ case class ArrayExcept(left:

[GitHub] [spark] cloud-fan commented on a diff in pull request #38899: [SPARK-41349][CONNECT] Implement DataFrame.hint

2022-12-06 Thread GitBox
cloud-fan commented on code in PR #38899: URL: https://github.com/apache/spark/pull/38899#discussion_r1040849638 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala: ## @@ -694,6 +685,18 @@ package object dsl { .build() } +

[GitHub] [spark] LuciferYang commented on pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
LuciferYang commented on PR #38865: URL: https://github.com/apache/spark/pull/38865#issuecomment-1339190595 > Ran the above command got below output `SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *ExpressionsSchemaSuite"` > > ``` > [info] ExpressionsSchemaSuite: >

[GitHub] [spark] zhengruifeng closed pull request #38917: [SPARK-41391][SQL] The output column name of `groupBy.agg(count_distinct)` is incorrect

2022-12-06 Thread GitBox
zhengruifeng closed pull request #38917: [SPARK-41391][SQL] The output column name of `groupBy.agg(count_distinct)` is incorrect URL: https://github.com/apache/spark/pull/38917

[GitHub] [spark] zhengruifeng commented on pull request #38917: [SPARK-41391][SQL] The output column name of `groupBy.agg(count_distinct)` is incorrect

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38917: URL: https://github.com/apache/spark/pull/38917#issuecomment-1339256780 This PR causes `SPARK-27581: DataFrame count_distinct("*") shouldn't fail with AnalysisException` to fail: ``` 2022-12-06T10:00:45.0030472Z [info] -

[GitHub] [spark] infoankitp commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
infoankitp commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1040942117 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala: ## @@ -5237,6 +5237,59 @@ class DataFrameFunctionsSuite extends QueryTest with

[GitHub] [spark] LuciferYang commented on pull request #38936: [DON'T MERGE][BUILD] Upgrade scala-maven-plugin to 4.8.0

2022-12-06 Thread GitBox
LuciferYang commented on PR #38936: URL: https://github.com/apache/spark/pull/38936#issuecomment-1339392001 @steveloughran How can we test your scenario? The compilation of Java 11 and Java 17 in GA Task is run using Maven

[GitHub] [spark] bersprockets commented on pull request #38923: [SPARK-41395][SQL] `InterpretedMutableProjection` should use `setDecimal` to set null values for high-precision decimals in an unsafe ro

2022-12-06 Thread GitBox
bersprockets commented on PR #38923: URL: https://github.com/apache/spark/pull/38923#issuecomment-1339762795 By the way, there's a similar-looking problem with type `CalendarInterval`: ``` set spark.sql.codegen.wholeStage=false; set spark.sql.codegen.factoryMode=NO_CODEGEN;

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
dongjoon-hyun commented on code in PR #38943: URL: https://github.com/apache/spark/pull/38943#discussion_r1041347366 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala: ## @@ -101,6 +100,17 @@ private[spark] object Config extends

[GitHub] [spark] viirya commented on a diff in pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
viirya commented on code in PR #38943: URL: https://github.com/apache/spark/pull/38943#discussion_r1041378394 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -47,6 +48,17 @@ class

[GitHub] [spark] gengliangwang commented on pull request #38882: [SPARK-41365][UI] Stages UI page fails to load for proxy in specific yarn environment

2022-12-06 Thread GitBox
gengliangwang commented on PR #38882: URL: https://github.com/apache/spark/pull/38882#issuecomment-1339985330 @yabola From the screenshot you provided, I don't see any double-quoted URL like `%255B...`. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] bersprockets commented on a diff in pull request #38923: [SPARK-41395][SQL] `InterpretedMutableProjection` should use `setDecimal` to set null values for high-precision decimals in an

2022-12-06 Thread GitBox
bersprockets commented on code in PR #38923: URL: https://github.com/apache/spark/pull/38923#discussion_r1041454229 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala: ## @@ -67,6 +69,7 @@ class

[GitHub] [spark] hvanhovell closed pull request #38944: [SPARK-41369][CONNECT][BUILD] Split connect project into common and server projects

2022-12-06 Thread GitBox
hvanhovell closed pull request #38944: [SPARK-41369][CONNECT][BUILD] Split connect project into common and server projects URL: https://github.com/apache/spark/pull/38944 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] amaliujia commented on a diff in pull request #38921: [SPARK-41397][CONNECT][PYTHON] Implement part of string/binary functions

2022-12-06 Thread GitBox
amaliujia commented on code in PR #38921: URL: https://github.com/apache/spark/pull/38921#discussion_r1041309241 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -410,6 +410,67 @@ def test_aggregation_functions(self):

[GitHub] [spark] dongjoon-hyun commented on pull request #38928: [SPARK-41034][CONNECT][TESTS][FOLLOWUP] `connectutils` should be skipped when pandas is not installed

2022-12-06 Thread GitBox
dongjoon-hyun commented on PR #38928: URL: https://github.com/apache/spark/pull/38928#issuecomment-1339790284 Thank you, @amaliujia . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] anchovYu commented on a diff in pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project and refactor Analyzer

2022-12-06 Thread GitBox
anchovYu commented on code in PR #38776: URL: https://github.com/apache/spark/pull/38776#discussion_r1041275311 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAlias.scala: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] viirya commented on a diff in pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
viirya commented on code in PR #38943: URL: https://github.com/apache/spark/pull/38943#discussion_r1041333237 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala: ## @@ -101,6 +100,17 @@ private[spark] object Config extends Logging {

[GitHub] [spark] viirya commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
viirya commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1041429490 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/colstats/ColumnStatistics.java: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] dongjoon-hyun opened a new pull request, #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
dongjoon-hyun opened a new pull request, #38943: URL: https://github.com/apache/spark/pull/38943 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] viirya commented on a diff in pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
viirya commented on code in PR #38943: URL: https://github.com/apache/spark/pull/38943#discussion_r1041330497 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -398,6 +410,10 @@ class

[GitHub] [spark] viirya commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
viirya commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1041484930 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala: ## @@ -294,7 +313,30 @@ abstract class InMemoryBaseTable( val

[GitHub] [spark] amaliujia commented on pull request #38928: [SPARK-41034][CONNECT][TESTS][FOLLOWUP] `connectutils` should be skipped when pandas is not installed

2022-12-06 Thread GitBox
amaliujia commented on PR #38928: URL: https://github.com/apache/spark/pull/38928#issuecomment-1339779171 @dongjoon-hyun really appreciate for this fix! late LGTM. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] grundprinzip commented on a diff in pull request #38883: [SPARK-41366][CONNECT] DF.groupby.agg() should be compatible

2022-12-06 Thread GitBox
grundprinzip commented on code in PR #38883: URL: https://github.com/apache/spark/pull/38883#discussion_r1041316676 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -397,7 +397,7 @@ class SparkConnectPlanner(session:

[GitHub] [spark] amaliujia commented on a diff in pull request #38938: [WIP][SPARK-41403][CONNECT][PYTHON] Implement `DataFrame.describe`

2022-12-06 Thread GitBox
amaliujia commented on code in PR #38938: URL: https://github.com/apache/spark/pull/38938#discussion_r1041314131 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -404,6 +405,18 @@ message StatSummary { repeated string statistics = 2; } +//

[GitHub] [spark] anchovYu commented on a diff in pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project and refactor Analyzer

2022-12-06 Thread GitBox
anchovYu commented on code in PR #38776: URL: https://github.com/apache/spark/pull/38776#discussion_r1041325612 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAlias.scala: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] viirya commented on a diff in pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
viirya commented on code in PR #38943: URL: https://github.com/apache/spark/pull/38943#discussion_r1041352722 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -398,6 +410,10 @@ class

[GitHub] [spark] hvanhovell commented on a diff in pull request #38944: [SPARK-41369][CONNECT][BUILD] Split connect project into common and server projects

2022-12-06 Thread GitBox
hvanhovell commented on code in PR #38944: URL: https://github.com/apache/spark/pull/38944#discussion_r1041362902 ## python/pyspark/testing/connectutils.py: ## @@ -28,7 +28,7 @@ from pyspark.sql.connect.plan import LogicalPlan from pyspark.sql.connect.session import

[GitHub] [spark] amaliujia commented on a diff in pull request #38935: [SPARK-41319][CONNECT][PYTHON] Implement `Column.{when, otherwise}` and Function `when`

2022-12-06 Thread GitBox
amaliujia commented on code in PR #38935: URL: https://github.com/apache/spark/pull/38935#discussion_r1041392021 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -587,6 +588,14 @@ class SparkConnectPlanner(session:

[GitHub] [spark] viirya commented on a diff in pull request #38943: [SPARK-41410][K8S] Support PVC-oriented executor pod allocation

2022-12-06 Thread GitBox
viirya commented on code in PR #38943: URL: https://github.com/apache/spark/pull/38943#discussion_r1041414732 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala: ## @@ -47,6 +48,17 @@ class

[GitHub] [spark] sunchao commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
sunchao commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1041451334 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/colstats/ColumnStatistics.java: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] viirya commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
viirya commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1041483047 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala: ## @@ -294,7 +313,30 @@ abstract class InMemoryBaseTable( val

[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-06 Thread GitBox
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1041487582 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala: ## @@ -92,9 +92,13 @@ object SchemaConverters {

[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-06 Thread GitBox
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1041487180 ## connector/protobuf/src/test/resources/protobuf/functions_suite.proto: ## @@ -170,4 +170,41 @@ message timeStampMsg { message durationMsg { string key =

[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-06 Thread GitBox
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1041487447 ## connector/protobuf/src/test/resources/protobuf/functions_suite.proto: ## @@ -170,4 +170,41 @@ message timeStampMsg { message durationMsg { string key =

[GitHub] [spark] xinrong-meng commented on pull request #38921: [SPARK-41397][CONNECT][PYTHON] Implement part of string/binary functions

2022-12-06 Thread GitBox
xinrong-meng commented on PR #38921: URL: https://github.com/apache/spark/pull/38921#issuecomment-1339719745 @amaliujia @HyukjinKwon @grundprinzip Would you please review? Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to
