[GitHub] [spark] amaliujia commented on a diff in pull request #40135: [SPARK-42444][PYTHON] `DataFrame.drop` should handle multi columns properly

2023-02-22 Thread via GitHub
amaliujia commented on code in PR #40135: URL: https://github.com/apache/spark/pull/40135#discussion_r1115332180 ## python/pyspark/sql/tests/test_dataframe.py: ## @@ -144,6 +144,17 @@ def test_drop_duplicates(self): message_parameters={"arg_name": "subset",

[GitHub] [spark] LuciferYang commented on a diff in pull request #40136: [SPARK-42515][BUILD][CONNECT][TESTS] Make `write table` in `ClientE2ETestSuite` sbt local test pass

2023-02-22 Thread via GitHub
LuciferYang commented on code in PR #40136: URL: https://github.com/apache/spark/pull/40136#discussion_r1115328475 ## connector/connect/client/jvm/pom.xml: ## @@ -125,6 +125,11 @@ ${mima.version} test + Review Comment: Or do you have any suggestions

[GitHub] [spark] LuciferYang commented on a diff in pull request #40136: [SPARK-42515][BUILD][CONNECT][TESTS] Make `write table` in `ClientE2ETestSuite` sbt local test pass

2023-02-22 Thread via GitHub
LuciferYang commented on code in PR #40136: URL: https://github.com/apache/spark/pull/40136#discussion_r1115315752 ## connector/connect/client/jvm/pom.xml: ## @@ -125,6 +125,11 @@ ${mima.version} test + Review Comment: And when I revert to add

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40135: [SPARK-42444][PYTHON] `DataFrame.drop` should handle multi columns properly

2023-02-22 Thread via GitHub
HyukjinKwon commented on code in PR #40135: URL: https://github.com/apache/spark/pull/40135#discussion_r1115324235 ## python/pyspark/sql/tests/test_dataframe.py: ## @@ -144,6 +144,17 @@ def test_drop_duplicates(self): message_parameters={"arg_name": "subset",

[GitHub] [spark] amaliujia closed pull request #38588: [SPARK-41086][SQL] Consolidate SecondArgumentXXX error to INVALID_PARAMETER_VALUE

2023-02-22 Thread via GitHub
amaliujia closed pull request #38588: [SPARK-41086][SQL] Consolidate SecondArgumentXXX error to INVALID_PARAMETER_VALUE URL: https://github.com/apache/spark/pull/38588 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] LuciferYang commented on a diff in pull request #40136: [SPARK-42515][BUILD][CONNECT][TESTS] Make `write table` in `ClientE2ETestSuite` sbt local test pass

2023-02-22 Thread via GitHub
LuciferYang commented on code in PR #40136: URL: https://github.com/apache/spark/pull/40136#discussion_r1115318339 ## connector/connect/client/jvm/pom.xml: ## @@ -125,6 +125,11 @@ ${mima.version} test + Review Comment: So I think `parquet-hadoop` is

[GitHub] [spark] cloud-fan commented on pull request #40135: [SPARK-42444][PYTHON] `DataFrame.drop` should handle multi columns properly

2023-02-22 Thread via GitHub
cloud-fan commented on PR #40135: URL: https://github.com/apache/spark/pull/40135#issuecomment-1441301397 which commit caused the regression?

[GitHub] [spark] cloud-fan commented on a diff in pull request #40138: [SPARK-41793][SQL] Incorrect result for window frames defined by a range clause on large decimals

2023-02-22 Thread via GitHub
cloud-fan commented on code in PR #40138: URL: https://github.com/apache/spark/pull/40138#discussion_r1115306480 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala: ## @@ -474,4 +474,33 @@ class DataFrameWindowFramesSuite extends QueryTest with

[GitHub] [spark] LuciferYang commented on a diff in pull request #40136: [SPARK-42515][BUILD][CONNECT][TESTS] Make `write table` in `ClientE2ETestSuite` sbt local test pass

2023-02-22 Thread via GitHub
LuciferYang commented on code in PR #40136: URL: https://github.com/apache/spark/pull/40136#discussion_r1115305918 ## connector/connect/client/jvm/pom.xml: ## @@ -125,6 +125,11 @@ ${mima.version} test + Review Comment: Did an experiment

[GitHub] [spark] huaxingao commented on a diff in pull request #40134: [SPARK-42534][SQL] Fix DB2Dialect Limit clause

2023-02-22 Thread via GitHub
huaxingao commented on code in PR #40134: URL: https://github.com/apache/spark/pull/40134#discussion_r1115304895 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala: ## @@ -217,4 +217,25 @@ class DB2IntegrationSuite extends

[GitHub] [spark] cloud-fan commented on a diff in pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2023-02-22 Thread via GitHub
cloud-fan commented on code in PR #38823: URL: https://github.com/apache/spark/pull/38823#discussion_r1115303902 ## sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala: ## @@ -2316,6 +2316,18 @@ class InsertSuite extends DataSourceTest with

[GitHub] [spark] cloud-fan commented on a diff in pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2023-02-22 Thread via GitHub
cloud-fan commented on code in PR #38823: URL: https://github.com/apache/spark/pull/38823#discussion_r1115300391 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala: ## @@ -178,6 +178,15 @@ class DataSourceV2Strategy(session:

[GitHub] [spark] cloud-fan commented on a diff in pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2023-02-22 Thread via GitHub
cloud-fan commented on code in PR #38823: URL: https://github.com/apache/spark/pull/38823#discussion_r1115298759 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -3405,6 +3405,17 @@ private[sql] object QueryCompilationErrors extends

[GitHub] [spark] cloud-fan commented on a diff in pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2023-02-22 Thread via GitHub
cloud-fan commented on code in PR #38823: URL: https://github.com/apache/spark/pull/38823#discussion_r1115297770 ## sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala: ## @@ -471,43 +473,63 @@ private[sql] object CatalogV2Util { /**

[GitHub] [spark] cloud-fan commented on a diff in pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2023-02-22 Thread via GitHub
cloud-fan commented on code in PR #38823: URL: https://github.com/apache/spark/pull/38823#discussion_r1115296490 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala: ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] cloud-fan commented on a diff in pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2023-02-22 Thread via GitHub
cloud-fan commented on code in PR #38823: URL: https://github.com/apache/spark/pull/38823#discussion_r1115295605 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala: ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] cloud-fan commented on a diff in pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2023-02-22 Thread via GitHub
cloud-fan commented on code in PR #38823: URL: https://github.com/apache/spark/pull/38823#discussion_r1115294387 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala: ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] cloud-fan commented on a diff in pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2023-02-22 Thread via GitHub
cloud-fan commented on code in PR #38823: URL: https://github.com/apache/spark/pull/38823#discussion_r1115293788 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala: ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] ulysses-you commented on pull request #40138: [SPARK-41793][SQL] Incorrect result for window frames defined by a range clause on large decimals

2023-02-22 Thread via GitHub
ulysses-you commented on PR #40138: URL: https://github.com/apache/spark/pull/40138#issuecomment-1441283164 cc @cloud-fan @tgravescs

[GitHub] [spark] cloud-fan commented on a diff in pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2023-02-22 Thread via GitHub
cloud-fan commented on code in PR #38823: URL: https://github.com/apache/spark/pull/38823#discussion_r1115293400 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala: ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] cloud-fan commented on a diff in pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2023-02-22 Thread via GitHub
cloud-fan commented on code in PR #38823: URL: https://github.com/apache/spark/pull/38823#discussion_r1115292891 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala: ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] cloud-fan commented on a diff in pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2023-02-22 Thread via GitHub
cloud-fan commented on code in PR #38823: URL: https://github.com/apache/spark/pull/38823#discussion_r1115290604 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala: ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] cloud-fan commented on a diff in pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2023-02-22 Thread via GitHub
cloud-fan commented on code in PR #38823: URL: https://github.com/apache/spark/pull/38823#discussion_r1115290414 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GeneratedColumn.scala: ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] gengliangwang commented on a diff in pull request #40140: [SPARK-42286][SQL] Fallback to previous codegen code path for complex expr with CAST

2023-02-22 Thread via GitHub
gengliangwang commented on code in PR #40140: URL: https://github.com/apache/spark/pull/40140#discussion_r1115289271 ## sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala: ## @@ -1106,6 +1106,16 @@ class InsertSuite extends DataSourceTest with

[GitHub] [spark] gengliangwang commented on a diff in pull request #40140: [SPARK-42286][SQL] Fallback to previous codegen code path for complex expr with CAST

2023-02-22 Thread via GitHub
gengliangwang commented on code in PR #40140: URL: https://github.com/apache/spark/pull/40140#discussion_r1115289060 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -2357,21 +2357,43 @@ case class UpCast(child: Expression, target:

[GitHub] [spark] cloud-fan commented on a diff in pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2023-02-22 Thread via GitHub
cloud-fan commented on code in PR #38823: URL: https://github.com/apache/spark/pull/38823#discussion_r1115287430 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/Column.java: ## @@ -52,7 +58,17 @@ static Column create( String comment,

[GitHub] [spark] RunyaoChen commented on pull request #39855: [SPARK-42286][SQL] Fallback to previous codegen code path for complex expr with CAST

2023-02-22 Thread via GitHub
RunyaoChen commented on PR #39855: URL: https://github.com/apache/spark/pull/39855#issuecomment-1441274484 > @RunyaoChen Could you backport this fix to branch-3.3 to fix [SPARK-42473](https://issues.apache.org/jira/browse/SPARK-42473)? Sure, here's the cherry-pick to branch-3.3:

[GitHub] [spark] alkis commented on a diff in pull request #40121: [SPARK-42528] Optimize PercentileHeap

2023-02-22 Thread via GitHub
alkis commented on code in PR #40121: URL: https://github.com/apache/spark/pull/40121#discussion_r1115274769 ## core/src/main/scala/org/apache/spark/util/collection/PercentileHeap.scala: ## @@ -20,97 +20,55 @@ package org.apache.spark.util.collection import

[GitHub] [spark] wangyum commented on pull request #40140: [SPARK-42286][SQL] Fallback to previous codegen code path for complex expr with CAST

2023-02-22 Thread via GitHub
wangyum commented on PR #40140: URL: https://github.com/apache/spark/pull/40140#issuecomment-1441269735 cc @gengliangwang

[GitHub] [spark] alkis commented on pull request #40121: [SPARK-42528] Optimize PercentileHeap

2023-02-22 Thread via GitHub
alkis commented on PR #40121: URL: https://github.com/apache/spark/pull/40121#issuecomment-1441267376 > Can we minimize diffs to this file? A large fraction is whitespace changes and due to the renames ... will take a look at the changes as well. Can you treat it as a new

[GitHub] [spark] alkis commented on pull request #40121: [SPARK-42528] Optimize PercentileHeap

2023-02-22 Thread via GitHub
alkis commented on PR #40121: URL: https://github.com/apache/spark/pull/40121#issuecomment-1441264762 > Also, given this is an optimization change - include a benchmark to quantify the impact? I did benchmarking live in a cluster. Profiles before show ~1% of scheduler time in
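For context on what is being optimized: a `PercentileHeap` of this kind is commonly kept as two heaps, so reading the percentile is O(1) and each insert is O(log n). The sketch below is illustrative only, not Spark's actual implementation:

```python
import heapq
import math

class PercentileHeap:
    """Two-heap percentile tracker (illustrative, not Spark's code).

    `lo` holds the smallest ceil(p * n) values as a max-heap (stored
    negated); `hi` holds the rest as a min-heap. The percentile is the
    max of `lo`, read in O(1); each insert costs O(log n).
    """

    def __init__(self, percentage: float = 0.95) -> None:
        self.percentage = percentage
        self.lo: list[float] = []  # max-heap via negated values
        self.hi: list[float] = []  # min-heap

    def insert(self, x: float) -> None:
        if not self.lo or x <= -self.lo[0]:
            heapq.heappush(self.lo, -x)
        else:
            heapq.heappush(self.hi, x)
        # Rebalance so lo always holds ceil(p * n) elements.
        n = len(self.lo) + len(self.hi)
        target = max(1, math.ceil(self.percentage * n))
        while len(self.lo) > target:
            heapq.heappush(self.hi, -heapq.heappop(self.lo))
        while len(self.lo) < target:
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def percentile(self) -> float:
        return -self.lo[0]  # largest value of the lower p fraction

h = PercentileHeap(percentage=0.5)
for v in [5, 1, 4, 2, 3]:
    h.insert(v)
print(h.percentile())  # → 3 (the median of 1..5)
```

The rebalancing loop is what a micro-optimization pass would target, since it runs on every insert.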

[GitHub] [spark] RunyaoChen opened a new pull request, #40140: [SPARK-42286][SQL] Fallback to previous codegen code path for complex expr with CAST

2023-02-22 Thread via GitHub
RunyaoChen opened a new pull request, #40140: URL: https://github.com/apache/spark/pull/40140 ### What changes were proposed in this pull request? This PR fixes the internal error `Child is not Cast or ExpressionProxy of Cast` for valid `CaseWhen` expr with `Cast`

[GitHub] [spark] alkis commented on a diff in pull request #40121: [SPARK-42528] Optimize PercentileHeap

2023-02-22 Thread via GitHub
alkis commented on code in PR #40121: URL: https://github.com/apache/spark/pull/40121#discussion_r1115277253 ## core/src/main/scala/org/apache/spark/util/collection/PercentileHeap.scala: ## @@ -20,97 +20,55 @@ package org.apache.spark.util.collection import

[GitHub] [spark] LuciferYang commented on a diff in pull request #40136: [SPARK-42515][BUILD][CONNECT][TESTS] Make `write table` in `ClientE2ETestSuite` sbt local test pass

2023-02-22 Thread via GitHub
LuciferYang commented on code in PR #40136: URL: https://github.com/apache/spark/pull/40136#discussion_r1115273797 ## connector/connect/client/jvm/pom.xml: ## @@ -125,6 +125,11 @@ ${mima.version} test + Review Comment: Even if I make `parquet-hadoop`

[GitHub] [spark] cloud-fan closed pull request #40073: [SPARK-42484] [SQL] UnsafeRowUtils better error message

2023-02-22 Thread via GitHub
cloud-fan closed pull request #40073: [SPARK-42484] [SQL] UnsafeRowUtils better error message URL: https://github.com/apache/spark/pull/40073

[GitHub] [spark] cloud-fan commented on pull request #40073: [SPARK-42484] [SQL] UnsafeRowUtils better error message

2023-02-22 Thread via GitHub
cloud-fan commented on PR #40073: URL: https://github.com/apache/spark/pull/40073#issuecomment-1441255439 thanks, merging to master/3.4 (error message improvement)

[GitHub] [spark] huaxingao opened a new pull request, #40139: [SPARK-39859][SQL][FOLLOWUP] Support v2 DESCRIBE TABLE EXTENDED for columns

2023-02-22 Thread via GitHub
huaxingao opened a new pull request, #40139: URL: https://github.com/apache/spark/pull/40139 ### What changes were proposed in this pull request? get ColStats in `DescribeColumnExec` when `isExtended` is true ### Why are the changes needed? To make code cleaner

[GitHub] [spark] xinrong-meng commented on pull request #40135: [SPARK-42444][PYTHON] `DataFrame.drop` should handle multi columns properly

2023-02-22 Thread via GitHub
xinrong-meng commented on PR #40135: URL: https://github.com/apache/spark/pull/40135#issuecomment-1441240940 Shall we add an example to **Does this PR introduce any user-facing change?** in the PR description? Like ```py >>> df3.show()

[GitHub] [spark] LuciferYang commented on a diff in pull request #40136: [SPARK-42515][BUILD][CONNECT][TESTS] Make `write table` in `ClientE2ETestSuite` sbt local test pass

2023-02-22 Thread via GitHub
LuciferYang commented on code in PR #40136: URL: https://github.com/apache/spark/pull/40136#discussion_r1115256951 ## connector/connect/server/pom.xml: ## @@ -199,6 +199,11 @@ ${tomcat.annotations.api.version} provided + + org.apache.parquet +

[GitHub] [spark] LuciferYang commented on a diff in pull request #40136: [SPARK-42515][BUILD][CONNECT][TESTS] Make `write table` in `ClientE2ETestSuite` sbt local test pass

2023-02-22 Thread via GitHub
LuciferYang commented on code in PR #40136: URL: https://github.com/apache/spark/pull/40136#discussion_r1115253586 ## connector/connect/client/jvm/pom.xml: ## @@ -125,6 +125,11 @@ ${mima.version} test + Review Comment: Good question, when I add this

[GitHub] [spark] allisonport-db commented on a diff in pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2023-02-22 Thread via GitHub
allisonport-db commented on code in PR #38823: URL: https://github.com/apache/spark/pull/38823#discussion_r1115242394 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/Column.java: ## @@ -82,6 +98,15 @@ static Column create( @Nullable ColumnDefaultValue

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-02-22 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1115240143 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/ml/classification/Classifier.scala: ## @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] sadikovi commented on pull request #40134: [SPARK-42534][SQL] Fix DB2Dialect Limit clause

2023-02-22 Thread via GitHub
sadikovi commented on PR #40134: URL: https://github.com/apache/spark/pull/40134#issuecomment-1441208481 @dongjoon-hyun I have addressed the comment, could you review again please? Thank you. Also, do you know whom I can ping on this PR with regard to DB2 SQL semantics?

[GitHub] [spark] sadikovi commented on a diff in pull request #40134: [SPARK-42534][SQL] Fix DB2Dialect Limit clause

2023-02-22 Thread via GitHub
sadikovi commented on code in PR #40134: URL: https://github.com/apache/spark/pull/40134#discussion_r1115227618 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala: ## @@ -217,4 +217,26 @@ class DB2IntegrationSuite extends

[GitHub] [spark] sadikovi commented on a diff in pull request #40134: [SPARK-42534][SQL] Fix DB2Dialect Limit clause

2023-02-22 Thread via GitHub
sadikovi commented on code in PR #40134: URL: https://github.com/apache/spark/pull/40134#discussion_r1115227694 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala: ## @@ -217,4 +217,26 @@ class DB2IntegrationSuite extends

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40134: [SPARK-42534][SQL] Fix DB2Dialect Limit clause

2023-02-22 Thread via GitHub
dongjoon-hyun commented on code in PR #40134: URL: https://github.com/apache/spark/pull/40134#discussion_r1115217759 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala: ## @@ -217,4 +217,26 @@ class DB2IntegrationSuite

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40134: [SPARK-42534][SQL] Fix DB2Dialect Limit clause

2023-02-22 Thread via GitHub
dongjoon-hyun commented on code in PR #40134: URL: https://github.com/apache/spark/pull/40134#discussion_r1115217372 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/DB2Dialect.scala: ## @@ -160,4 +160,8 @@ private object DB2Dialect extends JdbcDialect { s"DROP
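The dialect file under review here is where DB2's row-limit syntax lives: DB2 rejects MySQL-style `LIMIT n` and instead expects the SQL-standard `FETCH FIRST n ROWS ONLY` clause. A hedged Python sketch of such a clause builder (the actual fix is Scala code in `DB2Dialect`; this function name is hypothetical):

```python
def db2_limit_clause(limit: int) -> str:
    """Build a DB2-compatible row-limit clause.

    DB2 does not accept MySQL-style `LIMIT n`; the standard form is
    `FETCH FIRST n ROWS ONLY`. This is a sketch of the idea only.
    """
    if limit <= 0:
        return ""  # no limit requested: emit nothing
    return f"FETCH FIRST {limit} ROWS ONLY"

query = "SELECT name FROM people " + db2_limit_clause(10)
print(query)  # SELECT name FROM people FETCH FIRST 10 ROWS ONLY
```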

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40134: [SPARK-42534][SQL] Fix DB2Dialect Limit clause

2023-02-22 Thread via GitHub
dongjoon-hyun commented on code in PR #40134: URL: https://github.com/apache/spark/pull/40134#discussion_r1115217088 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala: ## @@ -217,4 +217,26 @@ class DB2IntegrationSuite

[GitHub] [spark] hvanhovell commented on a diff in pull request #40133: [SPARK-42533][CONNECT][Scala] Add ssl for Scala client

2023-02-22 Thread via GitHub
hvanhovell commented on code in PR #40133: URL: https://github.com/apache/spark/pull/40133#discussion_r1115216626 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClient.scala: ## @@ -189,9 +245,54 @@ object SparkConnectClient {

[GitHub] [spark] mridulm commented on pull request #40064: [SPARK-42478] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

2023-02-22 Thread via GitHub
mridulm commented on PR #40064: URL: https://github.com/apache/spark/pull/40064#issuecomment-1441190260 @Yikf Agree - we only specify two parts for the `JobID` - the `String jtIdentifier` and `int id`. We can persist those in the class - and make jobId a `transient lazy val` which
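The suggestion above is to persist only the two stable parts (`jtIdentifier` and `id`) and derive the full job id lazily, so every deserialized copy reconstructs the same id. A hedged Python analogue of that Scala `@transient lazy val` pattern (names are hypothetical, not Spark's actual fields):

```python
import pickle

class FileWriterFactory:
    """Sketch: persist the two JobID parts, rebuild the full id lazily.

    The serialized form carries only `jt_identifier` and `id`, so each
    deserialized copy derives an identical job id on first access,
    mirroring a Scala `@transient lazy val`.
    """

    def __init__(self, jt_identifier: str, job_id: int) -> None:
        self.jt_identifier = jt_identifier
        self.id = job_id
        self._job_id_str = None  # transient: rebuilt on demand

    def __getstate__(self):
        # Serialize only the stable parts, never the derived field.
        return {"jt_identifier": self.jt_identifier, "id": self.id}

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._job_id_str = None

    @property
    def job_id(self) -> str:
        if self._job_id_str is None:  # lazily derived, deterministic
            self._job_id_str = f"job_{self.jt_identifier}_{self.id:04d}"
        return self._job_id_str

factory = FileWriterFactory("202302221200", 7)
clone = pickle.loads(pickle.dumps(factory))
assert clone.job_id == factory.job_id  # same id after every round trip
```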

[GitHub] [spark] hvanhovell commented on a diff in pull request #40133: [SPARK-42533][CONNECT][Scala] Add ssl for Scala client

2023-02-22 Thread via GitHub
hvanhovell commented on code in PR #40133: URL: https://github.com/apache/spark/pull/40133#discussion_r1115216249 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClient.scala: ## @@ -158,13 +214,14 @@ object SparkConnectClient {

[GitHub] [spark] hvanhovell commented on a diff in pull request #40133: [SPARK-42533][CONNECT][Scala] Add ssl for Scala client

2023-02-22 Thread via GitHub
hvanhovell commented on code in PR #40133: URL: https://github.com/apache/spark/pull/40133#discussion_r1115215682 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClient.scala: ## @@ -117,6 +126,53 @@ object SparkConnectClient {

[GitHub] [spark] ulysses-you opened a new pull request, #40138: [SPARK-41793][SQL] Incorrect result for window frames defined by a range clause on large decimals

2023-02-22 Thread via GitHub
ulysses-you opened a new pull request, #40138: URL: https://github.com/apache/spark/pull/40138 ### What changes were proposed in this pull request? Use `DecimalAddNoOverflowCheck` instead of `Add` to create bound ordering for window range frame ### Why are the

[GitHub] [spark] hvanhovell closed pull request #40129: [SPARK-42529][CONNECT] Support Cube and Rollup in Scala client

2023-02-22 Thread via GitHub
hvanhovell closed pull request #40129: [SPARK-42529][CONNECT] Support Cube and Rollup in Scala client URL: https://github.com/apache/spark/pull/40129

[GitHub] [spark] hvanhovell commented on pull request #40129: [SPARK-42529][CONNECT] Support Cube and Rollup in Scala client

2023-02-22 Thread via GitHub
hvanhovell commented on PR #40129: URL: https://github.com/apache/spark/pull/40129#issuecomment-1441187112 merging

[GitHub] [spark] cloud-fan commented on pull request #40137: [SPARK-42049][SQL][FOLLOWUP] Always filter away invalid ordering/partitioning

2023-02-22 Thread via GitHub
cloud-fan commented on PR #40137: URL: https://github.com/apache/spark/pull/40137#issuecomment-1441186992 cc @peter-toth

[GitHub] [spark] cloud-fan opened a new pull request, #40137: [SPARK-42049][SQL][FOLLOWUP] Always filter away invalid ordering/partitioning

2023-02-22 Thread via GitHub
cloud-fan opened a new pull request, #40137: URL: https://github.com/apache/spark/pull/40137 ### What changes were proposed in this pull request? This is a follow-up of https://github.com/apache/spark/pull/37525. When the project list has aliases, we go to the

[GitHub] [spark] hvanhovell commented on a diff in pull request #40136: [SPARK-42515][BUILD][CONNECT][TESTS] Make `write table` in `ClientE2ETestSuite` sbt local test pass

2023-02-22 Thread via GitHub
hvanhovell commented on code in PR #40136: URL: https://github.com/apache/spark/pull/40136#discussion_r1115214435 ## connector/connect/client/jvm/pom.xml: ## @@ -125,6 +125,11 @@ ${mima.version} test + Review Comment: I have trouble understanding how

[GitHub] [spark] LuciferYang commented on pull request #40136: [SPARK-42515][BUILD][CONNECT][TESTS] Make `write table` in `ClientE2ETestSuite` sbt local test pass

2023-02-22 Thread via GitHub
LuciferYang commented on PR #40136: URL: https://github.com/apache/spark/pull/40136#issuecomment-1441185226 cc @hvanhovell

[GitHub] [spark] Yikf commented on pull request #40064: [SPARK-42478] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

2023-02-22 Thread via GitHub
Yikf commented on PR #40064: URL: https://github.com/apache/spark/pull/40064#issuecomment-1441179787 @mridulm Thanks for your review, this is a good question for me: `JobId` may be different each time the class is deserialized. How about this idea that

[GitHub] [spark] sadikovi commented on a diff in pull request #40134: [SPARK-42534][SQL] Fix DB2Dialect Limit clause

2023-02-22 Thread via GitHub
sadikovi commented on code in PR #40134: URL: https://github.com/apache/spark/pull/40134#discussion_r1115208845 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala: ## @@ -1028,6 +1028,19 @@ class JDBCSuite extends QueryTest with SharedSparkSession {

[GitHub] [spark] mridulm commented on pull request #40064: [SPARK-42478] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

2023-02-22 Thread via GitHub
mridulm commented on PR #40064: URL: https://github.com/apache/spark/pull/40064#issuecomment-1441168996 I have not followed the changes in this part of the code closely in a while - but this specific PR will result in a different `jobId` each time the class is deserialized - I would

[GitHub] [spark] LuciferYang opened a new pull request, #40136: [SPARK-42515][BUILD][TESTS] Make `write table` in `ClientE2ETestSuite` local test pass

2023-02-22 Thread via GitHub
LuciferYang opened a new pull request, #40136: URL: https://github.com/apache/spark/pull/40136 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] cloud-fan commented on a diff in pull request #40121: [SPARK-42528] Optimize PercentileHeap

2023-02-22 Thread via GitHub
cloud-fan commented on code in PR #40121: URL: https://github.com/apache/spark/pull/40121#discussion_r1115200882 ## core/src/main/scala/org/apache/spark/util/collection/PercentileHeap.scala: ## @@ -20,97 +20,55 @@ package org.apache.spark.util.collection import

[GitHub] [spark] cloud-fan commented on a diff in pull request #40134: [SPARK-42534][SQL] Fix DB2Dialect Limit clause

2023-02-22 Thread via GitHub
cloud-fan commented on code in PR #40134: URL: https://github.com/apache/spark/pull/40134#discussion_r1115199857 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala: ## @@ -1028,6 +1028,19 @@ class JDBCSuite extends QueryTest with SharedSparkSession {

[GitHub] [spark] zhengruifeng commented on pull request #40135: [SPARK-42444][PYTHON] `DataFrame.drop` should handle multi columns properly

2023-02-22 Thread via GitHub
zhengruifeng commented on PR #40135: URL: https://github.com/apache/spark/pull/40135#issuecomment-1441159415 cc @HyukjinKwon @xinrong-meng -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] zhengruifeng opened a new pull request, #40135: [SPARK-42444][PYTHON] `DataFrame.drop` should handle multi columns properly

2023-02-22 Thread via GitHub
zhengruifeng opened a new pull request, #40135: URL: https://github.com/apache/spark/pull/40135 ### What changes were proposed in this pull request? The existing implementation always converts inputs (maybe a column or a column name) to columns, which causes an `AMBIGUOUS_REFERENCE` issue since there
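
The gist of SPARK-42444 can be illustrated with a plain-Python sketch (a hypothetical helper, not the actual PySpark code): keep string arguments as unresolved names instead of eagerly wrapping them in `Column` objects, so a bare name after a join is dropped by name rather than resolved against two candidate columns.

```python
class Column:
    """Stand-in for pyspark.sql.Column, for illustration only."""
    def __init__(self, expr: str):
        self.expr = expr

def normalize_drop_args(*cols):
    """Split mixed drop() arguments into unresolved names and Column objects.

    Eagerly converting every string to a Column forces name resolution,
    which is what can raise AMBIGUOUS_REFERENCE when a joined DataFrame
    carries the same column name on both sides; keeping strings as plain
    names sidesteps the ambiguity.
    """
    names, columns = [], []
    for c in cols:
        if isinstance(c, str):
            names.append(c)        # keep as a name, resolve later by name
        elif isinstance(c, Column):
            columns.append(c)      # already a resolved column expression
        else:
            raise TypeError(
                f"drop() argument must be str or Column, got {type(c).__name__}")
    return names, columns
```

With this split, the plan builder can route names and column expressions through different code paths instead of collapsing everything into one.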

[GitHub] [spark] WeichenXu123 commented on pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-02-22 Thread via GitHub
WeichenXu123 commented on PR #40097: URL: https://github.com/apache/spark/pull/40097#issuecomment-1441155203 > This PR also copies following testsuites to spark-mllib-common: > 1, org.apache.spark.ml.attribute.* > 2, org.apache.spark.ml.linalg.* except: > >

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-02-22 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1115193417 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/ml/Pipeline.scala: ## @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-02-22 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1115192771 ## mllib-common/src/test/scala/org/apache/spark/ml/attribute/AttributeSuite.scala: ## @@ -0,0 +1,242 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-02-22 Thread via GitHub
WeichenXu123 commented on code in PR #40097: URL: https://github.com/apache/spark/pull/40097#discussion_r1115192771 ## mllib-common/src/test/scala/org/apache/spark/ml/attribute/AttributeSuite.scala: ## @@ -0,0 +1,242 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] beliefer commented on a diff in pull request #39954: [SPARK-42289][SQL] DS V2 pushdown could let JDBC dialect decide to push down offset and limit

2023-02-22 Thread via GitHub
beliefer commented on code in PR #39954: URL: https://github.com/apache/spark/pull/39954#discussion_r1115191715 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCScanBuilder.scala: ## @@ -126,24 +126,23 @@ case class JDBCScanBuilder(

[GitHub] [spark] sadikovi commented on pull request #40134: [SPARK-42534][SQL] Fix DB2Dialect Limit clause

2023-02-22 Thread via GitHub
sadikovi commented on PR #40134: URL: https://github.com/apache/spark/pull/40134#issuecomment-1441145746 cc @dongjoon-hyun @cloud-fan

[GitHub] [spark] wangyum commented on pull request #40115: [SPARK-42525][CORE]collapse two adjacent windows with the same partition/order in subquery

2023-02-22 Thread via GitHub
wangyum commented on PR #40115: URL: https://github.com/apache/spark/pull/40115#issuecomment-1441144948 @zml1206 Could you update the PR title to `[SPARK-42525][SQL] Collapse ...`?

[GitHub] [spark] wangyum commented on a diff in pull request #40115: [SPARK-42525][CORE]collapse two adjacent windows with the same partition/order in subquery

2023-02-22 Thread via GitHub
wangyum commented on code in PR #40115: URL: https://github.com/apache/spark/pull/40115#discussion_r1115183539 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala: ## @@ -532,10 +532,15 @@ class DataFrameWindowFunctionsSuite extends QueryTest

[GitHub] [spark] amaliujia commented on a diff in pull request #40129: [SPARK-42529][CONNECT] Support Cube and Rollup in Scala client

2023-02-22 Thread via GitHub
amaliujia commented on code in PR #40129: URL: https://github.com/apache/spark/pull/40129#discussion_r1115180270 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala: ## @@ -37,16 +37,25 @@ import org.apache.spark.connect.proto */

[GitHub] [spark] Yikf commented on pull request #40064: [SPARK-42478] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

2023-02-22 Thread via GitHub
Yikf commented on PR #40064: URL: https://github.com/apache/spark/pull/40064#issuecomment-1441134396 kindly ping @cloud-fan , @boneanxs Any suggestions?

[GitHub] [spark] LuciferYang commented on pull request #40120: [SPARK-42527][CONNECT] Scala Client add Window functions

2023-02-22 Thread via GitHub
LuciferYang commented on PR #40120: URL: https://github.com/apache/spark/pull/40120#issuecomment-1441131900 Thanks @hvanhovell

[GitHub] [spark] LuciferYang commented on a diff in pull request #40120: [SPARK-42527][CONNECT] Scala Client add Window functions

2023-02-22 Thread via GitHub
LuciferYang commented on code in PR #40120: URL: https://github.com/apache/spark/pull/40120#discussion_r1115177855 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -129,7 +132,7 @@ object functions { case v: Array[Byte] =>

[GitHub] [spark] sadikovi opened a new pull request, #40134: [SPARK-42534] Fix DB2Dialect Limit clause

2023-02-22 Thread via GitHub
sadikovi opened a new pull request, #40134: URL: https://github.com/apache/spark/pull/40134 ### What changes were proposed in this pull request? The PR fixes the DB2 Limit clause syntax. Although DB2 supports the LIMIT keyword, it seems that this support varies across databases
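
The portable row-limiting form on DB2 is `FETCH FIRST n ROWS ONLY` rather than `LIMIT n`. A minimal sketch of the dialect-override idea (hypothetical names, not Spark's actual `JdbcDialect` API):

```python
class JdbcDialect:
    """Hypothetical base dialect emitting the common LIMIT syntax."""
    def limit_clause(self, limit: int) -> str:
        return f"LIMIT {limit}" if limit > 0 else ""

class DB2Dialect(JdbcDialect):
    """DB2's widely supported row-limiting form."""
    def limit_clause(self, limit: int) -> str:
        return f"FETCH FIRST {limit} ROWS ONLY" if limit > 0 else ""

def build_query(table: str, dialect: JdbcDialect, limit: int = 0) -> str:
    # Append the dialect-specific clause only when a limit is requested.
    clause = dialect.limit_clause(limit)
    return f"SELECT * FROM {table}" + (f" {clause}" if clause else "")
```

Keeping the clause behind a per-dialect hook lets each database override only the fragment whose syntax diverges, which is the pattern the fix follows.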

[GitHub] [spark] hvanhovell commented on a diff in pull request #40129: [SPARK-42529][CONNECT] Support Cube and Rollup in Scala client

2023-02-22 Thread via GitHub
hvanhovell commented on code in PR #40129: URL: https://github.com/apache/spark/pull/40129#discussion_r1115163152 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala: ## @@ -225,3 +234,28 @@ class RelationalGroupedDataset

[GitHub] [spark] hvanhovell commented on a diff in pull request #40129: [SPARK-42529][CONNECT] Support Cube and Rollup in Scala client

2023-02-22 Thread via GitHub
hvanhovell commented on code in PR #40129: URL: https://github.com/apache/spark/pull/40129#discussion_r1115162635 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala: ## @@ -37,16 +37,25 @@ import org.apache.spark.connect.proto

[GitHub] [spark] dongjoon-hyun closed pull request #40127: [SPARK-42530][PYSPARK][DOCS] Remove Hadoop 2 from PySpark installation guide

2023-02-22 Thread via GitHub
dongjoon-hyun closed pull request #40127: [SPARK-42530][PYSPARK][DOCS] Remove Hadoop 2 from PySpark installation guide URL: https://github.com/apache/spark/pull/40127

[GitHub] [spark] dongjoon-hyun commented on pull request #40127: [SPARK-42530][PYSPARK][DOCS] Remove Hadoop 2 from PySpark installation guide

2023-02-22 Thread via GitHub
dongjoon-hyun commented on PR #40127: URL: https://github.com/apache/spark/pull/40127#issuecomment-1441097534 Thank you, @HyukjinKwon . Merged to master/3.4.

[GitHub] [spark] HyukjinKwon commented on pull request #39995: [WIP][CONNECT] Initial runtime SQL configuration implementation

2023-02-22 Thread via GitHub
HyukjinKwon commented on PR #39995: URL: https://github.com/apache/spark/pull/39995#issuecomment-1441085969 cc @ueshin

[GitHub] [spark] github-actions[bot] commented on pull request #37634: [SPARK-40199][SQL] Provide useful error when projecting a non-null column encounters null value

2023-02-22 Thread via GitHub
github-actions[bot] commented on PR #37634: URL: https://github.com/apache/spark/pull/37634#issuecomment-1441040591 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] zhenlineo commented on pull request #40133: [SPARK-42533][CONNECT][Scala] Add ssl for Scala client

2023-02-22 Thread via GitHub
zhenlineo commented on PR #40133: URL: https://github.com/apache/spark/pull/40133#issuecomment-1441018465 cc @grundprinzip

[GitHub] [spark] zhenlineo opened a new pull request, #40133: [SPARK-42533][CONNECT][Scala] Add ssl for Scala client

2023-02-22 Thread via GitHub
zhenlineo opened a new pull request, #40133: URL: https://github.com/apache/spark/pull/40133 ### What changes were proposed in this pull request? Adding SSL encryption and access token support for Scala client ### Why are the changes needed? To support basic client side

[GitHub] [spark] wangyum commented on a diff in pull request #40115: [SPARK-42525][CORE]collapse two adjacent windows with the same partition/order in subquery

2023-02-22 Thread via GitHub
wangyum commented on code in PR #40115: URL: https://github.com/apache/spark/pull/40115#discussion_r1115112991 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -3592,6 +3592,34 @@ class DataFrameSuite extends QueryTest val df =

[GitHub] [spark] wangyum commented on a diff in pull request #40115: [SPARK-42525][CORE]collapse two adjacent windows with the same partition/order in subquery

2023-02-22 Thread via GitHub
wangyum commented on code in PR #40115: URL: https://github.com/apache/spark/pull/40115#discussion_r1115111006 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -3592,6 +3592,34 @@ class DataFrameSuite extends QueryTest val df =
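
The rewrite SPARK-42525 proposes can be sketched on a toy plan representation (illustrative, not Spark's actual optimizer rule): two stacked Window nodes sharing the same partition/order spec collapse into one.

```python
from dataclasses import dataclass

@dataclass
class Window:
    exprs: list       # window expressions computed by this node
    partition: tuple  # partitionBy keys
    order: tuple      # orderBy keys
    child: object     # child plan node

def collapse_windows(plan):
    """Merge adjacent Window nodes with identical partition/order specs.

    Simplified: the real rule must additionally verify that the upper
    window's expressions do not reference the lower window's outputs.
    """
    if (isinstance(plan, Window) and isinstance(plan.child, Window)
            and plan.partition == plan.child.partition
            and plan.order == plan.child.order):
        merged = Window(plan.exprs + plan.child.exprs,
                        plan.partition, plan.order, plan.child.child)
        return collapse_windows(merged)  # fold deeper stacks too
    return plan
```

Collapsing avoids a second sort/shuffle pass over data that is already partitioned and ordered the right way.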

[GitHub] [spark] dongjoon-hyun closed pull request #40132: [SPARK-42532][K8S][DOCS] Update YuniKorn docs with v1.2

2023-02-22 Thread via GitHub
dongjoon-hyun closed pull request #40132: [SPARK-42532][K8S][DOCS] Update YuniKorn docs with v1.2 URL: https://github.com/apache/spark/pull/40132

[GitHub] [spark] dongjoon-hyun commented on pull request #40132: [SPARK-42532][K8S][DOCS] Update YuniKorn docs with v1.2

2023-02-22 Thread via GitHub
dongjoon-hyun commented on PR #40132: URL: https://github.com/apache/spark/pull/40132#issuecomment-1440969626 Thank you so much, @viirya . Merged to master/3.4.
