[GitHub] [spark] MaxGekk closed pull request #40236: [SPARK-38735][SQL][TESTS] Add tests for the error class: INTERNAL_ERROR

2023-03-09 Thread via GitHub
MaxGekk closed pull request #40236: [SPARK-38735][SQL][TESTS] Add tests for the error class: INTERNAL_ERROR URL: https://github.com/apache/spark/pull/40236 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] zhenlineo commented on a diff in pull request #40356: [SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work without path parameter

2023-03-09 Thread via GitHub
zhenlineo commented on code in PR #40356: URL: https://github.com/apache/spark/pull/40356#discussion_r1131689057 ## connector/connect/server/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -554,7 +554,8 @@ class SparkConnectProtoSuite

[GitHub] [spark] cloud-fan commented on a diff in pull request #40321: [SPARK-42704] SubqueryAlias propagates metadata columns that child outputs

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #40321: URL: https://github.com/apache/spark/pull/40321#discussion_r1131870627 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala: ## @@ -281,6 +281,53 @@ class FileMetadataStructSuite extends

[GitHub] [spark] shrprasa commented on pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates

2023-03-09 Thread via GitHub
shrprasa commented on PR #40128: URL: https://github.com/apache/spark/pull/40128#issuecomment-1463227858 Gentle ping @holdenk

[GitHub] [spark] cloud-fan commented on a diff in pull request #40321: [SPARK-42704] SubqueryAlias propagates metadata columns that child outputs

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #40321: URL: https://github.com/apache/spark/pull/40321#discussion_r1131939457 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -1033,9 +1033,12 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] cloud-fan commented on a diff in pull request #40321: [SPARK-42704] SubqueryAlias propagates metadata columns that child outputs

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #40321: URL: https://github.com/apache/spark/pull/40321#discussion_r1131938647 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -1033,9 +1033,12 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] beliefer opened a new pull request, #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-03-09 Thread via GitHub
beliefer opened a new pull request, #40359: URL: https://github.com/apache/spark/pull/40359 ### What changes were proposed in this pull request? Currently, the DS V2 pushdown framework pushes offset as `OFFSET n` by default and pushes it with limit as `LIMIT m OFFSET n`. But some

[GitHub] [spark] ulysses-you commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
ulysses-you commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1131987375 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -220,10 +221,28 @@ case class AdaptiveSparkPlanExec( }

[GitHub] [spark] cloud-fan commented on a diff in pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #40359: URL: https://github.com/apache/spark/pull/40359#discussion_r1132005986 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala: ## @@ -181,19 +181,35 @@ private case object OracleDialect extends JdbcDialect { if

[GitHub] [spark] ulysses-you commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
ulysses-you commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1132005993 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -220,10 +221,28 @@ case class AdaptiveSparkPlanExec( }

[GitHub] [spark] AngersZhuuuu commented on pull request #40314: [SPARK-42698][CORE] SparkSubmit should also stop SparkContext when exit program in yarn mode and pass exitCode to AM side

2023-03-09 Thread via GitHub
AngersZhuuuu commented on PR #40314: URL: https://github.com/apache/spark/pull/40314#issuecomment-1463391491 @cloud-fan It seems https://github.com/apache/spark/pull/32283 first aimed to fix an issue in k8s, and then @dongjoon-hyun limited it to the k8s env. But this can also work for yarn

[GitHub] [spark] xinrong-meng commented on pull request #40350: [SPARK-42726][CONNECT][PYTHON] Implement `DataFrame.mapInArrow`

2023-03-09 Thread via GitHub
xinrong-meng commented on PR #40350: URL: https://github.com/apache/spark/pull/40350#issuecomment-1463193506 Thanks @HyukjinKwon !

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40321: [SPARK-42704] SubqueryAlias propagates metadata columns that child outputs

2023-03-09 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40321: URL: https://github.com/apache/spark/pull/40321#discussion_r1131932553 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -1033,9 +1033,12 @@ class Analyzer(override val

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40321: [SPARK-42704] SubqueryAlias propagates metadata columns that child outputs

2023-03-09 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40321: URL: https://github.com/apache/spark/pull/40321#discussion_r1131931758 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -1033,9 +1033,12 @@ class Analyzer(override val

[GitHub] [spark] zhenlineo opened a new pull request, #40358: [SPARK-42733][CONNECT][Followup] Write without path or table

2023-03-09 Thread via GitHub
zhenlineo opened a new pull request, #40358: URL: https://github.com/apache/spark/pull/40358 ### What changes were proposed in this pull request? Fixes `DataFrameWriter.save` to work without path or table parameter. Added support of jdbc method in the writer as it is one of the impl

[GitHub] [spark] amaliujia commented on a diff in pull request #40358: [SPARK-42733][CONNECT][Followup] Write without path or table

2023-03-09 Thread via GitHub
amaliujia commented on code in PR #40358: URL: https://github.com/apache/spark/pull/40358#discussion_r1131967870 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala: ## @@ -345,6 +347,37 @@ final class DataFrameWriter[T] private[sql] (ds:

[GitHub] [spark] cloud-fan commented on pull request #40333: [SPARK-42702][SPARK-42623][SQL] Support parameterized query in subquery and CTE

2023-03-09 Thread via GitHub
cloud-fan commented on PR #40333: URL: https://github.com/apache/spark/pull/40333#issuecomment-1463321220 GA passes, let me merge it back.

[GitHub] [spark] cloud-fan closed pull request #40333: [SPARK-42702][SPARK-42623][SQL] Support parameterized query in subquery and CTE

2023-03-09 Thread via GitHub
cloud-fan closed pull request #40333: [SPARK-42702][SPARK-42623][SQL] Support parameterized query in subquery and CTE URL: https://github.com/apache/spark/pull/40333

[GitHub] [spark] cloud-fan commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-09 Thread via GitHub
cloud-fan commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1463337176 I think the test is easy to fix. It wants to test the aggregate function result, but not the generated alias, so we just change the testing query to add alias explicitly. ``` val

[GitHub] [spark] cloud-fan commented on pull request #40314: [SPARK-42698][CORE] SparkSubmit should also stop SparkContext when exit program in yarn mode and pass exitCode to AM side

2023-03-09 Thread via GitHub
cloud-fan commented on PR #40314: URL: https://github.com/apache/spark/pull/40314#issuecomment-1463378857 @dongjoon-hyun do you have more context about https://github.com/apache/spark/pull/33403? Why do we limit the stopping spark context behavior to k8s only?

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1132037463 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -220,10 +221,14 @@ case class AdaptiveSparkPlanExec( }

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1132040956 ## sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala: ## @@ -166,4 +166,14 @@ case class InMemoryTableScanExec(

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1132040751 ## sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala: ## @@ -166,4 +166,14 @@ case class InMemoryTableScanExec(

[GitHub] [spark] cloud-fan commented on pull request #40333: [SPARK-42702][SPARK-42623][SQL] Support parameterized query in subquery and CTE

2023-03-09 Thread via GitHub
cloud-fan commented on PR #40333: URL: https://github.com/apache/spark/pull/40333#issuecomment-1463145544 maybe there is a conflict right after my last commit, let me rebase

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40357: [SPARK-42739][BUILD] Ensure release tag to be pushed to release branch

2023-03-09 Thread via GitHub
HyukjinKwon commented on code in PR #40357: URL: https://github.com/apache/spark/pull/40357#discussion_r1131893994 ## dev/create-release/release-tag.sh: ## @@ -122,6 +122,12 @@ if ! is_dry_run; then git push origin $RELEASE_TAG if [[ $RELEASE_VERSION != *"preview"* ]];

[GitHub] [spark] LuciferYang commented on a diff in pull request #40352: [WIP][SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

2023-03-09 Thread via GitHub
LuciferYang commented on code in PR #40352: URL: https://github.com/apache/spark/pull/40352#discussion_r1131893211 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1073,6 +1074,12 @@ class SparkConnectPlanner(val

[GitHub] [spark] chong0929 commented on a diff in pull request #40341: [SPARK-42715][SQL] Tips for Optimizing NegativeArraySizeException

2023-03-09 Thread via GitHub
chong0929 commented on code in PR #40341: URL: https://github.com/apache/spark/pull/40341#discussion_r1131896943 ## sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java: ## @@ -204,7 +204,12 @@ public void initBatch( * by copying

[GitHub] [spark] beliefer commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-09 Thread via GitHub
beliefer commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1131919854 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala: ## @@ -2065,6 +2065,43 @@ class PlanGenerationTestSuite

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1131975162 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala: ## @@ -250,3 +252,44 @@ case class BroadcastQueryStageExec( override

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1131974200 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -220,10 +221,28 @@ case class AdaptiveSparkPlanExec( }

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1131980520 ## sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala: ## @@ -275,10 +272,22 @@ case class CachedRDDBuilder(

[GitHub] [spark] wangyum opened a new pull request, #40360: [SPARK-42741][SQL] Do not unwrap casts in binary comparison when literal is null

2023-03-09 Thread via GitHub
wangyum opened a new pull request, #40360: URL: https://github.com/apache/spark/pull/40360 ### What changes were proposed in this pull request? This PR makes `UnwrapCastInBinaryComparison` not unwrap casts in binary comparisons when the literal is null. ### Why are the changes

[GitHub] [spark] cloud-fan commented on pull request #40314: [SPARK-42698][CORE] SparkSubmit should pass exitCode to AM side for yarn mode

2023-03-09 Thread via GitHub
cloud-fan commented on PR #40314: URL: https://github.com/apache/spark/pull/40314#issuecomment-1463331316 This seems to be a revert of https://github.com/apache/spark/pull/33403 as now we stop SparkContext in YARN environment as well. We should justify it in the PR description. This is not

[GitHub] [spark] itholic commented on pull request #40282: [SPARK-42672][PYTHON][DOCS] Document error class list

2023-03-09 Thread via GitHub
itholic commented on PR #40282: URL: https://github.com/apache/spark/pull/40282#issuecomment-1463083039 Documentation for the SQL side was merged in https://github.com/apache/spark/pull/40336. Note that the Python side is simpler compared to the SQL side because we do not have SQLSTATE,

[GitHub] [spark] chong0929 commented on a diff in pull request #40341: [SPARK-42715][SQL] Tips for Optimizing NegativeArraySizeException

2023-03-09 Thread via GitHub
chong0929 commented on code in PR #40341: URL: https://github.com/apache/spark/pull/40341#discussion_r1131900349 ## sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java: ## @@ -204,7 +204,12 @@ public void initBatch( * by copying

[GitHub] [spark] xinrong-meng commented on a diff in pull request #40357: [SPARK-42739][BUILD] Ensure release tag to be pushed to release branch

2023-03-09 Thread via GitHub
xinrong-meng commented on code in PR #40357: URL: https://github.com/apache/spark/pull/40357#discussion_r1131900698 ## dev/create-release/release-tag.sh: ## @@ -122,6 +122,12 @@ if ! is_dry_run; then git push origin $RELEASE_TAG if [[ $RELEASE_VERSION != *"preview"* ]];

[GitHub] [spark] LuciferYang commented on a diff in pull request #40352: [WIP][SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

2023-03-09 Thread via GitHub
LuciferYang commented on code in PR #40352: URL: https://github.com/apache/spark/pull/40352#discussion_r1131904105 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala: ## @@ -176,4 +176,31 @@ class DataFrameStatSuite extends

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-09 Thread via GitHub
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1131921529 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -3344,43 +3345,6 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] zhenlineo commented on a diff in pull request #40356: [SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work without path parameter

2023-03-09 Thread via GitHub
zhenlineo commented on code in PR #40356: URL: https://github.com/apache/spark/pull/40356#discussion_r1131922084 ## python/pyspark/sql/tests/test_datasources.py: ## @@ -192,6 +193,23 @@ def test_ignore_column_of_all_nulls(self): finally:

[GitHub] [spark] zhenlineo commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

2023-03-09 Thread via GitHub
zhenlineo commented on PR #40274: URL: https://github.com/apache/spark/pull/40274#issuecomment-1463235387 > seems `SimpleSparkConnectService` startup failed, the error message is > > ``` > Error: Missing application resource. > > Usage: spark-submit [options] [app

[GitHub] [spark] dongjoon-hyun commented on pull request #40357: [SPARK-42739][BUILD] Ensure release tag to be pushed to release branch

2023-03-09 Thread via GitHub
dongjoon-hyun commented on PR #40357: URL: https://github.com/apache/spark/pull/40357#issuecomment-1463260567 Thank you, @xinrong-meng and @HyukjinKwon .

[GitHub] [spark] ueshin commented on a diff in pull request #40356: [SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work without path parameter

2023-03-09 Thread via GitHub
ueshin commented on code in PR #40356: URL: https://github.com/apache/spark/pull/40356#discussion_r1131958688 ## python/pyspark/sql/tests/test_datasources.py: ## @@ -192,6 +193,23 @@ def test_ignore_column_of_all_nulls(self): finally: shutil.rmtree(path)

[GitHub] [spark] ulysses-you commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
ulysses-you commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1131992558 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala: ## @@ -250,3 +252,44 @@ case class BroadcastQueryStageExec( override

[GitHub] [spark] ulysses-you commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
ulysses-you commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1132032012 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -220,10 +221,28 @@ case class AdaptiveSparkPlanExec( }

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1132039721 ## sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala: ## @@ -275,10 +272,19 @@ case class CachedRDDBuilder(

[GitHub] [spark] itholic commented on a diff in pull request #40282: [SPARK-42672][PYTHON][DOCS] Document error class list

2023-03-09 Thread via GitHub
itholic commented on code in PR #40282: URL: https://github.com/apache/spark/pull/40282#discussion_r1127483352 ## python/docs/source/development/errors.rst: ## @@ -0,0 +1,92 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more contributor license

[GitHub] [spark] beliefer commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-09 Thread via GitHub
beliefer commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1131890754 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -195,6 +197,17 @@ message Expression { DataType elementType = 1;

[GitHub] [spark] xinrong-meng commented on pull request #40357: [SPARK-42739][BUILD] Ensure release tag to be pushed to release branch

2023-03-09 Thread via GitHub
xinrong-meng commented on PR #40357: URL: https://github.com/apache/spark/pull/40357#issuecomment-1463187698 Merged to master and branch-3.4, thanks!

[GitHub] [spark] xinrong-meng closed pull request #40357: [SPARK-42739][BUILD] Ensure release tag to be pushed to release branch

2023-03-09 Thread via GitHub
xinrong-meng closed pull request #40357: [SPARK-42739][BUILD] Ensure release tag to be pushed to release branch URL: https://github.com/apache/spark/pull/40357

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40321: [SPARK-42704] SubqueryAlias propagates metadata columns that child outputs

2023-03-09 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40321: URL: https://github.com/apache/spark/pull/40321#discussion_r1131931758 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -1033,9 +1033,12 @@ class Analyzer(override val

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1131976928 ## sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala: ## @@ -275,10 +272,22 @@ case class CachedRDDBuilder(

[GitHub] [spark] ulysses-you commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
ulysses-you commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1131990280 ## sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala: ## @@ -275,10 +272,22 @@ case class CachedRDDBuilder(

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1131995849 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -220,10 +221,28 @@ case class AdaptiveSparkPlanExec( }

[GitHub] [spark] ulysses-you commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
ulysses-you commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1131119650 ## sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala: ## @@ -166,4 +170,32 @@ case class InMemoryTableScanExec(

[GitHub] [spark] LuciferYang commented on a diff in pull request #40352: [WIP][SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

2023-03-09 Thread via GitHub
LuciferYang commented on code in PR #40352: URL: https://github.com/apache/spark/pull/40352#discussion_r1131995398 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala: ## @@ -584,6 +585,101 @@ final class DataFrameStatFunctions

[GitHub] [spark] thousandhu opened a new pull request, #40361: [SPARK_42742]access apiserver by pod env

2023-03-09 Thread via GitHub
thousandhu opened a new pull request, #40361: URL: https://github.com/apache/spark/pull/40361 ### What changes were proposed in this pull request? When starting Spark on k8s, the driver pod uses spark.kubernetes.driver.master to get the apiserver address. This config us

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1132027166 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -220,10 +221,28 @@ case class AdaptiveSparkPlanExec( }

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1132036803 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -345,7 +350,7 @@ case class AdaptiveSparkPlanExec( //

[GitHub] [spark] cloud-fan commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1132036220 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -520,6 +526,14 @@ case class AdaptiveSparkPlanExec(

[GitHub] [spark] ulysses-you commented on a diff in pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-09 Thread via GitHub
ulysses-you commented on code in PR #39624: URL: https://github.com/apache/spark/pull/39624#discussion_r1130968837 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -220,10 +221,28 @@ case class AdaptiveSparkPlanExec( }

[GitHub] [spark] LuciferYang commented on a diff in pull request #40352: [WIP][SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

2023-03-09 Thread via GitHub
LuciferYang commented on code in PR #40352: URL: https://github.com/apache/spark/pull/40352#discussion_r1131897415 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1073,6 +1074,12 @@ class SparkConnectPlanner(val

[GitHub] [spark] zhenlineo commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

2023-03-09 Thread via GitHub
zhenlineo commented on PR #40274: URL: https://github.com/apache/spark/pull/40274#issuecomment-1463236560 @hvanhovell Want to keep this or shall we skip? It helps a bit when not knowing `build/sbt -Pconnect -Phive package` before running the IT.

[GitHub] [spark] wangyum commented on a diff in pull request #40360: [SPARK-42741][SQL] Do not unwrap casts in binary comparison when literal is null

2023-03-09 Thread via GitHub
wangyum commented on code in PR #40360: URL: https://github.com/apache/spark/pull/40360#discussion_r1132000965 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala: ## @@ -192,7 +192,7 @@ class

[GitHub] [spark] cloud-fan commented on a diff in pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #40359: URL: https://github.com/apache/spark/pull/40359#discussion_r1132007081 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala: ## @@ -181,19 +181,35 @@ private case object OracleDialect extends JdbcDialect { if

[GitHub] [spark] thousandhu commented on pull request #40361: [SPARK_42742]access apiserver by pod env

2023-03-09 Thread via GitHub
thousandhu commented on PR #40361: URL: https://github.com/apache/spark/pull/40361#issuecomment-1463380353 I've enabled GitHub Actions in your forked repository. How do I rerun the build check that failed above?

[GitHub] [spark] AngersZhuuuu commented on pull request #40314: [SPARK-42698][CORE] SparkSubmit should also stop SparkContext when exit program in yarn mode and pass exitCode to AM side

2023-03-09 Thread via GitHub
AngersZhuuuu commented on PR #40314: URL: https://github.com/apache/spark/pull/40314#issuecomment-1463389076 The failed UT should not be related to this PR.

[GitHub] [spark] HyukjinKwon commented on pull request #40333: [SPARK-42702][SPARK-42623][SQL] Support parameterized query in subquery and CTE

2023-03-09 Thread via GitHub
HyukjinKwon commented on PR #40333: URL: https://github.com/apache/spark/pull/40333#issuecomment-1463085558 Seems like the compilation didn't pass. Let me just quickly revert this and reopen.

[GitHub] [spark] cloud-fan commented on a diff in pull request #40321: [SPARK-42704] SubqueryAlias propagates metadata columns that child outputs

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #40321: URL: https://github.com/apache/spark/pull/40321#discussion_r1131869082 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -1033,9 +1033,12 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] zhengruifeng commented on pull request #40349: [SPARK-42725][CONNECT][PYTHON] Make LiteralExpression support array params

2023-03-09 Thread via GitHub
zhengruifeng commented on PR #40349: URL: https://github.com/apache/spark/pull/40349#issuecomment-1463120914 merged into master/branch-3.4 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] zhengruifeng closed pull request #40349: [SPARK-42725][CONNECT][PYTHON] Make LiteralExpression support array params

2023-03-09 Thread via GitHub
zhengruifeng closed pull request #40349: [SPARK-42725][CONNECT][PYTHON] Make LiteralExpression support array params URL: https://github.com/apache/spark/pull/40349 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] wangyum commented on pull request #40266: [SPARK-42660][SQL] Infer filters for Join produced by IN and EXISTS clause (RewritePredicateSubquery rule)

2023-03-09 Thread via GitHub
wangyum commented on PR #40266: URL: https://github.com/apache/spark/pull/40266#issuecomment-1463196395 I had a change like this before: https://github.com/apache/spark/pull/22778. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] beliefer commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-09 Thread via GitHub
beliefer commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1131922264 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralValueProtoConverter.scala: ## @@ -154,46 +215,46 @@ object

[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-09 Thread via GitHub
shrprasa commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1463230011 > Hm, I just don't see the logic in that. It isn't how SQL works either, as far as I understand. Here's maybe another example, imagine a DataFrame defined by `SELECT 3 as id, 3 as ID`.

[GitHub] [spark] ueshin commented on pull request #40276: [SPARK-42630][CONNECT][PYTHON] Implement data type string parser

2023-03-09 Thread via GitHub
ueshin commented on PR #40276: URL: https://github.com/apache/spark/pull/40276#issuecomment-1463286761 Close this in favor of #40260. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] ueshin closed pull request #40276: [SPARK-42630][CONNECT][PYTHON] Implement data type string parser

2023-03-09 Thread via GitHub
ueshin closed pull request #40276: [SPARK-42630][CONNECT][PYTHON] Implement data type string parser URL: https://github.com/apache/spark/pull/40276 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] AngersZhuuuu commented on pull request #40314: [SPARK-42698][CORE] SparkSubmit should also stop SparkContext when exit program in yarn mode and pass exitCode to AM side

2023-03-09 Thread via GitHub
AngersZhuuuu commented on PR #40314: URL: https://github.com/apache/spark/pull/40314#issuecomment-1463340552 > This seems to be a revert of #33403 as now we stop SparkContext in YARN environment as well. We should justify it in the PR description. This is not simply passing the exitCode.

[GitHub] [spark] beliefer commented on pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-03-09 Thread via GitHub
beliefer commented on PR #40359: URL: https://github.com/apache/spark/pull/40359#issuecomment-1463340665 ping @cloud-fan cc @sadikovi -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] cloud-fan commented on a diff in pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #40359: URL: https://github.com/apache/spark/pull/40359#discussion_r1132005699 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala: ## @@ -181,19 +181,35 @@ private case object OracleDialect extends JdbcDialect { if

[GitHub] [spark] cloud-fan commented on a diff in pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #40359: URL: https://github.com/apache/spark/pull/40359#discussion_r1132005206 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala: ## @@ -291,4 +291,22 @@ private case object MySQLDialect extends JdbcDialect with

[GitHub] [spark] cloud-fan commented on a diff in pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-03-09 Thread via GitHub
cloud-fan commented on code in PR #40359: URL: https://github.com/apache/spark/pull/40359#discussion_r1132004161 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala: ## @@ -410,6 +410,15 @@ private[v2] trait V2JDBCTest extends

[GitHub] [spark] xinrong-meng closed pull request #40357: [WIP][SPARK-42739][BUILD] Ensure release tag to be pushed to release branch

2023-03-09 Thread via GitHub
xinrong-meng closed pull request #40357: [WIP][SPARK-42739][BUILD] Ensure release tag to be pushed to release branch URL: https://github.com/apache/spark/pull/40357 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] HyukjinKwon closed pull request #40350: [SPARK-42726][CONNECT][PYTHON] Implement `DataFrame.mapInArrow`

2023-03-09 Thread via GitHub
HyukjinKwon closed pull request #40350: [SPARK-42726][CONNECT][PYTHON] Implement `DataFrame.mapInArrow` URL: https://github.com/apache/spark/pull/40350 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] HyukjinKwon commented on pull request #40350: [SPARK-42726][CONNECT][PYTHON] Implement `DataFrame.mapInArrow`

2023-03-09 Thread via GitHub
HyukjinKwon commented on PR #40350: URL: https://github.com/apache/spark/pull/40350#issuecomment-1463058011 The test failure seems unrelated. Merged to master and branch-3.4. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40324: [WIP][SPARK-42496][CONNECT][DOCS] Adding Spark Connect to the Spark 3.4 documentation

2023-03-09 Thread via GitHub
HyukjinKwon commented on code in PR #40324: URL: https://github.com/apache/spark/pull/40324#discussion_r1131838714 ## docs/spark-connect-overview.md: ## @@ -0,0 +1,244 @@ +--- +layout: global +title: Spark Connect Overview +license: | + Licensed to the Apache Software

[GitHub] [spark] cloud-fan commented on pull request #40333: [SPARK-42702][SPARK-42623][SQL] Support parameterized query in subquery and CTE

2023-03-09 Thread via GitHub
cloud-fan commented on PR #40333: URL: https://github.com/apache/spark/pull/40333#issuecomment-1463075399 The failed `BasicSchedulerIntegrationSuite` is not related to this PR, I'm merging it to master/3.4, thanks for the review! -- This is an automated message from the Apache Git

[GitHub] [spark] cloud-fan commented on pull request #40333: [SPARK-42702][SPARK-42623][SQL] Support parameterized query in subquery and CTE

2023-03-09 Thread via GitHub
cloud-fan commented on PR #40333: URL: https://github.com/apache/spark/pull/40333#issuecomment-1463076663 This is a bug fix of a new feature in 3.4, so I won't call it a release blocker. I've set the fixed version to 3.4.0, if rc3 passes, I'll change it to 3.4.1. -- This is an automated

[GitHub] [spark] hvanhovell commented on pull request #40353: [SPARK-42732][PYSPARK][CONNECT] Support spark connect session getActiveSession method

2023-03-09 Thread via GitHub
hvanhovell commented on PR #40353: URL: https://github.com/apache/spark/pull/40353#issuecomment-1462597849 @WeichenXu123 in what case won't Spark Connect ML have access to the session? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] rangadi commented on a diff in pull request #40342: [SPARK-42721][CONNECT] RPC logging interceptor

2023-03-09 Thread via GitHub
rangadi commented on code in PR #40342: URL: https://github.com/apache/spark/pull/40342#discussion_r1131461102 ## connector/connect/server/pom.xml: ## @@ -155,6 +155,12 @@ ${protobuf.version} compile + + com.google.protobuf + protobuf-java-util

[GitHub] [spark] ueshin commented on a diff in pull request #40356: [SPARK-42733][CONNECT][PYTHON] Fix DataFrameWriter.save to work without path parameter

2023-03-09 Thread via GitHub
ueshin commented on code in PR #40356: URL: https://github.com/apache/spark/pull/40356#discussion_r1131677278 ## connector/connect/server/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -554,7 +554,8 @@ class SparkConnectProtoSuite

[GitHub] [spark] HyukjinKwon commented on pull request #40302: [SPARK-42686][CORE] Defer formatting for debug messages in TaskMemoryManager

2023-03-09 Thread via GitHub
HyukjinKwon commented on PR #40302: URL: https://github.com/apache/spark/pull/40302#issuecomment-1463061643 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] WeichenXu123 commented on pull request #40353: [SPARK-42732][PYSPARK][CONNECT] Support spark connect session getActiveSession method

2023-03-09 Thread via GitHub
WeichenXu123 commented on PR #40353: URL: https://github.com/apache/spark/pull/40353#issuecomment-1463060514 > @WeichenXu123 in what case won't Spark Connect ML have access to the session? For some APIs, like `estimator.fit(dataset)`, `model.transform(dataset)`, we can get session

[GitHub] [spark] holdenk commented on pull request #36434: [SPARK-38969][K8S] Fix Decom reporting

2023-03-09 Thread via GitHub
holdenk commented on PR #36434: URL: https://github.com/apache/spark/pull/36434#issuecomment-1462499267 Exit code 137 generally refers to out of memory at the container level, can you increase the overhead and see if it still occurs for you? -- This is an automated message from the

[GitHub] [spark] mridulm commented on pull request #40339: [SPARK-42719][CORE] `MapOutputTracker#getMapLocation` should respect `spark.shuffle.reduceLocality.enabled`

2023-03-09 Thread via GitHub
mridulm commented on PR #40339: URL: https://github.com/apache/spark/pull/40339#issuecomment-1462498084 Merged to master. Thanks for working on this @jerqi ! Thanks for the reviews @cloud-fan, @LuciferYang, @advancedxy :-) -- This is an automated message from the Apache Git Service.

[GitHub] [spark] holdenk commented on pull request #36434: [SPARK-38969][K8S] Fix Decom reporting

2023-03-09 Thread via GitHub
holdenk commented on PR #36434: URL: https://github.com/apache/spark/pull/36434#issuecomment-1462501095 Or do you have a repro? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] hvanhovell commented on a diff in pull request #40352: [WIP][SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`

2023-03-09 Thread via GitHub
hvanhovell commented on code in PR #40352: URL: https://github.com/apache/spark/pull/40352#discussion_r1131455328 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1073,6 +1074,12 @@ class SparkConnectPlanner(val
