[GitHub] [spark] mridulm commented on pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-16 Thread via GitHub
mridulm commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1473171295 So this is an interesting coincidence, I literally encountered a production job which seems to be hitting this exact same issue :-) I was in the process of creating a test case, but my

[GitHub] [spark] LuciferYang commented on pull request #40445: [SPARK-42814][BUILD] Upgrade maven plugins to latest versions

2023-03-16 Thread via GitHub
LuciferYang commented on PR #40445: URL: https://github.com/apache/spark/pull/40445#issuecomment-1473130566 Thanks @dongjoon-hyun

[GitHub] [spark] cloud-fan commented on a diff in pull request #40418: [SPARK-42790][SQL] Abstract the excluded method for better test for JDBC docker tests.

2023-03-16 Thread via GitHub
cloud-fan commented on code in PR #40418: URL: https://github.com/apache/spark/pull/40418#discussion_r1139727723 ## core/src/test/scala/org/apache/spark/SparkFunSuite.scala: ## @@ -137,6 +138,19 @@ abstract class SparkFunSuite java.nio.file.Paths.get(sparkHome, first +:

[GitHub] [spark] viirya commented on a diff in pull request #40421: [SPARK-42779][SQL] Allow V2 writes to indicate advisory shuffle partition size

2023-03-16 Thread via GitHub
viirya commented on code in PR #40421: URL: https://github.com/apache/spark/pull/40421#discussion_r1139705437 ## core/src/main/resources/error/error-classes.json: ## @@ -1063,6 +1063,28 @@ ], "sqlState" : "42903" }, + "INVALID_WRITE_DISTRIBUTION" : { +

[GitHub] [spark] LuciferYang commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-16 Thread via GitHub
LuciferYang commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1139690510 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralExpressionProtoConverter.scala: ## @@ -134,46 +192,46 @@ object

[GitHub] [spark] cloud-fan commented on pull request #40457: [SPARK-42823][SQL] `spark-sql` shell supports multipart namespaces for initialization

2023-03-16 Thread via GitHub
cloud-fan commented on PR #40457: URL: https://github.com/apache/spark/pull/40457#issuecomment-1473084358 late LGTM

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40421: [SPARK-42779][SQL] Allow V2 writes to indicate advisory shuffle partition size

2023-03-16 Thread via GitHub
aokolnychyi commented on code in PR #40421: URL: https://github.com/apache/spark/pull/40421#discussion_r1139671628 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DistributionAndOrderingUtils.scala: ## @@ -36,6 +36,7 @@ object

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40361: [SPARK-42742] access apiserver by pod env

2023-03-16 Thread via GitHub
dongjoon-hyun commented on code in PR #40361: URL: https://github.com/apache/spark/pull/40361#discussion_r1139668571 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala: ## @@ -55,6 +55,14 @@ private[spark] object Config extends Logging

[GitHub] [spark] cloud-fan commented on a diff in pull request #40456: [SPARK-42720][PS][SQL] Uses expression for distributed-sequence default index instead of plan

2023-03-16 Thread via GitHub
cloud-fan commented on code in PR #40456: URL: https://github.com/apache/spark/pull/40456#discussion_r1139667580 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ExtractDistributedSequenceID.scala: ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40361: [SPARK-42742] access apiserver by pod env

2023-03-16 Thread via GitHub
dongjoon-hyun commented on code in PR #40361: URL: https://github.com/apache/spark/pull/40361#discussion_r1139667140 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala: ## @@ -55,6 +55,14 @@ private[spark] object Config extends Logging

[GitHub] [spark] dongjoon-hyun commented on pull request #40421: [SPARK-42779][SQL] Allow V2 writes to indicate advisory shuffle partition size

2023-03-16 Thread via GitHub
dongjoon-hyun commented on PR #40421: URL: https://github.com/apache/spark/pull/40421#issuecomment-1473078509 +1, LGTM for Apache Spark 3.5.

[GitHub] [spark] dongjoon-hyun closed pull request #40417: [SPARK-42778][SQL][3.4] QueryStageExec should respect supportsRowBased

2023-03-16 Thread via GitHub
dongjoon-hyun closed pull request #40417: [SPARK-42778][SQL][3.4] QueryStageExec should respect supportsRowBased URL: https://github.com/apache/spark/pull/40417

[GitHub] [spark] dongjoon-hyun commented on pull request #40456: [SPARK-42720][PS][SQL] Uses expression for distributed-sequence default index instead of plan

2023-03-16 Thread via GitHub
dongjoon-hyun commented on PR #40456: URL: https://github.com/apache/spark/pull/40456#issuecomment-1473076150 BTW, according to the JIRA, this is only for Apache Spark 3.5?

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40440: [SPARK-42808][CORE] Avoid getting availableProcessors every time in `MapOutputTrackerMaster#getStatistics`

2023-03-16 Thread via GitHub
dongjoon-hyun commented on code in PR #40440: URL: https://github.com/apache/spark/pull/40440#discussion_r1139661089 ## core/src/main/scala/org/apache/spark/MapOutputTracker.scala: ## @@ -697,6 +697,8 @@ private[spark] class MapOutputTrackerMaster( pool } + private

[GitHub] [spark] dongjoon-hyun commented on pull request #40457: [SPARK-42823][SQL] `spark-sql` shell supports multipart namespaces for initialization

2023-03-16 Thread via GitHub
dongjoon-hyun commented on PR #40457: URL: https://github.com/apache/spark/pull/40457#issuecomment-1473073721 You're welcome!

[GitHub] [spark] dongjoon-hyun closed pull request #40445: [SPARK-42814][BUILD] Upgrade maven plugins to latest versions

2023-03-16 Thread via GitHub
dongjoon-hyun closed pull request #40445: [SPARK-42814][BUILD] Upgrade maven plugins to latest versions URL: https://github.com/apache/spark/pull/40445

[GitHub] [spark] yaooqinn commented on pull request #40457: [SPARK-42823][SQL] `spark-sql` shell supports multipart namespaces for initialization

2023-03-16 Thread via GitHub
yaooqinn commented on PR #40457: URL: https://github.com/apache/spark/pull/40457#issuecomment-1473073457 thank you @dongjoon-hyun

[GitHub] [spark] srowen commented on pull request #40263: [SPARK-42659][ML] Reimplement `FPGrowthModel.transform` with dataframe operations

2023-03-16 Thread via GitHub
srowen commented on PR #40263: URL: https://github.com/apache/spark/pull/40263#issuecomment-1473071461 If it's faster and gives the right answers, sure

[GitHub] [spark] dongjoon-hyun closed pull request #40457: [SPARK-42823][SQL] `spark-sql` shell supports multipart namespaces for initialization

2023-03-16 Thread via GitHub
dongjoon-hyun closed pull request #40457: [SPARK-42823][SQL] `spark-sql` shell supports multipart namespaces for initialization URL: https://github.com/apache/spark/pull/40457

[GitHub] [spark] dongjoon-hyun commented on pull request #40457: [SPARK-42823][SQL] spark-sql shell supports multipart namespaces for initialization

2023-03-16 Thread via GitHub
dongjoon-hyun commented on PR #40457: URL: https://github.com/apache/spark/pull/40457#issuecomment-1473069824 Thank you for pinging me, @yaooqinn .

[GitHub] [spark] yaooqinn commented on pull request #40457: [SPARK-42823][SQL] spark-sql shell supports multipart namespaces for initialization

2023-03-16 Thread via GitHub
yaooqinn commented on PR #40457: URL: https://github.com/apache/spark/pull/40457#issuecomment-1473067918 cc @HyukjinKwon @cloud-fan @dongjoon-hyun, thanks

[GitHub] [spark] zhengruifeng commented on pull request #40263: [SPARK-42659][ML] Reimplement `FPGrowthModel.transform` with dataframe operations

2023-03-16 Thread via GitHub
zhengruifeng commented on PR #40263: URL: https://github.com/apache/spark/pull/40263#issuecomment-1473064231 @srowen if the latest performance test seems fine, then I'd ask the SQL guys whether we can have a subquery method in DataFrame APIs.

[GitHub] [spark] wangyum opened a new pull request, #40462: [SPARK-42832][SQL] Remove repartition if it is the child of LocalLimit

2023-03-16 Thread via GitHub
wangyum opened a new pull request, #40462: URL: https://github.com/apache/spark/pull/40462 ### What changes were proposed in this pull request? This PR enhances `CollapseRepartition` to remove repartition if it is the child of `LocalLimit`. For example: ```sql SELECT /*+
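The SQL example in this snippet is cut off by the archive. A minimal PySpark sketch of the same shape (table size and partition count are made up) illustrates the pattern the rule targets: a repartition whose only consumer is a limit.

```python
# Minimal sketch, assuming a local SparkSession; the numbers are hypothetical.
# A repartition() that feeds only a limit adds a shuffle without changing the
# (unordered) limited result, which is what the enhanced CollapseRepartition
# rule is described as removing.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.range(1_000_000)

# Before the change the physical plan keeps an Exchange under the LocalLimit;
# with the rule applied, the repartition can be dropped.
df.repartition(200).limit(10).explain()
```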

[GitHub] [spark] panbingkun closed pull request #40454: [SPARK-42821][SQL] Remove unused parameters in splitFiles methods

2023-03-16 Thread via GitHub
panbingkun closed pull request #40454: [SPARK-42821][SQL] Remove unused parameters in splitFiles methods URL: https://github.com/apache/spark/pull/40454

[GitHub] [spark] panbingkun commented on pull request #40454: [SPARK-42821][SQL] Remove unused parameters in splitFiles methods

2023-03-16 Thread via GitHub
panbingkun commented on PR #40454: URL: https://github.com/apache/spark/pull/40454#issuecomment-1473058514 > Not a public API but probably not worth 'fixing' before spark 4 indeed OK, Let's fix it after spark 4 release. I will close it.

[GitHub] [spark] wankunde opened a new pull request, #40461: [SPARK-42831][SQL] Show result expressions in AggregateExec

2023-03-16 Thread via GitHub
wankunde opened a new pull request, #40461: URL: https://github.com/apache/spark/pull/40461 ### What changes were proposed in this pull request? If the result expressions in AggregateExec are not empty, we should display them. Or we will get confused because some important

[GitHub] [spark] srowen commented on pull request #40454: [SPARK-42821][SQL] Remove unused parameters in splitFiles methods

2023-03-16 Thread via GitHub
srowen commented on PR #40454: URL: https://github.com/apache/spark/pull/40454#issuecomment-1473042762 Not a public API but probably not worth 'fixing' before spark 4 indeed

[GitHub] [spark] wangyum commented on pull request #40312: [SPARK-42695][SQL] Skew join handling in stream side of broadcast hash join

2023-03-16 Thread via GitHub
wangyum commented on PR #40312: URL: https://github.com/apache/spark/pull/40312#issuecomment-1473032777 cc @cloud-fan

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40421: [SPARK-42779][SQL] Allow V2 writes to indicate advisory shuffle partition size

2023-03-16 Thread via GitHub
aokolnychyi commented on code in PR #40421: URL: https://github.com/apache/spark/pull/40421#discussion_r1139599841 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DistributionAndOrderingUtils.scala: ## @@ -36,6 +36,7 @@ object

[GitHub] [spark] LuciferYang commented on pull request #40454: [SPARK-42821][SQL] Remove unused parameters in splitFiles methods

2023-03-16 Thread via GitHub
LuciferYang commented on PR #40454: URL: https://github.com/apache/spark/pull/40454#issuecomment-1473029179 Personally, I think it's definitely possible to change it in Spark 4.0, but I wasn't sure before.

[GitHub] [spark] beliefer commented on pull request #40418: [SPARK-42790][SQL] Abstract the excluded method for better test for JDBC docker tests.

2023-03-16 Thread via GitHub
beliefer commented on PR #40418: URL: https://github.com/apache/spark/pull/40418#issuecomment-1473029216 ping @huaxingao cc @cloud-fan

[GitHub] [spark] LuciferYang commented on pull request #40389: [SPARK-42767][CONNECT][TESTS] Add a precondition to start connect server fallback with `in-memory` and auto ignored some tests strongly d

2023-03-16 Thread via GitHub
LuciferYang commented on PR #40389: URL: https://github.com/apache/spark/pull/40389#issuecomment-1473017525 > @LuciferYang Thank you. Should we update https://github.com/apache/spark/blob/master/docs/building-spark.md with this info? If needed, it may be more appropriate to update

[GitHub] [spark] HyukjinKwon closed pull request #40458: [SPARK-42824][CONNECT][PYTHON] Provide a clear error message for unsupported JVM attributes

2023-03-16 Thread via GitHub
HyukjinKwon closed pull request #40458: [SPARK-42824][CONNECT][PYTHON] Provide a clear error message for unsupported JVM attributes URL: https://github.com/apache/spark/pull/40458

[GitHub] [spark] HyukjinKwon commented on pull request #40458: [SPARK-42824][CONNECT][PYTHON] Provide a clear error message for unsupported JVM attributes

2023-03-16 Thread via GitHub
HyukjinKwon commented on PR #40458: URL: https://github.com/apache/spark/pull/40458#issuecomment-1473012048 Merged to master and branch-3.4.

[GitHub] [spark] beliefer commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-16 Thread via GitHub
beliefer commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1139568392 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralExpressionProtoConverter.scala: ## @@ -134,46 +192,46 @@ object

[GitHub] [spark] gatorsmile commented on a diff in pull request #40458: [SPARK-42824][CONNECT][PYTHON] Provide a clear error message for unsupported JVM attributes

2023-03-16 Thread via GitHub
gatorsmile commented on code in PR #40458: URL: https://github.com/apache/spark/pull/40458#discussion_r1139537952 ## python/pyspark/errors/error_classes.py: ## @@ -39,6 +39,11 @@ "Function `` should return Column, got ." ] }, + "JVM_ATTRIBUTE_NOT_SUPPORTED" : {

[GitHub] [spark] panbingkun commented on pull request #40454: [SPARK-42821][SQL] Remove unused parameters in splitFiles methods

2023-03-16 Thread via GitHub
panbingkun commented on PR #40454: URL: https://github.com/apache/spark/pull/40454#issuecomment-1472972319 > This function a bit special, It was discussed a long time ago ... [#37862 (comment)](https://github.com/apache/spark/pull/37862#issuecomment-1249419779) If it's a public API,

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40458: [SPARK-42824][CONNECT][PYTHON] Provide a clear error message for unsupported JVM attributes.

2023-03-16 Thread via GitHub
HyukjinKwon commented on code in PR #40458: URL: https://github.com/apache/spark/pull/40458#discussion_r1139522726 ## python/pyspark/errors/error_classes.py: ## @@ -39,6 +39,11 @@ "Function `` should return Column, got ." ] }, + "JVM_ATTRIBUTE_NOT_SUPPORTED" : {

[GitHub] [spark] HyukjinKwon closed pull request #40459: [SPARK-42826][PS][DOCS] Add migration notes for update to supported pandas version.

2023-03-16 Thread via GitHub
HyukjinKwon closed pull request #40459: [SPARK-42826][PS][DOCS] Add migration notes for update to supported pandas version. URL: https://github.com/apache/spark/pull/40459

[GitHub] [spark] HyukjinKwon commented on pull request #40459: [SPARK-42826][PS][DOCS] Add migration notes for update to supported pandas version.

2023-03-16 Thread via GitHub
HyukjinKwon commented on PR #40459: URL: https://github.com/apache/spark/pull/40459#issuecomment-1472945603 Merged to master and branch-3.4.

[GitHub] [spark] github-actions[bot] closed pull request #37725: [DO-NOT-MERGE] Exceptions without error classes in SQL golden files

2023-03-16 Thread via GitHub
github-actions[bot] closed pull request #37725: [DO-NOT-MERGE] Exceptions without error classes in SQL golden files URL: https://github.com/apache/spark/pull/37725

[GitHub] [spark] huaxingao commented on pull request #40421: [SPARK-42779][SQL] Allow V2 writes to indicate advisory shuffle partition size

2023-03-16 Thread via GitHub
huaxingao commented on PR #40421: URL: https://github.com/apache/spark/pull/40421#issuecomment-1472903537 +1, LGTM Thanks for the PR @aokolnychyi

[GitHub] [spark] huaxingao commented on a diff in pull request #40421: [SPARK-42779][SQL] Allow V2 writes to indicate advisory shuffle partition size

2023-03-16 Thread via GitHub
huaxingao commented on code in PR #40421: URL: https://github.com/apache/spark/pull/40421#discussion_r1139458748 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DistributionAndOrderingUtils.scala: ## @@ -36,6 +36,7 @@ object DistributionAndOrderingUtils

[GitHub] [spark] gengliangwang commented on pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-16 Thread via GitHub
gengliangwang commented on PR #40449: URL: https://github.com/apache/spark/pull/40449#issuecomment-1472785577 LGTM except for minor comments. Thanks for the work!

[GitHub] [spark] gengliangwang commented on a diff in pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-16 Thread via GitHub
gengliangwang commented on code in PR #40449: URL: https://github.com/apache/spark/pull/40449#discussion_r1139386978 ## sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala: ## @@ -108,4 +128,132 @@ trait SQLQueryTestHelper { (emptySchema,

[GitHub] [spark] dtenedor commented on a diff in pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-16 Thread via GitHub
dtenedor commented on code in PR #40449: URL: https://github.com/apache/spark/pull/40449#discussion_r1139386812 ## sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala: ## @@ -228,6 +228,7 @@ abstract class QueryTest extends PlanTest {

[GitHub] [spark] gengliangwang commented on a diff in pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-16 Thread via GitHub
gengliangwang commented on code in PR #40449: URL: https://github.com/apache/spark/pull/40449#discussion_r1139382943 ## sql/core/src/test/resources/sql-tests/analyzer-results/array.sql.out: ## Review Comment: I expect there is `ansi/array.sql.out` as well

[GitHub] [spark] gengliangwang commented on a diff in pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-16 Thread via GitHub
gengliangwang commented on code in PR #40449: URL: https://github.com/apache/spark/pull/40449#discussion_r1139382539 ## sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala: ## @@ -228,6 +228,7 @@ abstract class QueryTest extends PlanTest {

[GitHub] [spark] otterc commented on pull request #40448: [SPARK-42817][CORE] Logging the shuffle service name once in ApplicationMaster

2023-03-16 Thread via GitHub
otterc commented on PR #40448: URL: https://github.com/apache/spark/pull/40448#issuecomment-1472770014 Thank you @dongjoon-hyun @HyukjinKwon @shuwang21 @yaooqinn

[GitHub] [spark] dongjoon-hyun commented on pull request #40448: [SPARK-42817][CORE] Logging the shuffle service name once in ApplicationMaster

2023-03-16 Thread via GitHub
dongjoon-hyun commented on PR #40448: URL: https://github.com/apache/spark/pull/40448#issuecomment-1472768397 Thank you for update. Merged to master/3.4.

[GitHub] [spark] dongjoon-hyun closed pull request #40448: [SPARK-42817][CORE] Logging the shuffle service name once in ApplicationMaster

2023-03-16 Thread via GitHub
dongjoon-hyun closed pull request #40448: [SPARK-42817][CORE] Logging the shuffle service name once in ApplicationMaster URL: https://github.com/apache/spark/pull/40448

[GitHub] [spark] dtenedor commented on a diff in pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-16 Thread via GitHub
dtenedor commented on code in PR #40449: URL: https://github.com/apache/spark/pull/40449#discussion_r1139370077 ## sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala: ## @@ -205,6 +194,14 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession

[GitHub] [spark] gengliangwang commented on a diff in pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-16 Thread via GitHub
gengliangwang commented on code in PR #40449: URL: https://github.com/apache/spark/pull/40449#discussion_r1139327888 ## sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala: ## @@ -525,23 +403,18 @@ class SQLQueryTestSuite extends QueryTest with

[GitHub] [spark] gengliangwang commented on a diff in pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-16 Thread via GitHub
gengliangwang commented on code in PR #40449: URL: https://github.com/apache/spark/pull/40449#discussion_r1139325632 ## sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala: ## @@ -205,6 +194,14 @@ class SQLQueryTestSuite extends QueryTest with

[GitHub] [spark] bjornjorgensen commented on pull request #40389: [SPARK-42767][CONNECT][TESTS] Add a precondition to start connect server fallback with `in-memory` and auto ignored some tests strongl

2023-03-16 Thread via GitHub
bjornjorgensen commented on PR #40389: URL: https://github.com/apache/spark/pull/40389#issuecomment-1472688553 @LuciferYang Thank you. Should we update https://github.com/apache/spark/blob/master/docs/building-spark.md with this info?

[GitHub] [spark] gengliangwang commented on a diff in pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-16 Thread via GitHub
gengliangwang commented on code in PR #40449: URL: https://github.com/apache/spark/pull/40449#discussion_r1139311712 ## sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala: ## @@ -228,6 +229,101 @@ abstract class QueryTest extends PlanTest {

[GitHub] [spark] ritikam2 commented on a diff in pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-16 Thread via GitHub
ritikam2 commented on code in PR #40116: URL: https://github.com/apache/spark/pull/40116#discussion_r1139265664 ## sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala: ## @@ -40,12 +40,15 @@ abstract class SQLImplicits extends LowPrioritySQLImplicits { */

[GitHub] [spark] j03wang commented on pull request #40460: [SPARK-42828] More explicit Python type annotations for GroupedData

2023-03-16 Thread via GitHub
j03wang commented on PR #40460: URL: https://github.com/apache/spark/pull/40460#issuecomment-1472616495 @ueshin who last touched the type hints

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40459: [SPARK-42826][PS][DOCS] Add migration notes for update to supported pandas version.

2023-03-16 Thread via GitHub
HyukjinKwon commented on code in PR #40459: URL: https://github.com/apache/spark/pull/40459#discussion_r1139257625 ## python/docs/source/migration_guide/pyspark_upgrade.rst: ## @@ -33,6 +33,7 @@ Upgrading from PySpark 3.3 to 3.4 * In Spark 3.4, the ``Series.concat`` sort

[GitHub] [spark] j03wang opened a new pull request, #40460: [SPARK-42828] More explicit Python type annotations for GroupedData

2023-03-16 Thread via GitHub
j03wang opened a new pull request, #40460: URL: https://github.com/apache/spark/pull/40460 ### What changes were proposed in this pull request? Be more explicit in the `Callable` type annotation for `dfapi` and `df_varargs_api` to explicitly return a `DataFrame`. ###
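A rough, self-contained sketch of the kind of annotation tightening described; the `dfapi` name comes from the snippet, and the real signature in `pyspark/sql/group.py` may differ.

```python
# Sketch only: a decorator annotated as returning a bare Callable tells type
# checkers nothing about the wrapped method, while Callable[..., DataFrame]
# lets them see that the generated GroupedData methods return a DataFrame.
from typing import Any, Callable


class DataFrame:
    """Stand-in class so this sketch runs without a Spark installation."""


def dfapi(f: Callable[..., Any]) -> Callable[..., DataFrame]:
    def _api(self: Any) -> DataFrame:
        # A real implementation would delegate to the underlying grouped data.
        return DataFrame()

    _api.__name__ = f.__name__
    _api.__doc__ = f.__doc__
    return _api
```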

[GitHub] [spark] LuciferYang commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-16 Thread via GitHub
LuciferYang commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1139170837 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralExpressionProtoConverter.scala: ## @@ -134,46 +192,46 @@ object

[GitHub] [spark] shrprasa commented on pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates

2023-03-16 Thread via GitHub
shrprasa commented on PR #40128: URL: https://github.com/apache/spark/pull/40128#issuecomment-1472394127 Gentle ping @holdenk

[GitHub] [spark] itholic opened a new pull request, #40459: [SPARK-42826][PS][DOCS] Add migration note for API changes

2023-03-16 Thread via GitHub
itholic opened a new pull request, #40459: URL: https://github.com/apache/spark/pull/40459 ### What changes were proposed in this pull request? This PR proposes to add a migration note for API changes for pandas API on Spark. ### Why are the changes needed? Some

[GitHub] [spark] LuciferYang commented on pull request #40454: [SPARK-42821][SQL] Remove unused parameters in splitFiles methods

2023-03-16 Thread via GitHub
LuciferYang commented on PR #40454: URL: https://github.com/apache/spark/pull/40454#issuecomment-1472312946 This function a bit special, It was discussed a long time ago ... https://github.com/apache/spark/pull/37862#issuecomment-1249419779

[GitHub] [spark] LuciferYang commented on a diff in pull request #40438: [SPARK-42806][CONNECT] Add `Catalog` support

2023-03-16 Thread via GitHub
LuciferYang commented on code in PR #40438: URL: https://github.com/apache/spark/pull/40438#discussion_r1138926388 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -0,0 +1,679 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] itholic commented on pull request #40458: [SPARK-42824][CONNECT][PYTHON] Provide a clear error message for unsupported JVM attributes.

2023-03-16 Thread via GitHub
itholic commented on PR #40458: URL: https://github.com/apache/spark/pull/40458#issuecomment-1472234887 > [JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute _jsc is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your
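For context, a hedged illustration of where a message like the one quoted above would surface; the connect URL is a placeholder and the exact exception type depends on the PySpark build.

```python
# Hedged illustration: JVM-backed attributes such as `_jsc` are not available
# on a Spark Connect session, so accessing one is expected to raise an error
# carrying the JVM_ATTRIBUTE_NOT_SUPPORTED class quoted above.
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
try:
    spark._jsc
except Exception as err:
    print(err)  # [JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `_jsc` is not supported ...
```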

[GitHub] [spark] hvanhovell commented on a diff in pull request #40438: [SPARK-42806][CONNECT] Add `Catalog` support

2023-03-16 Thread via GitHub
hvanhovell commented on code in PR #40438: URL: https://github.com/apache/spark/pull/40438#discussion_r1138916077 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -0,0 +1,679 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] LuciferYang commented on a diff in pull request #40438: [SPARK-42806][CONNECT] Add `Catalog` support

2023-03-16 Thread via GitHub
LuciferYang commented on code in PR #40438: URL: https://github.com/apache/spark/pull/40438#discussion_r1138911430 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala: ## @@ -0,0 +1,628 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] LuciferYang commented on a diff in pull request #40438: [SPARK-42806][CONNECT] Add `Catalog` support

2023-03-16 Thread via GitHub
LuciferYang commented on code in PR #40438: URL: https://github.com/apache/spark/pull/40438#discussion_r1138912177 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala: ## @@ -0,0 +1,628 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] hvanhovell commented on a diff in pull request #40438: [SPARK-42806][CONNECT] Add `Catalog` support

2023-03-16 Thread via GitHub
hvanhovell commented on code in PR #40438: URL: https://github.com/apache/spark/pull/40438#discussion_r1138910434 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala: ## @@ -0,0 +1,628 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] hvanhovell commented on a diff in pull request #40438: [SPARK-42806][CONNECT] Add `Catalog` support

2023-03-16 Thread via GitHub
hvanhovell commented on code in PR #40438: URL: https://github.com/apache/spark/pull/40438#discussion_r1138909855 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala: ## @@ -0,0 +1,628 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] allanf-db commented on pull request #40458: [SPARK-42824][CONNECT][PYTHON] Provide a clear error message for unsupported JVM attributes.

2023-03-16 Thread via GitHub
allanf-db commented on PR #40458: URL: https://github.com/apache/spark/pull/40458#issuecomment-1472225271 Instead of proposing that the user uses another PySpark version, I think it's better to suggest that the user creates a Spark Driver session instead of a Spark Connect session.

[GitHub] [spark] LuciferYang commented on a diff in pull request #40438: [SPARK-42806][CONNECT] Add `Catalog` support

2023-03-16 Thread via GitHub
LuciferYang commented on code in PR #40438: URL: https://github.com/apache/spark/pull/40438#discussion_r1138885942 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -0,0 +1,679 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] itholic commented on a diff in pull request #40458: [SPARK-42824][CONNECT][PYTHON] Provide a clear error message for unsupported JVM attributes.

2023-03-16 Thread via GitHub
itholic commented on code in PR #40458: URL: https://github.com/apache/spark/pull/40458#discussion_r1138872667 ## python/pyspark/errors/error_classes.py: ## @@ -39,6 +39,11 @@ "Function `` should return Column, got ." ] }, + "JVM_ATTRIBUTE_NOT_SUPPORTED" : { +

[GitHub] [spark] itholic opened a new pull request, #40458: [SPARK-42824][CONNECT][PYTHON] Provide a clear error message for unsupported JVM attributes.

2023-03-16 Thread via GitHub
itholic opened a new pull request, #40458: URL: https://github.com/apache/spark/pull/40458 ### What changes were proposed in this pull request? This pull request proposes an improvement to the error message when trying to access a JVM attribute that is not supported in Spark Connect.

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-16 Thread via GitHub
WeichenXu123 commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1138619277 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/AlgorithmRegisty.scala: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] MaxGekk commented on a diff in pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

2023-03-16 Thread via GitHub
MaxGekk commented on code in PR #40447: URL: https://github.com/apache/spark/pull/40447#discussion_r1138560168 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala: ## @@ -47,6 +47,16 @@ object Connect { .bytesConf(ByteUnit.MiB)

[GitHub] [spark] cloud-fan commented on a diff in pull request #39239: [SPARK-41730][PYTHON] Set tz to UTC while converting of timestamps to python's datetime

2023-03-16 Thread via GitHub
cloud-fan commented on code in PR #39239: URL: https://github.com/apache/spark/pull/39239#discussion_r1138553272 ## python/pyspark/sql/types.py: ## @@ -276,7 +276,18 @@ def toInternal(self, dt: datetime.datetime) -> int: def fromInternal(self, ts: int) ->

[GitHub] [spark] LuciferYang commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-16 Thread via GitHub
LuciferYang commented on PR #38947: URL: https://github.com/apache/spark/pull/38947#issuecomment-1471794732 @navinvishy like `ProblemFilters.exclude[Problem]("org.apache.spark.sql.functions.array_prepend"),` and you can run ` dev/connect-jvm-client-mima-check ` to check the result

[GitHub] [spark] LuciferYang commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-16 Thread via GitHub
LuciferYang commented on PR #38947: URL: https://github.com/apache/spark/pull/38947#issuecomment-1471792565 we need add a new rule to `CheckConnectJvmClientCompatibility`

[GitHub] [spark] navinvishy opened a new pull request, #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-16 Thread via GitHub
navinvishy opened a new pull request, #38947: URL: https://github.com/apache/spark/pull/38947 ### What changes were proposed in this pull request? Adds a new array function array_prepend to catalyst. ### Why are the changes needed? This adds a function that

[GitHub] [spark] HyukjinKwon commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-16 Thread via GitHub
HyukjinKwon commented on PR #38947: URL: https://github.com/apache/spark/pull/38947#issuecomment-1471792165 I reopened the PR. @navinvishy would you mind rebasing and syncing to the latest master branch?

[GitHub] [spark] HyukjinKwon commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-16 Thread via GitHub
HyukjinKwon commented on PR #38947: URL: https://github.com/apache/spark/pull/38947#issuecomment-1471790173 Seems like this broke the mima check for Spark connect. I am reverting this for now.

[GitHub] [spark] LuciferYang commented on a diff in pull request #40438: [SPARK-42806][CONNECT] Add `Catalog` support

2023-03-16 Thread via GitHub
LuciferYang commented on code in PR #40438: URL: https://github.com/apache/spark/pull/40438#discussion_r1138348490 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala: ## @@ -0,0 +1,628 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] LuciferYang commented on a diff in pull request #40438: [SPARK-42806][CONNECT] Add `Catalog` support

2023-03-16 Thread via GitHub
LuciferYang commented on code in PR #40438: URL: https://github.com/apache/spark/pull/40438#discussion_r1138499467 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -272,6 +274,14 @@ class SparkSession private[sql] ( */ def

[GitHub] [spark] yabola commented on pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice when no filters in vectorized reader

2023-03-16 Thread via GitHub
yabola commented on PR #39950: URL: https://github.com/apache/spark/pull/39950#issuecomment-1471754271 @sunchao please take a look, thank you

[GitHub] [spark] yabola commented on a diff in pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice when no filters in vectorized reader

2023-03-16 Thread via GitHub
yabola commented on code in PR #39950: URL: https://github.com/apache/spark/pull/39950#discussion_r1138487479 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala: ## @@ -182,6 +186,9 @@ class ParquetFileFormat val

[GitHub] [spark] hvanhovell commented on a diff in pull request #40438: [SPARK-42806][CONNECT] Add `Catalog` support

2023-03-16 Thread via GitHub
hvanhovell commented on code in PR #40438: URL: https://github.com/apache/spark/pull/40438#discussion_r1138481493 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -272,6 +274,14 @@ class SparkSession private[sql] ( */ def

[GitHub] [spark] HeartSaVioR closed pull request #40455: [SPARK-42819][SS] Add support for setting max_write_buffer_number and write_buffer_size for RocksDB used in streaming

2023-03-16 Thread via GitHub
HeartSaVioR closed pull request #40455: [SPARK-42819][SS] Add support for setting max_write_buffer_number and write_buffer_size for RocksDB used in streaming URL: https://github.com/apache/spark/pull/40455

[GitHub] [spark] HeartSaVioR commented on pull request #40455: [SPARK-42819][SS] Add support for setting max_write_buffer_number and write_buffer_size for RocksDB used in streaming

2023-03-16 Thread via GitHub
HeartSaVioR commented on PR #40455: URL: https://github.com/apache/spark/pull/40455#issuecomment-1471722425 Thanks! Merging to master.

[GitHub] [spark] HeartSaVioR commented on pull request #40455: [SPARK-42819][SS] Add support for setting max_write_buffer_number and write_buffer_size for RocksDB used in streaming

2023-03-16 Thread via GitHub
HeartSaVioR commented on PR #40455: URL: https://github.com/apache/spark/pull/40455#issuecomment-1471721625 https://github.com/anishshri-db/spark/actions/runs/4434196358/jobs/7780002037 Looks like PySpark build is stuck but this change does not involve impact on PySpark.

[GitHub] [spark] yaooqinn opened a new pull request, #40457: [SPARK-42823][SQL] spark-sql shell supports multipart namespaces for initialization

2023-03-16 Thread via GitHub
yaooqinn opened a new pull request, #40457: URL: https://github.com/apache/spark/pull/40457 ### What changes were proposed in this pull request? Currently, we only support initializing spark-sql shell with a single-part schema, which also must be forced to the session

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39239: [SPARK-41730][PYTHON] Set tz to UTC while converting of timestamps to python's datetime

2023-03-16 Thread via GitHub
HyukjinKwon commented on code in PR #39239: URL: https://github.com/apache/spark/pull/39239#discussion_r1138430073 ## python/pyspark/sql/types.py: ## @@ -276,7 +276,18 @@ def toInternal(self, dt: datetime.datetime) -> int: def fromInternal(self, ts: int) ->

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39239: [SPARK-41730][PYTHON] Set tz to UTC while converting of timestamps to python's datetime

2023-03-16 Thread via GitHub
HyukjinKwon commented on code in PR #39239: URL: https://github.com/apache/spark/pull/39239#discussion_r1138428968 ## python/pyspark/sql/types.py: ## @@ -276,7 +276,18 @@ def toInternal(self, dt: datetime.datetime) -> int: def fromInternal(self, ts: int) ->

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40456: [SPARK-42720][PS][SQL] Uses expression for distributed-sequence default index instead of plan

2023-03-16 Thread via GitHub
zhengruifeng commented on code in PR #40456: URL: https://github.com/apache/spark/pull/40456#discussion_r1138370203 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/DistributedSequenceID.scala: ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] zhengruifeng commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-16 Thread via GitHub
zhengruifeng commented on PR #38947: URL: https://github.com/apache/spark/pull/38947#issuecomment-1471632549 merged into master, thank you @navinvishy for working on this!

[GitHub] [spark] zhengruifeng closed pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-16 Thread via GitHub
zhengruifeng closed pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function URL: https://github.com/apache/spark/pull/38947

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-16 Thread via GitHub
WeichenXu123 commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1138372032 ## mllib/common/src/main/scala/org/apache/spark/ml/param/params.scala: ## @@ -793,6 +800,10 @@ trait Params extends Identifiable with Serializable { this

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-16 Thread via GitHub
WeichenXu123 commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1138369409 ## python/pyspark/sql/connect/session.py: ## @@ -463,7 +463,7 @@ def stop(self) -> None: @classmethod def getActiveSession(cls) -> Any: -raise

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40456: [SPARK-42720][PS][SQL] Uses expression for distributed-sequence default index instead of plan

2023-03-16 Thread via GitHub
zhengruifeng commented on code in PR #40456: URL: https://github.com/apache/spark/pull/40456#discussion_r1138370203 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/DistributedSequenceID.scala: ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software
