[GitHub] [spark] LuciferYang commented on a diff in pull request #40510: [SPARK-42889][CONNECT][PYTHON] Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread via GitHub
LuciferYang commented on code in PR #40510: URL: https://github.com/apache/spark/pull/40510#discussion_r1144269110 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -54,6 +54,20 @@ message UserContext { repeated google.protobuf.Any extensions =

[GitHub] [spark] LuciferYang commented on a diff in pull request #40516: [SPARK-42894][CONNECT] Support `cache`/`persist`/`unpersist`/`storageLevel` for Spark connect jvm client

2023-03-21 Thread via GitHub
LuciferYang commented on code in PR #40516: URL: https://github.com/apache/spark/pull/40516#discussion_r1144250578 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2771,22 +2771,86 @@ class Dataset[T] private[sql] ( new

[GitHub] [spark] LuciferYang commented on pull request #40516: [SPARK-42894][CONNECT] Support `cache`/`persist`/`unpersist`/`storageLevel` for Spark connect jvm client

2023-03-21 Thread via GitHub
LuciferYang commented on PR #40516: URL: https://github.com/apache/spark/pull/40516#issuecomment-1478944069 cc @HyukjinKwon @hvanhovell FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] LuciferYang opened a new pull request, #40516: [SPARK-42894][CONNECT] Support `cache`/`persist`/`unpersist`/`storageLevel` for Scala connect client

2023-03-21 Thread via GitHub
LuciferYang opened a new pull request, #40516: URL: https://github.com/apache/spark/pull/40516 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-21 Thread via GitHub
ritikam2 commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1478932824 Perhaps the following would be a better solution. Instead of looking for star, any UnresolvedFunction should have UnresolvedAlias. Any comments? `private[this] def alias(expr:

[GitHub] [spark] Stove-hust commented on pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-21 Thread via GitHub
Stove-hust commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1478924572 @mridulm yep,it`s me Username: StoveM Full name: Fencheng Mei

[GitHub] [spark] Stove-hust commented on pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-21 Thread via GitHub
Stove-hust commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1478924279 > yep,it`s me Username:StoveM Full name: Fencheng Mei

[GitHub] [spark] srowen commented on pull request #40504: [SPARK-42880][DOCS] Update running-on-yarn.md to log4j2 syntax

2023-03-21 Thread via GitHub
srowen commented on PR #40504: URL: https://github.com/apache/spark/pull/40504#issuecomment-1478906144 Merged to master

[GitHub] [spark] srowen closed pull request #40504: [SPARK-42880][DOCS] Update running-on-yarn.md to log4j2 syntax

2023-03-21 Thread via GitHub
srowen closed pull request #40504: [SPARK-42880][DOCS] Update running-on-yarn.md to log4j2 syntax URL: https://github.com/apache/spark/pull/40504

[GitHub] [spark] mridulm commented on pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-21 Thread via GitHub
mridulm commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1478905577 Is your jira id `StoveM` @Stove-hust ?

[GitHub] [spark] cloud-fan commented on a diff in pull request #40512: [SPARK-42892][SQL] Move sameType and relevant methods out of DataType

2023-03-21 Thread via GitHub
cloud-fan commented on code in PR #40512: URL: https://github.com/apache/spark/pull/40512#discussion_r1144213535 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/DataTypeUtils.scala: ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] cloud-fan commented on a diff in pull request #40512: [SPARK-42892][SQL] Move sameType and relevant methods out of DataType

2023-03-21 Thread via GitHub
cloud-fan commented on code in PR #40512: URL: https://github.com/apache/spark/pull/40512#discussion_r1144213026 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/DataTypeUtils.scala: ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] cloud-fan closed pull request #40397: [SPARK-42052][SQL] Codegen Support for HiveSimpleUDF

2023-03-21 Thread via GitHub
cloud-fan closed pull request #40397: [SPARK-42052][SQL] Codegen Support for HiveSimpleUDF URL: https://github.com/apache/spark/pull/40397

[GitHub] [spark] cloud-fan commented on pull request #40397: [SPARK-42052][SQL] Codegen Support for HiveSimpleUDF

2023-03-21 Thread via GitHub
cloud-fan commented on PR #40397: URL: https://github.com/apache/spark/pull/40397#issuecomment-1478896479 thanks, merging to master!

[GitHub] [spark] beliefer commented on a diff in pull request #40467: [SPARK-42584][CONNECT] Improve output of `Column.explain`

2023-03-21 Thread via GitHub
beliefer commented on code in PR #40467: URL: https://github.com/apache/spark/pull/40467#discussion_r1144208941 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -1211,13 +1211,11 @@ class Column private[sql] (private[sql] val expr:

[GitHub] [spark] zhengruifeng commented on pull request #40514: [SPARK-41233][CONNECT][PYTHON] Add array_prepend to Spark Connect Python client

2023-03-21 Thread via GitHub
zhengruifeng commented on PR #40514: URL: https://github.com/apache/spark/pull/40514#issuecomment-1478884594 would you mind adding a simple test here?

[GitHub] [spark] hvanhovell commented on a diff in pull request #40467: [SPARK-42584][CONNECT] Improve output of `Column.explain`

2023-03-21 Thread via GitHub
hvanhovell commented on code in PR #40467: URL: https://github.com/apache/spark/pull/40467#discussion_r1144196759 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -1211,13 +1211,11 @@ class Column private[sql] (private[sql] val expr:

[GitHub] [spark] hvanhovell closed pull request #40413: [SPARK-42786][Connect] Typed Select

2023-03-21 Thread via GitHub
hvanhovell closed pull request #40413: [SPARK-42786][Connect] Typed Select URL: https://github.com/apache/spark/pull/40413

[GitHub] [spark] hvanhovell commented on pull request #40413: [SPARK-42786][Connect] Typed Select

2023-03-21 Thread via GitHub
hvanhovell commented on PR #40413: URL: https://github.com/apache/spark/pull/40413#issuecomment-1478874829 Merging to master!

[GitHub] [spark] hvanhovell commented on pull request #40368: [SPARK-42748][CONNECT] Server-side Artifact Management

2023-03-21 Thread via GitHub
hvanhovell commented on PR #40368: URL: https://github.com/apache/spark/pull/40368#issuecomment-1478874581 @vicennial can you update?

[GitHub] [spark] panbingkun commented on pull request #40397: [SPARK-42052][SQL] Codegen Support for HiveSimpleUDF

2023-03-21 Thread via GitHub
panbingkun commented on PR #40397: URL: https://github.com/apache/spark/pull/40397#issuecomment-1478870787 @cloud-fan Can we merge it to master? After it I will try to refactor HiveGenericUDTF & HiveUDAFFunction. Thanks!

[GitHub] [spark] hvanhovell opened a new pull request, #40515: [SPARK-42884][CONNECT] Add Ammonite REPL integration

2023-03-21 Thread via GitHub
hvanhovell opened a new pull request, #40515: URL: https://github.com/apache/spark/pull/40515 ### What changes were proposed in this pull request? This PR adds Ammonite REPL integration for Spark Connect. This has a couple of benefits: - It makes it a lot less cumbersome for users to

[GitHub] [spark] LuciferYang commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-21 Thread via GitHub
LuciferYang commented on PR #40496: URL: https://github.com/apache/spark/pull/40496#issuecomment-1478858139 > @dtenedor Wait a few minutes for me to check with Scala 2.13 manually done, should be ok ~

[GitHub] [spark] beliefer commented on a diff in pull request #40467: [SPARK-42584][CONNECT] Improve output of `Column.explain`

2023-03-21 Thread via GitHub
beliefer commented on code in PR #40467: URL: https://github.com/apache/spark/pull/40467#discussion_r1144182236 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -1211,13 +1211,11 @@ class Column private[sql] (private[sql] val expr:

[GitHub] [spark] Stove-hust commented on pull request #40412: [SPARK-42784] should still create subDir when the number of subDir in merge dir is less than conf

2023-03-21 Thread via GitHub
Stove-hust commented on PR #40412: URL: https://github.com/apache/spark/pull/40412#issuecomment-1478854808 @zhouyejoe I kept the accident scene, which should be able to help you. (In our clustered machine environment, with 11 HDDs, creating 11 * 64 subdirectories would take longer)

[GitHub] [spark] Stove-hust commented on pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-21 Thread via GitHub
Stove-hust commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1478849048 > I could not cherry pick this into 3.4 and 3.3 - we should fix for those branches as well IMO. Can you create a PR against those two branches as well @Stove-hust ? Thanks No

[GitHub] [spark] beliefer commented on a diff in pull request #40467: [SPARK-42584][CONNECT] Improve output of `Column.explain`

2023-03-21 Thread via GitHub
beliefer commented on code in PR #40467: URL: https://github.com/apache/spark/pull/40467#discussion_r1144175741 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -1211,13 +1211,11 @@ class Column private[sql] (private[sql] val expr:

[GitHub] [spark] ueshin opened a new pull request, #40514: [SPARK-41233][CONNECT][PYTHON] Add array_prepend to Spark Connect Python client

2023-03-21 Thread via GitHub
ueshin opened a new pull request, #40514: URL: https://github.com/apache/spark/pull/40514 ### What changes were proposed in this pull request? This is a follow-up of #38947. Add `array_prepend` function to Spark Connect Python client. ### Why are the changes needed?

[GitHub] [spark] LuciferYang commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-21 Thread via GitHub
LuciferYang commented on PR #40496: URL: https://github.com/apache/spark/pull/40496#issuecomment-147884 @dtenedor Wait a few minutes for me to check with Scala 2.13 manually

[GitHub] [spark] dtenedor commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-21 Thread via GitHub
dtenedor commented on PR #40496: URL: https://github.com/apache/spark/pull/40496#issuecomment-1478835534 @HyukjinKwon the tests are passing now, this is ready to merge if you are ready :)

[GitHub] [spark] mridulm commented on pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-21 Thread via GitHub
mridulm commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1478833979 I could not cherry pick this into 3.4 and 3.3 - we should fix for those branches as well IMO. Can you create a PR against those two branches as well @Stove-hust ? Thanks

[GitHub] [spark] mridulm commented on pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-21 Thread via GitHub
mridulm commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1478833621 Merged to master. Thanks for working on this @Stove-hust ! Thanks for the review @otterc :-)

[GitHub] [spark] mridulm closed pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-21 Thread via GitHub
mridulm closed pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks URL: https://github.com/apache/spark/pull/40393

[GitHub] [spark] xinrong-meng opened a new pull request, #40513: Block Arrow-optimized Python UDFs

2023-03-21 Thread via GitHub
xinrong-meng opened a new pull request, #40513: URL: https://github.com/apache/spark/pull/40513 ### What changes were proposed in this pull request? Block the usage of Arrow-optimized Python UDFs in Apache Spark 3.4.0. ### Why are the changes needed? Considering the upcoming

[GitHub] [spark] hvanhovell commented on a diff in pull request #40467: [SPARK-42584][CONNECT] Improve output of `Column.explain`

2023-03-21 Thread via GitHub
hvanhovell commented on code in PR #40467: URL: https://github.com/apache/spark/pull/40467#discussion_r1144160497 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -1211,13 +1211,11 @@ class Column private[sql] (private[sql] val expr:

[GitHub] [spark] LuciferYang commented on pull request #40489: [SPARK-42871][BUILD] Upgrade slf4j to 2.0.7

2023-03-21 Thread via GitHub
LuciferYang commented on PR #40489: URL: https://github.com/apache/spark/pull/40489#issuecomment-1478824499 Thanks @srowen @HyukjinKwon

[GitHub] [spark] LuciferYang commented on pull request #40490: [SPARK-42536][BUILD] Upgrade log4j2 to 2.20.0

2023-03-21 Thread via GitHub
LuciferYang commented on PR #40490: URL: https://github.com/apache/spark/pull/40490#issuecomment-1478821629 Thanks @dongjoon-hyun @viirya ~

[GitHub] [spark] frankliee commented on a diff in pull request #40504: [SPARK-42880][DOCS] Update running-on-yarn.md to log4j2 syntax

2023-03-21 Thread via GitHub
frankliee commented on code in PR #40504: URL: https://github.com/apache/spark/pull/40504#discussion_r1144154895 ## docs/running-on-yarn.md: ## @@ -137,7 +137,7 @@ Note that for the first option, both executors and the application master will s log4j configuration, which may

[GitHub] [spark] frankliee commented on a diff in pull request #40504: [SPARK-42880][DOCS] Update running-on-yarn.md to log4j2 syntax

2023-03-21 Thread via GitHub
frankliee commented on code in PR #40504: URL: https://github.com/apache/spark/pull/40504#discussion_r1144154348 ## docs/running-on-yarn.md: ## @@ -137,7 +137,7 @@ Note that for the first option, both executors and the application master will s log4j configuration, which may

[GitHub] [spark] amaliujia commented on pull request #40512: [SPARK-42892][SQL] Move sameType and relevant methods out of DataType

2023-03-21 Thread via GitHub
amaliujia commented on PR #40512: URL: https://github.com/apache/spark/pull/40512#issuecomment-1478801856 @hvanhovell @cloud-fan

[GitHub] [spark] amaliujia opened a new pull request, #40512: [SPARK-42892][SQL] Move sameType and relevant methods out of DataType

2023-03-21 Thread via GitHub
amaliujia opened a new pull request, #40512: URL: https://github.com/apache/spark/pull/40512 ### What changes were proposed in this pull request? This PR moves the following methods from `DataType`: 1. equalsIgnoreNullability 2. sameType 3.

[GitHub] [spark] cloud-fan closed pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-21 Thread via GitHub
cloud-fan closed pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes URL: https://github.com/apache/spark/pull/40385

[GitHub] [spark] cloud-fan commented on pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-21 Thread via GitHub
cloud-fan commented on PR #40385: URL: https://github.com/apache/spark/pull/40385#issuecomment-1478798727 thanks, merging to master/3.4!

[GitHub] [spark] HyukjinKwon commented on pull request #40034: [SPARK-42447][INFRA] Remove Hadoop 2 GitHub Action job

2023-03-21 Thread via GitHub
HyukjinKwon commented on PR #40034: URL: https://github.com/apache/spark/pull/40034#issuecomment-1478796828 I remember Guava upgrade is also blocked by Hive .. IIRC ..

[GitHub] [spark] HyukjinKwon closed pull request #40494: [MINOR][DOCS] Fix typos

2023-03-21 Thread via GitHub
HyukjinKwon closed pull request #40494: [MINOR][DOCS] Fix typos URL: https://github.com/apache/spark/pull/40494

[GitHub] [spark] HyukjinKwon commented on pull request #40494: [MINOR][DOCS] Fix typos

2023-03-21 Thread via GitHub
HyukjinKwon commented on PR #40494: URL: https://github.com/apache/spark/pull/40494#issuecomment-1478792526 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #40510: [SPARK-42889][CONNECT][PYTHON] Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread via GitHub
HyukjinKwon closed pull request #40510: [SPARK-42889][CONNECT][PYTHON] Implement cache, persist, unpersist, and storageLevel URL: https://github.com/apache/spark/pull/40510

[GitHub] [spark] HyukjinKwon commented on pull request #40510: [SPARK-42889][CONNECT][PYTHON] Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread via GitHub
HyukjinKwon commented on PR #40510: URL: https://github.com/apache/spark/pull/40510#issuecomment-1478790850 Merged to master and branch-3.4.

[GitHub] [spark] dongjoon-hyun commented on pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

2023-03-21 Thread via GitHub
dongjoon-hyun commented on PR #40447: URL: https://github.com/apache/spark/pull/40447#issuecomment-1478787974 Merged to master/branch-3.4. Thank you, @grundprinzip and all!

[GitHub] [spark] dongjoon-hyun closed pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

2023-03-21 Thread via GitHub
dongjoon-hyun closed pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB URL: https://github.com/apache/spark/pull/40447

[GitHub] [spark] sunchao commented on pull request #40511: [SPARK-42888][BUILD] Upgrade `gcs-connector` to 2.2.11

2023-03-21 Thread via GitHub
sunchao commented on PR #40511: URL: https://github.com/apache/spark/pull/40511#issuecomment-1478787651 LGTM too, thanks @cnauroth @dongjoon-hyun !

[GitHub] [spark] sudoliyang commented on pull request #40494: [MINOR][DOCS] Fix typos

2023-03-21 Thread via GitHub
sudoliyang commented on PR #40494: URL: https://github.com/apache/spark/pull/40494#issuecomment-1478786416 @HyukjinKwon Thanks. I ran the tests at https://github.com/sudoliyang/spark/pull/2. I would like to rebase and force push again to run all tests here.

[GitHub] [spark] dongjoon-hyun commented on pull request #40511: [SPARK-42888][BUILD] Upgrade `gcs-connector` to 2.2.11

2023-03-21 Thread via GitHub
dongjoon-hyun commented on PR #40511: URL: https://github.com/apache/spark/pull/40511#issuecomment-1478780794 Merged to master/3.4 for Apache Spark 3.4.0. This will be a part of next Apache Spark 3.4.0 RC. I added you to the Apache Spark contributor group and assigned SPARK-42888 to

[GitHub] [spark] dongjoon-hyun closed pull request #40511: [SPARK-42888][BUILD] Upgrade `gcs-connector` to 2.2.11

2023-03-21 Thread via GitHub
dongjoon-hyun closed pull request #40511: [SPARK-42888][BUILD] Upgrade `gcs-connector` to 2.2.11 URL: https://github.com/apache/spark/pull/40511

[GitHub] [spark] zhengruifeng commented on pull request #40263: [SPARK-42659][ML] Reimplement `FPGrowthModel.transform` with dataframe operations

2023-03-21 Thread via GitHub
zhengruifeng commented on PR #40263: URL: https://github.com/apache/spark/pull/40263#issuecomment-1478764552 @srowen sounds reasonable

[GitHub] [spark] HyukjinKwon closed pull request #40489: [SPARK-42871][BUILD] Upgrade slf4j to 2.0.7

2023-03-21 Thread via GitHub
HyukjinKwon closed pull request #40489: [SPARK-42871][BUILD] Upgrade slf4j to 2.0.7 URL: https://github.com/apache/spark/pull/40489

[GitHub] [spark] HyukjinKwon commented on pull request #40489: [SPARK-42871][BUILD] Upgrade slf4j to 2.0.7

2023-03-21 Thread via GitHub
HyukjinKwon commented on PR #40489: URL: https://github.com/apache/spark/pull/40489#issuecomment-1478757952 Merged to master.

[GitHub] [spark] HyukjinKwon commented on pull request #40468: [SPARK-42838][SQL] changed error class name _LEGACY_ERROR_TEMP_2000

2023-03-21 Thread via GitHub
HyukjinKwon commented on PR #40468: URL: https://github.com/apache/spark/pull/40468#issuecomment-1478757584 Mind filling the PR description as well?

[GitHub] [spark] HyukjinKwon closed pull request #40507: [SPARK-42662][CONNECT][PS] Add proto message for pandas API on Spark default index

2023-03-21 Thread via GitHub
HyukjinKwon closed pull request #40507: [SPARK-42662][CONNECT][PS] Add proto message for pandas API on Spark default index URL: https://github.com/apache/spark/pull/40507

[GitHub] [spark] HyukjinKwon closed pull request #40505: [MINOR][DOCS] Remove SparkSession constructor invocation in the example

2023-03-21 Thread via GitHub
HyukjinKwon closed pull request #40505: [MINOR][DOCS] Remove SparkSession constructor invocation in the example URL: https://github.com/apache/spark/pull/40505

[GitHub] [spark] HyukjinKwon commented on pull request #40507: [SPARK-42662][CONNECT][PS] Add proto message for pandas API on Spark default index

2023-03-21 Thread via GitHub
HyukjinKwon commented on PR #40507: URL: https://github.com/apache/spark/pull/40507#issuecomment-1478755845 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40505: [MINOR][DOCS] Remove SparkSession constructor invocation in the example

2023-03-21 Thread via GitHub
HyukjinKwon commented on code in PR #40505: URL: https://github.com/apache/spark/pull/40505#discussion_r1144108734 ## python/pyspark/sql/session.py: ## @@ -179,10 +179,15 @@ class SparkSession(SparkConversionMixin): ... .getOrCreate() ... ) -Create a

[GitHub] [spark] HyukjinKwon commented on pull request #40505: [MINOR][DOCS] Remove SparkSession constructor invocation in the example

2023-03-21 Thread via GitHub
HyukjinKwon commented on PR #40505: URL: https://github.com/apache/spark/pull/40505#issuecomment-1478755239 Merged to master and branch-3.4.

[GitHub] [spark] github-actions[bot] closed pull request #38608: [SPARK-41080][SQL] Support Bit manipulation function SETBIT

2023-03-21 Thread via GitHub
github-actions[bot] closed pull request #38608: [SPARK-41080][SQL] Support Bit manipulation function SETBIT URL: https://github.com/apache/spark/pull/38608

[GitHub] [spark] github-actions[bot] closed pull request #38534: [SPARK-38505][SQL] Make partial aggregation adaptive

2023-03-21 Thread via GitHub
github-actions[bot] closed pull request #38534: [SPARK-38505][SQL] Make partial aggregation adaptive URL: https://github.com/apache/spark/pull/38534

[GitHub] [spark] zhouyejoe commented on pull request #40412: [SPARK-42784] should still create subDir when the number of subDir in merge dir is less than conf

2023-03-21 Thread via GitHub
zhouyejoe commented on PR #40412: URL: https://github.com/apache/spark/pull/40412#issuecomment-1478751685 @Stove-hust I think the changes will help resolve the issue described in the ticket. I am checking more about what could be causing the race conditions where there are two Executor

[GitHub] [spark] cnauroth commented on pull request #40511: [SPARK-42888][BUILD] Upgrade GCS connector from 2.2.7 to 2.2.11.

2023-03-21 Thread via GitHub
cnauroth commented on PR #40511: URL: https://github.com/apache/spark/pull/40511#issuecomment-1478711103 @dongjoon-hyun , may I ask for your review, since you did the original import of the GCS connector in [SPARK-33605](https://issues.apache.org/jira/browse/SPARK-33605)/#37745? Thank

[GitHub] [spark] cnauroth opened a new pull request, #40511: [SPARK-42888][BUILD] Upgrade GCS connector from 2.2.7 to 2.2.11.

2023-03-21 Thread via GitHub
cnauroth opened a new pull request, #40511: URL: https://github.com/apache/spark/pull/40511 ### What changes were proposed in this pull request? Upgrade the [GCS Connector](https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs) bundled in the Spark distro from

[GitHub] [spark] ueshin commented on a diff in pull request #40498: [SPARK-42878][CONNECT] The table API in DataFrameReader could also accept options

2023-03-21 Thread via GitHub
ueshin commented on code in PR #40498: URL: https://github.com/apache/spark/pull/40498#discussion_r1144066905 ## python/pyspark/sql/connect/plan.py: ## @@ -302,13 +302,16 @@ def plan(self, session: "SparkConnectClient") -> proto.Relation: class Read(LogicalPlan): -def

[GitHub] [spark] ueshin commented on a diff in pull request #40498: [SPARK-42878][CONNECT] The table API in DataFrameReader could also accept options

2023-03-21 Thread via GitHub
ueshin commented on code in PR #40498: URL: https://github.com/apache/spark/pull/40498#discussion_r1144066373 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -122,6 +122,9 @@ message Read { message NamedTable { // (Required) Unparsed

[GitHub] [spark] ueshin opened a new pull request, #40510: [SPARK-42889][CONNECT][PYTHON] Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread via GitHub
ueshin opened a new pull request, #40510: URL: https://github.com/apache/spark/pull/40510 ### What changes were proposed in this pull request? Implements `DataFrame.cache`, `persist`, `unpersist`, and `storageLevel`. ### Why are the changes needed? Missing APIs.

[GitHub] [spark] amaliujia commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-21 Thread via GitHub
amaliujia commented on PR #40496: URL: https://github.com/apache/spark/pull/40496#issuecomment-1478658342 LGTM

[GitHub] [spark] dtenedor commented on a diff in pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-21 Thread via GitHub
dtenedor commented on code in PR #40496: URL: https://github.com/apache/spark/pull/40496#discussion_r1144040124 ## sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala: ## @@ -59,10 +61,39 @@ trait SQLQueryTestHelper extends Logging { */ protected def

[GitHub] [spark] amaliujia commented on a diff in pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-21 Thread via GitHub
amaliujia commented on code in PR #40496: URL: https://github.com/apache/spark/pull/40496#discussion_r1144038811 ## sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala: ## @@ -59,10 +61,39 @@ trait SQLQueryTestHelper extends Logging { */ protected def

[GitHub] [spark] dtenedor commented on a diff in pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-21 Thread via GitHub
dtenedor commented on code in PR #40496: URL: https://github.com/apache/spark/pull/40496#discussion_r1144023834 ## sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala: ## @@ -59,10 +61,39 @@ trait SQLQueryTestHelper extends Logging { */ protected def

[GitHub] [spark] dtenedor commented on a diff in pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-21 Thread via GitHub
dtenedor commented on code in PR #40496: URL: https://github.com/apache/spark/pull/40496#discussion_r1144021973 ## sql/core/src/test/resources/sql-tests/analyzer-results/ansi/cast.sql.out: ## @@ -0,0 +1,881 @@ +-- Automatically generated by SQLQueryTestSuite +-- !query +SELECT

[GitHub] [spark] amaliujia commented on a diff in pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-21 Thread via GitHub
amaliujia commented on code in PR #40496: URL: https://github.com/apache/spark/pull/40496#discussion_r1144020709 ## sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala: ## @@ -59,10 +61,39 @@ trait SQLQueryTestHelper extends Logging { */ protected def

[GitHub] [spark] grundprinzip commented on pull request #39294: [SPARK-41537][INFRA][TESTS] Github Workflow Check for Breaking Changes in Spark Connect Proto

2023-03-21 Thread via GitHub
grundprinzip commented on PR #39294: URL: https://github.com/apache/spark/pull/39294#issuecomment-1478626808 The current level of proto compat required is the strictest one, which requires full compatibility, even in the naming of the fields. Strictly speaking, this is not required

[GitHub] [spark] amaliujia commented on pull request #39294: [SPARK-41537][INFRA][TESTS] Github Workflow Check for Breaking Changes in Spark Connect Proto

2023-03-21 Thread via GitHub
amaliujia commented on PR #39294: URL: https://github.com/apache/spark/pull/39294#issuecomment-1478618853 > ![SCR-20230321-v7f](https://user-images.githubusercontent.com/3421/226745165-60b1611a-2ec5-4ddf-98c0-61c3a180ed9c.png) > > Verifying that the broken build is

[GitHub] [spark] grundprinzip commented on a diff in pull request #39294: [SPARK-41537][INFRA][TESTS] Github Workflow Check for Breaking Changes in Spark Connect Proto

2023-03-21 Thread via GitHub
grundprinzip commented on code in PR #39294: URL: https://github.com/apache/spark/pull/39294#discussion_r1144010201 ## .github/workflows/build_and_test.yml: ## @@ -493,6 +494,80 @@ jobs: name: test-results-sparkr--8-${{ inputs.hadoop }}-hive2.3 path:

[GitHub] [spark] grundprinzip commented on a diff in pull request #39294: [SPARK-41537][INFRA][TESTS] Github Workflow Check for Breaking Changes in Spark Connect Proto

2023-03-21 Thread via GitHub
grundprinzip commented on code in PR #39294: URL: https://github.com/apache/spark/pull/39294#discussion_r1144009897 ## .github/workflows/build_and_test.yml: ## @@ -493,6 +494,80 @@ jobs: name: test-results-sparkr--8-${{ inputs.hadoop }}-hive2.3 path:

[GitHub] [spark] grundprinzip commented on pull request #39294: [SPARK-41537][INFRA][TESTS] Github Workflow Check for Breaking Changes in Spark Connect Proto

2023-03-21 Thread via GitHub
grundprinzip commented on PR #39294: URL: https://github.com/apache/spark/pull/39294#issuecomment-1478606975 ![SCR-20230321-v7f](https://user-images.githubusercontent.com/3421/226745165-60b1611a-2ec5-4ddf-98c0-61c3a180ed9c.png) Verifying that the broken build is reported

[GitHub] [spark] StevenChenDatabricks commented on pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-21 Thread via GitHub
StevenChenDatabricks commented on PR #40385: URL: https://github.com/apache/spark/pull/40385#issuecomment-1478591539 @cloud-fan I've addressed your comments. Thanks for the review!

[GitHub] [spark] viirya commented on pull request #40490: [SPARK-42536][BUILD] Upgrade log4j2 to 2.20.0

2023-03-21 Thread via GitHub
viirya commented on PR #40490: URL: https://github.com/apache/spark/pull/40490#issuecomment-1478569986 Looks good to me. +1

[GitHub] [spark] dongjoon-hyun commented on pull request #40490: [SPARK-42536][BUILD] Upgrade log4j2 to 2.20.0

2023-03-21 Thread via GitHub
dongjoon-hyun commented on PR #40490: URL: https://github.com/apache/spark/pull/40490#issuecomment-1478567572 cc @viirya , too.

[GitHub] [spark] dongjoon-hyun closed pull request #40509: [SPARK-42885][K8S][BUILD] Upgrade `kubernetes-client` to 6.5.1

2023-03-21 Thread via GitHub
dongjoon-hyun closed pull request #40509: [SPARK-42885][K8S][BUILD] Upgrade `kubernetes-client` to 6.5.1 URL: https://github.com/apache/spark/pull/40509

[GitHub] [spark] dongjoon-hyun commented on pull request #40509: [SPARK-42885][K8S][BUILD] Upgrade `kubernetes-client` to 6.5.1

2023-03-21 Thread via GitHub
dongjoon-hyun commented on PR #40509: URL: https://github.com/apache/spark/pull/40509#issuecomment-1478564576 Thank you, @viirya . Merged to master for Apache Spark 3.5.

[GitHub] [spark] dongjoon-hyun commented on pull request #40440: [SPARK-42808][CORE] Avoid getting availableProcessors every time in `MapOutputTrackerMaster#getStatistics`

2023-03-21 Thread via GitHub
dongjoon-hyun commented on PR #40440: URL: https://github.com/apache/spark/pull/40440#issuecomment-1478522804 Thank you all!

[GitHub] [spark] dongjoon-hyun commented on pull request #40509: [SPARK-42885][K8S][BUILD] Upgrade `kubernetes-client` to 6.5.1

2023-03-21 Thread via GitHub
dongjoon-hyun commented on PR #40509: URL: https://github.com/apache/spark/pull/40509#issuecomment-1478454379 Thank you, @amaliujia . Also, could you review this PR, @viirya ?

[GitHub] [spark] peter-toth commented on pull request #40268: [SPARK-42500][SQL] ConstantPropagation support more cases

2023-03-21 Thread via GitHub
peter-toth commented on PR #40268: URL: https://github.com/apache/spark/pull/40268#issuecomment-1478292434 @cloud-fan, @wangyum please let me know if this PR needs further improvements.

[GitHub] [spark] simonvanderveldt commented on pull request #38518: [SPARK-33349][K8S] Reset the executor pods watcher when we receive a version changed from k8s

2023-03-21 Thread via GitHub
simonvanderveldt commented on PR #38518: URL: https://github.com/apache/spark/pull/38518#issuecomment-1478291017 > To be safe, could you revise this PR by adding a new internal configuration like KUBERNETES_EXECUTOR_ENABLE_API_WATCHER and use it with the following? Not sure this

[GitHub] [spark] amaliujia commented on pull request #40509: [SPARK-42885][K8S][BUILD] Upgrade `kubernetes-client` to 6.5.1

2023-03-21 Thread via GitHub
amaliujia commented on PR #40509: URL: https://github.com/apache/spark/pull/40509#issuecomment-1478279790 LGTM

[GitHub] [spark] amaliujia commented on pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

2023-03-21 Thread via GitHub
amaliujia commented on PR #40447: URL: https://github.com/apache/spark/pull/40447#issuecomment-1478275510 LGTM

[GitHub] [spark] amaliujia commented on pull request #40498: [SPARK-42878][CONNECT] The table API in DataFrameReader could also accept options

2023-03-21 Thread via GitHub
amaliujia commented on PR #40498: URL: https://github.com/apache/spark/pull/40498#issuecomment-1478228229 @hvanhovell oh are you thinking about golden file based tests?

[GitHub] [spark] amaliujia commented on pull request #40498: [SPARK-42878][CONNECT] The table API in DataFrameReader could also accept options

2023-03-21 Thread via GitHub
amaliujia commented on PR #40498: URL: https://github.com/apache/spark/pull/40498#issuecomment-1478220179 @hvanhovell existing codebase uses this to verify the options: ``` test("SPARK-32844: DataFrameReader.table take the specified options for V1 relation") {

[GitHub] [spark] srielau commented on a diff in pull request #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

2023-03-21 Thread via GitHub
srielau commented on code in PR #40508: URL: https://github.com/apache/spark/pull/40508#discussion_r1143689845 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -213,7 +213,8 @@ class SparkSession private[sql] ( * @param sqlText

[GitHub] [spark] attilapiros commented on a diff in pull request #38518: [SPARK-33349][K8S] Reset the executor pods watcher when we receive a version changed from k8s

2023-03-21 Thread via GitHub
attilapiros commented on code in PR #38518: URL: https://github.com/apache/spark/pull/38518#discussion_r1143681244 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala: ## @@ -45,7 +46,9 @@ class

[GitHub] [spark] unical1988 commented on pull request #40468: "[SPARK-42838][SQL] changed error class name _LEGACY_ERROR_TEMP_2000"

2023-03-21 Thread via GitHub
unical1988 commented on PR #40468: URL: https://github.com/apache/spark/pull/40468#issuecomment-1478174533 > Found JIRA: [SPARK-42838](https://issues.apache.org/jira/browse/SPARK-42838). > > Can you attach the JIRA number to title? > > e.g.

[GitHub] [spark] pan3793 commented on pull request #40444: [SPARK-42813][K8S] Print application info when waitAppCompletion is false

2023-03-21 Thread via GitHub
pan3793 commented on PR #40444: URL: https://github.com/apache/spark/pull/40444#issuecomment-1478154866 Thank you, @dongjoon-hyun and @yaooqinn
