[GitHub] [spark] bjornjorgensen commented on pull request #40381: [SPARK-42761][BUILD] Upgrade `fabric8:kubernetes-client` to 6.5.0

2023-03-12 Thread via GitHub
bjornjorgensen commented on PR #40381: URL: https://github.com/apache/spark/pull/40381#issuecomment-1465158915 **The maintainers of the library contend that the application's trust would already have had to be compromised or established and therefore dispute the risk associated with this

[GitHub] [spark] MaxGekk commented on pull request #39239: [SPARK-41730][PYTHON] Set tz to UTC while converting of timestamps to python's datetime

2023-03-12 Thread via GitHub
MaxGekk commented on PR #39239: URL: https://github.com/apache/spark/pull/39239#issuecomment-1465148669 @HyukjinKwon @cloud-fan A problem with PySpark's timestamp_ltz is that it is a local timestamp, not the physical timestamp that timestamp_ltz is supposed to be. Let's see even Java 7
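For context, the difference between a local timestamp and a physical timestamp can be illustrated in plain Python. This is a minimal sketch, not code from PR #39239: the variable names are hypothetical and only the standard `datetime` module is used. A physical timestamp identifies one instant regardless of time zone, whereas a naive (local) datetime only records wall-clock fields.

```python
from datetime import datetime, timedelta, timezone

# One physical instant: zero seconds after the Unix epoch.
epoch_seconds = 0

# The same instant rendered in two different time zones (both tz-aware).
as_utc = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
as_plus8 = datetime.fromtimestamp(epoch_seconds, tz=timezone(timedelta(hours=8)))

# Aware datetimes compare by instant, so these are equal.
same_instant = as_utc == as_plus8

# Dropping tzinfo leaves only wall-clock fields ("local" timestamps),
# which no longer agree with each other.
naive_utc = as_utc.replace(tzinfo=None)
naive_plus8 = as_plus8.replace(tzinfo=None)
same_wall_clock = naive_utc == naive_plus8

print(same_instant, same_wall_clock)
```

Pinning the conversion to UTC, as the PR title suggests, is one way to make the resulting Python `datetime` independent of the machine's local time zone.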

[GitHub] [spark] bjornjorgensen commented on pull request #40381: [SPARK-42761][BUILD] Upgrade `fabric8:kubernetes-client` to 6.5.0

2023-03-12 Thread via GitHub
bjornjorgensen commented on PR #40381: URL: https://github.com/apache/spark/pull/40381#issuecomment-1465141874 Well, the comment that you are referring to has a link, but I can't get in

[GitHub] [spark] yabola commented on pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice when no filters in vectorized reader

2023-03-12 Thread via GitHub
yabola commented on PR #39950: URL: https://github.com/apache/spark/pull/39950#issuecomment-1465215240 @sunchao Hi~ Could you take a look at this PR? I think it will be useful when there are joined tables and filter conditions are on few tables. -- This is an automated message from the

[GitHub] [spark] dongjoon-hyun commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub
dongjoon-hyun commented on PR #40386: URL: https://github.com/apache/spark/pull/40386#issuecomment-1465291603 cc @wangyum and @kenny-ddd

[GitHub] [spark] wangyum commented on pull request #40365: [MINOR][SQL] Fix incorrect comment in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub
wangyum commented on PR #40365: URL: https://github.com/apache/spark/pull/40365#issuecomment-1465210746 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] bjornjorgensen commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub
bjornjorgensen commented on PR #40386: URL: https://github.com/apache/spark/pull/40386#issuecomment-1465304288 @dongjoon-hyun Thank you. Yes, this PR looks ok.

[GitHub] [spark] bjornjorgensen commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub
bjornjorgensen commented on PR #40386: URL: https://github.com/apache/spark/pull/40386#issuecomment-1465316569 Seems to be working now :)

[GitHub] [spark] github-actions[bot] commented on pull request #38887: [SPARK-41368][SQL] Reorder the window partition expressions by expression stats

2023-03-12 Thread via GitHub
github-actions[bot] commented on PR #38887: URL: https://github.com/apache/spark/pull/38887#issuecomment-1465343346 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #38674: [SPARK-41160][YARN] Fix error when submitting a task to the yarn that enabled the timeline service

2023-03-12 Thread via GitHub
github-actions[bot] closed pull request #38674: [SPARK-41160][YARN] Fix error when submitting a task to the yarn that enabled the timeline service URL: https://github.com/apache/spark/pull/38674

[GitHub] [spark] dongjoon-hyun commented on pull request #40365: [MINOR][SQL] Fix incorrect comment in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub
dongjoon-hyun commented on PR #40365: URL: https://github.com/apache/spark/pull/40365#issuecomment-1465290797 Oh, it seems to break Scala style, @wangyum . ``` [error]

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40365: [MINOR][SQL] Fix incorrect comment in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub
dongjoon-hyun commented on code in PR #40365: URL: https://github.com/apache/spark/pull/40365#discussion_r1133318969 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/LimitPushDownThroughWindow.scala: ## @@ -30,7 +30,7 @@ import

[GitHub] [spark] dongjoon-hyun commented on pull request #40381: [SPARK-42761][BUILD] Upgrade `fabric8:kubernetes-client` to 6.5.0

2023-03-12 Thread via GitHub
dongjoon-hyun commented on PR #40381: URL: https://github.com/apache/spark/pull/40381#issuecomment-1465293982 BTW, I have two additional questions for the following PR you referred. 1. Do you mean it's the evidence of the previous assessment? > SafeConstructor ignores

[GitHub] [spark] dongjoon-hyun commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub
dongjoon-hyun commented on PR #40386: URL: https://github.com/apache/spark/pull/40386#issuecomment-1465298354 Scala linter passed. ![Screenshot 2023-03-12 at 1 59 52 PM](https://user-images.githubusercontent.com/9700541/224573370-cbd15aa7-0cdc-43d6-8d01-9f35d77547f6.png)

[GitHub] [spark] dongjoon-hyun commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub
dongjoon-hyun commented on PR #40386: URL: https://github.com/apache/spark/pull/40386#issuecomment-1465298419 Hi, @bjornjorgensen . Could you review this PR?

[GitHub] [spark] dongjoon-hyun commented on pull request #40365: [MINOR][SQL] Fix incorrect comment in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub
dongjoon-hyun commented on PR #40365: URL: https://github.com/apache/spark/pull/40365#issuecomment-1465332828 Oh, no problem at all. I just wanted to share my old mistakes.

[GitHub] [spark] StevenChenDatabricks opened a new pull request, #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-12 Thread via GitHub
StevenChenDatabricks opened a new pull request, #40385: URL: https://github.com/apache/spark/pull/40385 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] bjornjorgensen commented on pull request #40381: [SPARK-42761][BUILD] Upgrade `fabric8:kubernetes-client` to 6.5.0

2023-03-12 Thread via GitHub
bjornjorgensen commented on PR #40381: URL: https://github.com/apache/spark/pull/40381#issuecomment-1465294879 ok, I didn't know who this user was. I have updated the PR text now.

[GitHub] [spark] dongjoon-hyun commented on pull request #40365: [MINOR][SQL] Fix incorrect comment in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub
dongjoon-hyun commented on PR #40365: URL: https://github.com/apache/spark/pull/40365#issuecomment-1465294691 To @wangyum , sometimes, the contributor's GitHub Action doesn't work properly. I also made similar mistakes before by forgetting the check. It's our responsibility, the

[GitHub] [spark] dongjoon-hyun commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub
dongjoon-hyun commented on PR #40386: URL: https://github.com/apache/spark/pull/40386#issuecomment-1465305631 No, only the PR author can do that because we use the author's GitHub Actions for that PR. > I have seen this one time before. Can you restart tests on the PR that fails?

[GitHub] [spark] dongjoon-hyun opened a new pull request, #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub
dongjoon-hyun opened a new pull request, #40386: URL: https://github.com/apache/spark/pull/40386 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] dongjoon-hyun commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub
dongjoon-hyun commented on PR #40386: URL: https://github.com/apache/spark/pull/40386#issuecomment-1465300338 Could you review this in order to recover master branch, @sunchao ?

[GitHub] [spark] wangyum commented on pull request #40365: [MINOR][SQL] Fix incorrect comment in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub
wangyum commented on PR #40365: URL: https://github.com/apache/spark/pull/40365#issuecomment-1465332554 @dongjoon-hyun Sorry. I will pay attention to the check next time.

[GitHub] [spark] dongjoon-hyun closed pull request #40381: [SPARK-42761][BUILD][K8S] Upgrade `kubernetes-client` to 6.5.0

2023-03-12 Thread via GitHub
dongjoon-hyun closed pull request #40381: [SPARK-42761][BUILD][K8S] Upgrade `kubernetes-client` to 6.5.0 URL: https://github.com/apache/spark/pull/40381

[GitHub] [spark] srowen commented on pull request #40380: [SPARK-42760][DOCS][PYTHON] provide one format for writing to kafka

2023-03-12 Thread via GitHub
srowen commented on PR #40380: URL: https://github.com/apache/spark/pull/40380#issuecomment-1465301926 Wrong JIRA is linked, please adjust

[GitHub] [spark] bjornjorgensen commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub
bjornjorgensen commented on PR #40386: URL: https://github.com/apache/spark/pull/40386#issuecomment-1465304673 I have seen this one time before. Can you restart tests on the PR that fails?

[GitHub] [spark] dongjoon-hyun commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub
dongjoon-hyun commented on PR #40386: URL: https://github.com/apache/spark/pull/40386#issuecomment-1465304619 Thank you so much, @bjornjorgensen .

[GitHub] [spark] dongjoon-hyun commented on pull request #40381: [SPARK-42761][BUILD] Upgrade `fabric8:kubernetes-client` to 6.5.0

2023-03-12 Thread via GitHub
dongjoon-hyun commented on PR #40381: URL: https://github.com/apache/spark/pull/40381#issuecomment-1465293060 @bjornjorgensen . Do you mean you cannot trust the comment from `winniegy`, the fabric8io community member? He closed the issue after that comment. ![Screenshot 2023-03-12 at 1 26 33

[GitHub] [spark] bjornjorgensen commented on pull request #40381: [SPARK-42761][BUILD] Upgrade `fabric8:kubernetes-client` to 6.5.0

2023-03-12 Thread via GitHub
bjornjorgensen commented on PR #40381: URL: https://github.com/apache/spark/pull/40381#issuecomment-1465295815 It now turns out that the information I was given is not correct. This changes the whole picture of why we should include this PR. I think we can wait until we

[GitHub] [spark] dongjoon-hyun commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub
dongjoon-hyun commented on PR #40386: URL: https://github.com/apache/spark/pull/40386#issuecomment-1465302694 Since this is a comment only change, let me merge this to recover the master branch and PRs. ![Screenshot 2023-03-12 at 2 20 11

[GitHub] [spark] dongjoon-hyun closed pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub
dongjoon-hyun closed pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow URL: https://github.com/apache/spark/pull/40386

[GitHub] [spark] dongjoon-hyun commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub
dongjoon-hyun commented on PR #40386: URL: https://github.com/apache/spark/pull/40386#issuecomment-1465302800 Merged to master.

[GitHub] [spark] wangyum closed pull request #40365: [MINOR][SQL] Fix incorrect comment in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub
wangyum closed pull request #40365: [MINOR][SQL] Fix incorrect comment in LimitPushDownThroughWindow URL: https://github.com/apache/spark/pull/40365

[GitHub] [spark] ivoson commented on pull request #40286: [SPARK-42577][CORE] Add max attempts limitation for stages to avoid potential infinite retry

2023-03-12 Thread via GitHub
ivoson commented on PR #40286: URL: https://github.com/apache/spark/pull/40286#issuecomment-1465226363 > Just a doc change I had missed last time around. Rest looks good to me - can you check the proposed change, and reformulate it to something similar ? I will merge it once done Hi

[GitHub] [spark] mridulm commented on pull request #40286: [SPARK-42577][CORE] Add max attempts limitation for stages to avoid potential infinite retry

2023-03-12 Thread via GitHub
mridulm commented on PR #40286: URL: https://github.com/apache/spark/pull/40286#issuecomment-1465327881 Can you update to latest, @ivoson ? The style failure is not related to your change, but it blocks the build

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40375: [SPARK-42755][CONNECT] Factor literal value conversion out to `connect-common`

2023-03-12 Thread via GitHub
zhengruifeng commented on code in PR #40375: URL: https://github.com/apache/spark/pull/40375#discussion_r1133354786 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralExpressionProtoConverter.scala: ## @@ -107,7 +107,7 @@ object

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40376: [SPARK-42756][CONNECT][PYTHON] Helper function to convert proto literal to value in Python Client

2023-03-12 Thread via GitHub
zhengruifeng commented on code in PR #40376: URL: https://github.com/apache/spark/pull/40376#discussion_r1133369721 ## python/pyspark/sql/connect/expressions.py: ## @@ -308,6 +308,43 @@ def _infer_type(cls, value: Any) -> DataType: def _from_value(cls, value: Any) ->

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40376: [SPARK-42756][CONNECT][PYTHON] Helper function to convert proto literal to value in Python Client

2023-03-12 Thread via GitHub
zhengruifeng commented on code in PR #40376: URL: https://github.com/apache/spark/pull/40376#discussion_r1133369822 ## python/pyspark/sql/connect/expressions.py: ## @@ -308,6 +308,43 @@ def _infer_type(cls, value: Any) -> DataType: def _from_value(cls, value: Any) ->

[GitHub] [spark] ulysses-you opened a new pull request, #40390: [SPARK-42768][SQL] Enable cached plan apply AQE by default

2023-03-12 Thread via GitHub
ulysses-you opened a new pull request, #40390: URL: https://github.com/apache/spark/pull/40390 ### What changes were proposed in this pull request? This PR enables `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` by default. ### Why are the changes

[GitHub] [spark] ulysses-you commented on a diff in pull request #40262: [SPARK-42651][SQL] Optimize global sort to driver sort

2023-03-12 Thread via GitHub
ulysses-you commented on code in PR #40262: URL: https://github.com/apache/spark/pull/40262#discussion_r1133461951 ## sql/core/src/main/scala/org/apache/spark/sql/execution/DriverSortExec.scala: ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40376: [SPARK-42756][CONNECT][PYTHON] Helper function to convert proto literal to value in Python Client

2023-03-12 Thread via GitHub
zhengruifeng commented on code in PR #40376: URL: https://github.com/apache/spark/pull/40376#discussion_r1133373330 ## python/pyspark/sql/connect/expressions.py: ## @@ -308,6 +308,43 @@ def _infer_type(cls, value: Any) -> DataType: def _from_value(cls, value: Any) ->

[GitHub] [spark] navinvishy commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-12 Thread via GitHub
navinvishy commented on PR #38947: URL: https://github.com/apache/spark/pull/38947#issuecomment-1465390080 > @navinvishy sorry, it seems late for 3.4.0. would you mind changing the version from `3.4.0` to `3.5.0`? > > I am going to merge this PR to master this week if no more

[GitHub] [spark] xinrong-meng opened a new pull request, #40388: [SPARK-42765][CONNECT][PYTHON] Regulate the import path of `pandas_udf`

2023-03-12 Thread via GitHub
xinrong-meng opened a new pull request, #40388: URL: https://github.com/apache/spark/pull/40388 ### What changes were proposed in this pull request? Regulate the import path of `pandas_udf`. ### Why are the changes needed? Usability. ### Does this PR introduce _any_

[GitHub] [spark] liang3zy22 commented on a diff in pull request #40347: [SPARK-42711][BUILD]Update usage info for sbt tool

2023-03-12 Thread via GitHub
liang3zy22 commented on code in PR #40347: URL: https://github.com/apache/spark/pull/40347#discussion_r1133395930 ## build/sbt: ## @@ -17,7 +17,7 @@ # limitations under the License. # -SELF=$(cd $(dirname $0) && pwd) +SELF=$(cd "$(dirname "$0")" && pwd) Review Comment:

[GitHub] [spark] liang3zy22 commented on a diff in pull request #40347: [SPARK-42711][BUILD]Update usage info for sbt tool

2023-03-12 Thread via GitHub
liang3zy22 commented on code in PR #40347: URL: https://github.com/apache/spark/pull/40347#discussion_r1133397703 ## build/sbt: ## @@ -62,7 +63,7 @@ Usage: $script_name [options] -sbt-rc use an RC version of sbt -sbt-snapshot use a snapshot

[GitHub] [spark] LuciferYang commented on pull request #40389: [SPARK-42767][CONNECT][TESTS] Add check condition to start connect server fallback with `in-memory` and auto ignored some tests strongly

2023-03-12 Thread via GitHub
LuciferYang commented on PR #40389: URL: https://github.com/apache/spark/pull/40389#issuecomment-1465482691 Will update pr description later

[GitHub] [spark] LuciferYang opened a new pull request, #40389: [SPARK-42767][CONNECT][TESTS] Add check condition to start connect server fallback with `in-memory` and auto ignored some tests strongly

2023-03-12 Thread via GitHub
LuciferYang opened a new pull request, #40389: URL: https://github.com/apache/spark/pull/40389 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-12 Thread via GitHub
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1133467299 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -3344,43 +3345,6 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] dongjoon-hyun commented on pull request #40387: [SPARK-42764][K8S] Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend

2023-03-12 Thread via GitHub
dongjoon-hyun commented on PR #40387: URL: https://github.com/apache/spark/pull/40387#issuecomment-1465358231 Could you review this PR, @viirya ?

[GitHub] [spark] viirya commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-12 Thread via GitHub
viirya commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1133355436 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AssignmentUtils.scala: ## @@ -0,0 +1,275 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] zhengruifeng commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-12 Thread via GitHub
zhengruifeng commented on PR #38947: URL: https://github.com/apache/spark/pull/38947#issuecomment-1465385482 @navinvishy sorry, it seems late for 3.4.0. would you mind changing the version from `3.4.0` to `3.5.0`? I am going to merge this PR to master this week if no more

[GitHub] [spark] zhengruifeng commented on pull request #40376: [SPARK-42756][CONNECT][PYTHON] Helper function to convert proto literal to value in Python Client

2023-03-12 Thread via GitHub
zhengruifeng commented on PR #40376: URL: https://github.com/apache/spark/pull/40376#issuecomment-1465436796 > Do we intentionally ignore `YearMonthIntervalType`? It seems that we do not yet support `YearMonthIntervalType` in vanilla PySpark, and the Python Client is reusing PySpark's

[GitHub] [spark] cloud-fan commented on pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-03-12 Thread via GitHub
cloud-fan commented on PR #40359: URL: https://github.com/apache/spark/pull/40359#issuecomment-1465448190 thanks, merged to master! can you open a backport PR for 3.4?

[GitHub] [spark] cloud-fan commented on a diff in pull request #40262: [SPARK-42651][SQL] Optimize global sort to driver sort

2023-03-12 Thread via GitHub
cloud-fan commented on code in PR #40262: URL: https://github.com/apache/spark/pull/40262#discussion_r1133447713 ## sql/core/src/main/scala/org/apache/spark/sql/execution/DriverSortExec.scala: ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [spark] ulysses-you commented on pull request #40390: [SPARK-42768][SQL] Enable cached plan apply AQE by default

2023-03-12 Thread via GitHub
ulysses-you commented on PR #40390: URL: https://github.com/apache/spark/pull/40390#issuecomment-1465534836 Thank you @dongjoon-hyun , I'm fine with holding this until the next release. I also want to make sure all tests pass.

[GitHub] [spark] cloud-fan commented on pull request #40314: [SPARK-42698][CORE] SparkSubmit should also stop SparkContext when exit program in yarn mode and pass exitCode to AM side

2023-03-12 Thread via GitHub
cloud-fan commented on PR #40314: URL: https://github.com/apache/spark/pull/40314#issuecomment-1465541487 @AngersZh can you comment on the original discussion thread and convince related people to add back `sc.stop`? https://github.com/apache/spark/pull/32081#discussion_r663434289

[GitHub] [spark] dongjoon-hyun opened a new pull request, #40387: [SPARK-42764][K8S] Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend

2023-03-12 Thread via GitHub
dongjoon-hyun opened a new pull request, #40387: URL: https://github.com/apache/spark/pull/40387 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] cloud-fan commented on pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-12 Thread via GitHub
cloud-fan commented on PR #39624: URL: https://github.com/apache/spark/pull/39624#issuecomment-1465444732 thanks, merging to master!

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40375: [SPARK-42755][CONNECT] Factor literal value conversion out to `connect-common`

2023-03-12 Thread via GitHub
zhengruifeng commented on code in PR #40375: URL: https://github.com/apache/spark/pull/40375#discussion_r1133419950 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralExpressionProtoConverter.scala: ## @@ -192,8 +173,7 @@ object

[GitHub] [spark] cloud-fan commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-12 Thread via GitHub
cloud-fan commented on code in PR #38947: URL: https://github.com/apache/spark/pull/38947#discussion_r1133450986 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -1399,6 +1399,151 @@ case class ArrayContains(left:

[GitHub] [spark] xinrong-meng commented on pull request #40376: [SPARK-42756][CONNECT][PYTHON] Helper function to convert proto literal to value in Python Client

2023-03-12 Thread via GitHub
xinrong-meng commented on PR #40376: URL: https://github.com/apache/spark/pull/40376#issuecomment-1465422914 Do we intentionally ignore `YearMonthIntervalType`?

[GitHub] [spark] cloud-fan closed pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-12 Thread via GitHub
cloud-fan closed pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec URL: https://github.com/apache/spark/pull/39624

[GitHub] [spark] beliefer commented on pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-03-12 Thread via GitHub
beliefer commented on PR #40359: URL: https://github.com/apache/spark/pull/40359#issuecomment-1465460479 > thanks, merged to master! can you open a backport PR for 3.4? Thank you! I will create it.

[GitHub] [spark] zhengruifeng commented on pull request #40372: [SPARK-42752][PYSPARK][SQL] Make PySpark exceptions printable during initialization

2023-03-12 Thread via GitHub
zhengruifeng commented on PR #40372: URL: https://github.com/apache/spark/pull/40372#issuecomment-1465459996 cc @itholic @HyukjinKwon

[GitHub] [spark] beliefer commented on a diff in pull request #40375: [SPARK-42755][CONNECT] Factor literal value conversion out to `connect-common`

2023-03-12 Thread via GitHub
beliefer commented on code in PR #40375: URL: https://github.com/apache/spark/pull/40375#discussion_r1133416489 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralExpressionProtoConverter.scala: ## @@ -192,8 +173,7 @@ object

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40387: [SPARK-42764][K8S] Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend

2023-03-12 Thread via GitHub
dongjoon-hyun commented on code in PR #40387: URL: https://github.com/apache/spark/pull/40387#discussion_r1133444982 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBackend.scala: ## @@ -82,7 +82,7 @@ private[spark]

[GitHub] [spark] ulysses-you commented on pull request #39037: [SPARK-41214][SQL] Fix AQE cache does not update plan and metrics

2023-03-12 Thread via GitHub
ulysses-you commented on PR #39037: URL: https://github.com/apache/spark/pull/39037#issuecomment-1465513263 Closing this in favor of https://github.com/apache/spark/pull/39624

[GitHub] [spark] ulysses-you closed pull request #39037: [SPARK-41214][SQL] Fix AQE cache does not update plan and metrics

2023-03-12 Thread via GitHub
ulysses-you closed pull request #39037: [SPARK-41214][SQL] Fix AQE cache does not update plan and metrics URL: https://github.com/apache/spark/pull/39037

[GitHub] [spark] cloud-fan commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-12 Thread via GitHub
cloud-fan commented on code in PR #38947: URL: https://github.com/apache/spark/pull/38947#discussion_r1133448219 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -1399,6 +1399,151 @@ case class ArrayContains(left:

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40390: [SPARK-42768][SQL] Enable cached plan apply AQE by default

2023-03-12 Thread via GitHub
dongjoon-hyun commented on code in PR #40390: URL: https://github.com/apache/spark/pull/40390#discussion_r1133456778 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -1493,15 +1493,14 @@ object SQLConf { val

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40376: [SPARK-42756][CONNECT][PYTHON] Helper function to convert proto literal to value in Python Client

2023-03-12 Thread via GitHub
WeichenXu123 commented on code in PR #40376: URL: https://github.com/apache/spark/pull/40376#discussion_r1133366703 ## python/pyspark/sql/connect/expressions.py: ## @@ -308,6 +308,43 @@ def _infer_type(cls, value: Any) -> DataType: def _from_value(cls, value: Any) ->

[GitHub] [spark] viirya commented on a diff in pull request #40387: [SPARK-42764][K8S] Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend

2023-03-12 Thread via GitHub
viirya commented on code in PR #40387: URL: https://github.com/apache/spark/pull/40387#discussion_r1133366833 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBackend.scala: ## @@ -82,7 +82,7 @@ private[spark] object

[GitHub] [spark] zhengruifeng closed pull request #40382: [SPARK-42679][CONNECT][PYTHON] createDataFrame doesn't work with non-nullable schema

2023-03-12 Thread via GitHub
zhengruifeng closed pull request #40382: [SPARK-42679][CONNECT][PYTHON] createDataFrame doesn't work with non-nullable schema URL: https://github.com/apache/spark/pull/40382

[GitHub] [spark] zhengruifeng commented on pull request #40382: [SPARK-42679][CONNECT][PYTHON] createDataFrame doesn't work with non-nullable schema

2023-03-12 Thread via GitHub
zhengruifeng commented on PR #40382: URL: https://github.com/apache/spark/pull/40382#issuecomment-1465447635 merged to master/branch-3.4

[GitHub] [spark] cloud-fan closed pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-03-12 Thread via GitHub
cloud-fan closed pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect URL: https://github.com/apache/spark/pull/40359

[GitHub] [spark] cloud-fan commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-12 Thread via GitHub
cloud-fan commented on code in PR #38947: URL: https://github.com/apache/spark/pull/38947#discussion_r1133450131 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -1399,6 +1399,151 @@ case class ArrayContains(left:

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40388: [SPARK-42765][CONNECT][PYTHON] Regulate the import path of `pandas_udf`

2023-03-12 Thread via GitHub
dongjoon-hyun commented on code in PR #40388: URL: https://github.com/apache/spark/pull/40388#discussion_r1133455875 ## python/pyspark/sql/connect/functions.py: ## @@ -2467,7 +2467,7 @@ def udf( def pandas_udf(*args: Any, **kwargs: Any) -> None: Review Comment: +1 for

[GitHub] [spark] zhengruifeng commented on pull request #40367: [SPARK-42747][ML] Fix incorrect internal status of LoR and AFT

2023-03-12 Thread via GitHub
zhengruifeng commented on PR #40367: URL: https://github.com/apache/spark/pull/40367#issuecomment-1465353652 @srowen Thank you for the reviews!

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40387: [SPARK-42764][K8S] Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend

2023-03-12 Thread via GitHub
dongjoon-hyun commented on code in PR #40387: URL: https://github.com/apache/spark/pull/40387#discussion_r1133357991 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBackend.scala: ## @@ -82,7 +82,7 @@ private[spark]

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40388: [SPARK-42765][CONNECT][PYTHON] Regulate the import path of `pandas_udf`

2023-03-12 Thread via GitHub
zhengruifeng commented on code in PR #40388: URL: https://github.com/apache/spark/pull/40388#discussion_r1133413599 ## python/pyspark/sql/connect/functions.py: ## @@ -2467,7 +2467,7 @@ def udf( def pandas_udf(*args: Any, **kwargs: Any) -> None: Review Comment: what

[GitHub] [spark] cloud-fan commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-12 Thread via GitHub
cloud-fan commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1465459756 The auto-generated alias name is fragile, and we are trying to improve it in https://github.com/apache/spark/pull/40126 . Can you give some examples of how the new update changes

[GitHub] [spark] sunchao commented on pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice when no filters in vectorized reader

2023-03-12 Thread via GitHub
sunchao commented on PR #39950: URL: https://github.com/apache/spark/pull/39950#issuecomment-1465508828 > Sorry, it might be necessary to read the footer twice when there are filters. We should first read the schema from the footer metadata to determine which filters need to be pushed down. After that we set pushdown

[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-12 Thread via GitHub
shrprasa commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1465517104 Gentle ping @dongjoon-hyun @mridulm @HyukjinKwon @yaooqinn Could you please review this PR or direct it to someone who is familiar with this code?

[GitHub] [spark] shrprasa commented on pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates

2023-03-12 Thread via GitHub
shrprasa commented on PR #40128: URL: https://github.com/apache/spark/pull/40128#issuecomment-1465517975 Gentle ping @holdenk

[GitHub] [spark] holdenk commented on a diff in pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates

2023-03-12 Thread via GitHub
holdenk commented on code in PR #40128: URL: https://github.com/apache/spark/pull/40128#discussion_r1133449379 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala: ## @@ -143,6 +144,9 @@ private[spark] class

[GitHub] [spark] cloud-fan commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-12 Thread via GitHub
cloud-fan commented on code in PR #38947: URL: https://github.com/apache/spark/pull/38947#discussion_r1133454113 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -1399,6 +1399,151 @@ case class ArrayContains(left:

[GitHub] [spark] cloud-fan commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-12 Thread via GitHub
cloud-fan commented on code in PR #38947: URL: https://github.com/apache/spark/pull/38947#discussion_r1133454186 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -1399,6 +1399,151 @@ case class ArrayContains(left: